[1/2] drill git commit: DRILL-2130: Fixed JUnit/Hamcrest/Mockito/Paranamer class path problem.

2015-02-25 Thread adi
Repository: drill
Updated Branches:
  refs/heads/master 8bb6b08e5 -> f7ef5ec78


DRILL-2130: Fixed JUnit/Hamcrest/Mockito/Paranamer class path problem.


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/b0faf708
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/b0faf708
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/b0faf708

Branch: refs/heads/master
Commit: b0faf708bdbeb53bc3a446d3782554640bdfd6df
Parents: 8bb6b08
Author: dbarclay dbarc...@maprtech.com
Authored: Sun Feb 22 00:45:42 2015 -0800
Committer: Aditya Kishore a...@apache.org
Committed: Wed Feb 25 11:08:20 2015 -0800

--
 ...rill2130CommonHamcrestConfigurationTest.java | 46 
 ...30StorageHBaseHamcrestConfigurationTest.java | 46 
 ...torageHiveCoreHamcrestConfigurationTest.java | 46 
 ...130InterpreterHamcrestConfigurationTest.java | 46 
 exec/java-exec/pom.xml  |  9 
 ...ll2130JavaExecHamcrestConfigurationTest.java | 46 
 ...ll2130JavaJdbcHamcrestConfigurationTest.java | 46 
 pom.xml |  8 
 8 files changed, 293 insertions(+)
--

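The eight new tests added by DRILL-2130 all use the same probe: run a Hamcrest-backed `assertThat` that is expected to fail, then classify the error that comes back. A plain-Java sketch of that classification logic, with no JUnit/Hamcrest dependency (class and method names here are illustrative, not from the patch):

```java
public class ClasspathProbeSketch {

  /** Classifies the error raised by a probe action, mimicking the test's
   *  two catch blocks: NoSuchMethodError indicates a stale Hamcrest on the
   *  class path; AssertionError indicates the matcher ran and simply failed. */
  static String classify(Runnable probe) {
    try {
      probe.run();
      return "UNEXPECTED_PASS";      // the probe is expected to throw
    } catch (NoSuchMethodError e) {
      return "BROKEN_CLASS_PATH";    // old Hamcrest found before the new one
    } catch (AssertionError e) {
      return "CLASS_PATH_OK";        // matcher executed and failed normally
    }
  }

  public static void main(String[] args) {
    // Simulate the two outcomes the real tests distinguish.
    System.out.println(classify(() -> { throw new AssertionError("1 != 2"); }));
    System.out.println(classify(() -> { throw new NoSuchMethodError("describeMismatch"); }));
  }
}
```

In the real tests the `NoSuchMethodError` case typically means an old Hamcrest (1.1) earlier on the class path shadows the `describeMismatch` support that newer JUnit matcher integration expects.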

http://git-wip-us.apache.org/repos/asf/drill/blob/b0faf708/common/src/test/java/org/apache/drill/test/Drill2130CommonHamcrestConfigurationTest.java
--
diff --git a/common/src/test/java/org/apache/drill/test/Drill2130CommonHamcrestConfigurationTest.java b/common/src/test/java/org/apache/drill/test/Drill2130CommonHamcrestConfigurationTest.java
new file mode 100644
index 000..99643b1
--- /dev/null
+++ b/common/src/test/java/org/apache/drill/test/Drill2130CommonHamcrestConfigurationTest.java
@@ -0,0 +1,46 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test;
+
+import org.junit.Test;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.fail;
+import static org.hamcrest.CoreMatchers.equalTo;
+
+
+public class Drill2130CommonHamcrestConfigurationTest {
+
+  @SuppressWarnings("unused")
+  private org.hamcrest.MatcherAssert forCompileTimeCheckForNewEnoughHamcrest;
+
+  @Test
+  public void testJUnitHamcrestMatcherFailureWorks() {
+    try {
+      assertThat( 1, equalTo( 2 ) );
+    }
+    catch ( NoSuchMethodError e ) {
+      fail( "Class search path seems broken re new JUnit and old Hamcrest."
+            + "  Got NoSuchMethodError;  e: " + e );
+    }
+    catch ( AssertionError e ) {
+      System.out.println( "Class path seems fine re new JUnit vs. old Hamcrest."
+                          + "  (Got AssertionError, not NoSuchMethodError.)" );
+    }
+  }
+
+}

http://git-wip-us.apache.org/repos/asf/drill/blob/b0faf708/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/test/Drill2130StorageHBaseHamcrestConfigurationTest.java
--
diff --git a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/test/Drill2130StorageHBaseHamcrestConfigurationTest.java b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/test/Drill2130StorageHBaseHamcrestConfigurationTest.java
new file mode 100644
index 000..b52654d
--- /dev/null
+++ b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/test/Drill2130StorageHBaseHamcrestConfigurationTest.java
@@ -0,0 +1,46 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS 

[2/2] drill git commit: DRILL-1690: Issue with using HBase plugin to access row_key only

2015-02-25 Thread adi
DRILL-1690: Issue with using HBase plugin to access row_key only


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/f7ef5ec7
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/f7ef5ec7
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/f7ef5ec7

Branch: refs/heads/master
Commit: f7ef5ec784844a99b8b39fe10ab14f001ae149f2
Parents: b0faf70
Author: Aditya Kishore a...@apache.org
Authored: Wed Feb 25 01:10:48 2015 -0800
Committer: Aditya Kishore a...@apache.org
Committed: Wed Feb 25 11:17:06 2015 -0800

--
 .../exec/store/hbase/HBaseRecordReader.java | 35 +++-
 1 file changed, 19 insertions(+), 16 deletions(-)
--

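The DRILL-1690 fix hoists the row-key-only decision out of `transformColumns` into a field so the read path can skip building column-family vectors when only `row_key` is projected. The decision itself reduces to a small predicate; a standalone sketch (a hypothetical helper that mirrors, but does not copy, the patch's logic):

```java
import java.util.Arrays;
import java.util.List;

public class RowKeyProjectionSketch {
  static final String ROW_KEY = "row_key";

  /** True only when every projected column is the HBase row key, i.e. the
   *  scan never needs cell (family/qualifier/value) data. A star query
   *  projects everything, so it is never row-key-only. */
  static boolean isRowKeyOnly(boolean starQuery, List<String> columns) {
    if (starQuery) {
      return false;                    // star query must materialize all families
    }
    for (String column : columns) {
      if (!column.equalsIgnoreCase(ROW_KEY)) {
        return false;                  // any other column requires cell data
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isRowKeyOnly(false, Arrays.asList("row_key")));        // true
    System.out.println(isRowKeyOnly(false, Arrays.asList("row_key", "f.c"))); // false
    System.out.println(isRowKeyOnly(true,  Arrays.asList()));                 // false
  }
}
```

As in the patch, a star query is treated as not row-key-only, since every column family must be materialized.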

http://git-wip-us.apache.org/repos/asf/drill/blob/f7ef5ec7/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
--
diff --git a/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java b/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
index da38707..42038e8 100644
--- a/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
+++ b/contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
@@ -72,6 +72,8 @@ public class HBaseRecordReader extends AbstractRecordReader implements DrillHBas
   private Configuration hbaseConf;
   private OperatorContext operatorContext;
 
+  private boolean rowKeyOnly;
+
   public HBaseRecordReader(Configuration conf, HBaseSubScan.HBaseSubScanSpec subScanSpec,
       List<SchemaPath> projectedColumns, FragmentContext context) throws OutOfMemoryException {
     hbaseConf = conf;
@@ -87,8 +89,8 @@ public class HBaseRecordReader extends AbstractRecordReader implements DrillHBas
   @Override
   protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
     Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    rowKeyOnly = true;
     if (!isStarQuery()) {
-      boolean rowKeyOnly = true;
       for (SchemaPath column : columns) {
         if (column.getRootSegment().getPath().equalsIgnoreCase(ROW_KEY)) {
           transformed.add(ROW_KEY_PATH);
@@ -116,6 +118,7 @@ public class HBaseRecordReader extends AbstractRecordReader implements DrillHBas
           HBaseUtils.andFilterAtIndex(hbaseScan.getFilter(), HBaseUtils.LAST_FILTER, new FirstKeyOnlyFilter()));
       }
     } else {
+      rowKeyOnly = false;
       transformed.add(ROW_KEY_PATH);
     }
 
@@ -131,7 +134,6 @@ public class HBaseRecordReader extends AbstractRecordReader implements DrillHBas
     this.operatorContext = operatorContext;
   }
 
-
   @Override
   public void setup(OutputMutator output) throws ExecutionSetupException {
     this.outputMutator = output;
@@ -197,22 +199,23 @@ public class HBaseRecordReader extends AbstractRecordReader implements DrillHBas
       if (rowKeyVector != null) {
         rowKeyVector.getMutator().setSafe(rowCount, cells[0].getRowArray(), cells[0].getRowOffset(), cells[0].getRowLength());
       }
+      if (!rowKeyOnly) {
+        for (Cell cell : cells) {
+          int familyOffset = cell.getFamilyOffset();
+          int familyLength = cell.getFamilyLength();
+          byte[] familyArray = cell.getFamilyArray();
+          MapVector mv = getOrCreateFamilyVector(new String(familyArray, familyOffset, familyLength), true);
 
-      for (Cell cell : cells) {
-        int familyOffset = cell.getFamilyOffset();
-        int familyLength = cell.getFamilyLength();
-        byte[] familyArray = cell.getFamilyArray();
-        MapVector mv = getOrCreateFamilyVector(new String(familyArray, familyOffset, familyLength), true);
+          int qualifierOffset = cell.getQualifierOffset();
+          int qualifierLength = cell.getQualifierLength();
+          byte[] qualifierArray = cell.getQualifierArray();
+          NullableVarBinaryVector v = getOrCreateColumnVector(mv, new String(qualifierArray, qualifierOffset, qualifierLength));
 
-        int qualifierOffset = cell.getQualifierOffset();
-        int qualifierLength = cell.getQualifierLength();
-        byte[] qualifierArray = cell.getQualifierArray();
-        NullableVarBinaryVector v = getOrCreateColumnVector(mv, new String(qualifierArray, qualifierOffset, qualifierLength));
-
-        int valueOffset = cell.getValueOffset();
-        int valueLength = cell.getValueLength();
-        byte[] valueArray = cell.getValueArray();
-        v.getMutator().setSafe(rowCount, valueArray, valueOffset, valueLength);
+          int valueOffset = cell.getValueOffset();
+          int valueLength = cell.getValueLength();
+          byte[] valueArray = cell.getValueArray();
+          v.getMutator().setSafe(rowCount, 

svn commit: r1662344 [8/8] - in /drill/site/trunk/content/drill: ./ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ docs/ docs/2014-q1-drill-report/ docs/advanced-properties/ docs/analyzing-yelp-j

2015-02-25 Thread adi
Modified: drill/site/trunk/content/drill/docs/supported-date-time-data-type-formats/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/supported-date-time-data-type-formats/index.html?rev=1662344&r1=1662343&r2=1662344&view=diff
==
--- drill/site/trunk/content/drill/docs/supported-date-time-data-type-formats/index.html (original)
+++ drill/site/trunk/content/drill/docs/supported-date-time-data-type-formats/index.html Thu Feb 26 01:16:43 2015
@@ -136,14 +136,14 @@ Apache Drill does not support <code>time
 </tr>
   </tbody>
 </table>
-  
+
 ## Time
 
 Drill supports the `time` data type in the following format:
 
 HH:mm:ss.SSS (hour:minute:sec.milliseconds)
 
-The following table provides some examples for the` time` data type:
+The following table provides some examples for the `time` data type:
 
 tabletbodytr
   thUse/th
@@ -162,7 +162,6 @@ The following table provides some exampl
   <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(time_col as time) from dfs.`/tmp/input.json`;</span></code></td>
 </tr></tbody>
 </table>
-  
 
 <h2 id="interval">Interval</h2>
 
@@ -174,7 +173,7 @@ The following table provides some exampl
 supports the <code>interval</code> data type in the following format:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">P [qty] Y [qty] M
 </code></pre></div>
-<p>The following table provides examples for <code>interval year</code>data type:</p>
+<p>The following table provides examples for <code>interval year</code> data type:</p>
 
 <table> <tbody><tr>
 <th>Use</th>

 <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval year) from dfs.`/tmp/input.json`;</span></code></td>
   </tr>
   </tbody></table> 
-  
 
 <h3 id="interval-day">Interval Day</h3>
 
@@ -201,15 +199,14 @@ supports the <code>interval day</code> d
 </code></pre></div>
 <p>The following table provides examples for <code>interval day</code> data type:</p>
 
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th class="confluenceTh">Use</th><th class="confluenceTh">Example</th></tr><tr><td valign="top">Literal</td><td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' day  from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' hour  from dfs.`/tmp/input.json`;</span></code><code><span style="color: rgb(0,0,0);">select interval '10.999' second  from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code>JSON</code> Input</td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1DT10H20M30S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1DT10H20M30.123S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1D&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10H&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10.10S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT20S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10H10S&quot;}</span></code></td></tr><tr><td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval day) from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table></div>
-  
+<table> <tbody><tr><th> Use</th><th> Example</th></tr><tr><td valign="top">Literal</td><td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' day  from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' hour  from dfs.`/tmp/input.json`;</span></code><code><span style="color: rgb(0,0,0);">select interval '10.999' second  from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code>JSON</code> Input</td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1DT10H20M30S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1DT10H20M30.123S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;P1D&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10H&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10.10S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT20S&quot;}<br /></span><span style="color: rgb(0,0,0);">{&quot;col&quot; : &quot;PT10H10S&quot;}</span></code></td></tr><tr><td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval day) from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table> 
 
 <h2 id="literal">Literal</h2>
 
-<p>The following table 

svn commit: r1662344 [4/8] - in /drill/site/trunk/content/drill: ./ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ docs/ docs/2014-q1-drill-report/ docs/advanced-properties/ docs/analyzing-yelp-j

2015-02-25 Thread adi
Modified: drill/site/trunk/content/drill/docs/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/index.html?rev=1662344&r1=1662343&r2=1662344&view=diff
==
--- drill/site/trunk/content/drill/docs/index.html (original)
+++ drill/site/trunk/content/drill/docs/index.html Thu Feb 26 01:16:43 2015
@@ -71,7 +71,7 @@
 <ul>
 
   
-<li><a href="/docs/apache-drill-documentation/">Apache Drill Documentation</a></li>
+<li><a href="/docs/architectural-overview/">Architectural Overview</a></li>
 
 
 
@@ -80,7 +80,48 @@
 
   
 
-  <li><a href="/docs/architectural-overview/">Architectural Overview</a></li>
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  <li><a href="/docs/core-modules-within-a-drillbit/">Core Modules within a Drillbit</a></li>
+  
+  
+  
+
+  
+
+  <li><a href="/docs/architectural-highlights/">Architectural Highlights</a></li>
   
   
   
@@ -123,226 +164,15 @@
   
 
   
-<li><a href="/docs/core-modules-within-a-drillbit/">Core Modules within a Drillbit</a></li>
-  
-
-  
-<li><a href="/docs/architectural-highlights/">Architectural Highlights</a></li>
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
 
   
 
   
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-</ul>
-  
-
-  
-
-  <li><a href="/docs/apache-drill-tutorial/">Apache Drill Tutorial</a></li>
-  
-  
-  
-<ul>
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
-  
-
+<li><a href="/docs/flexibility/">Flexibility</a></li>
   
 
   
-
+<li><a href="/docs/performance/">Performance</a></li>
  

svn commit: r1662344 [2/8] - in /drill/site/trunk/content/drill: ./ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ docs/ docs/2014-q1-drill-report/ docs/advanced-properties/ docs/analyzing-yelp-j

2015-02-25 Thread adi
Modified: drill/site/trunk/content/drill/docs/apache-drill-in-10-minutes/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/apache-drill-in-10-minutes/index.html?rev=1662344&r1=1662343&r2=1662344&view=diff
==
--- drill/site/trunk/content/drill/docs/apache-drill-in-10-minutes/index.html (original)
+++ drill/site/trunk/content/drill/docs/apache-drill-in-10-minutes/index.html Thu Feb 26 01:16:43 2015
@@ -85,13 +85,13 @@
 <li>More Information</li>
 </ul>
 
-<h1 id="objective">Objective</h1>
+<h2 id="objective">Objective</h2>
 
 <p>Use Apache Drill to query sample data in 10 minutes. For simplicity, you’ll
 run Drill in <em>embedded</em> mode rather than <em>distributed</em> mode to try out Drill
 without having to perform any setup tasks.</p>
 
-<h1 id="a-few-bits-about-apache-drill">A Few Bits About Apache Drill</h1>
+<h2 id="a-few-bits-about-apache-drill">A Few Bits About Apache Drill</h2>
 
 <p>Drill is a clustered, powerful MPP (Massively Parallel Processing) query
 engine for Hadoop that can process petabytes of data, fast. Drill is useful
@@ -100,7 +100,7 @@ capable of querying nested data in forma
 performing dynamic schema discovery. Drill does not require a centralized
 metadata repository.</p>
 
-<h3 id="_dynamic-schema-discovery-_"><strong>_Dynamic schema discovery _</strong></h3>
+<h3 id="dynamic-schema-discovery"><strong><em>Dynamic schema discovery</em></strong></h3>
 
 <p>Drill does not require schema or type specification for data in order to start
 the query execution process. Drill starts data processing in record-batches
@@ -144,7 +144,7 @@ extend the layer to a broader array of u
 classpath scanning and plugin concept to add additional storage plugins,
 functions, and operators with minimal configuration.</p>
 
-<h1 id="process-overview">Process Overview</h1>
+<h2 id="process-overview">Process Overview</h2>
 
 <p>Download the Apache Drill archive and extract the contents to a directory on
 your machine. The Apache Drill archive contains sample JSON and Parquet files
@@ -159,19 +159,19 @@ commands. SQLLine is used as the shell f
 
 <p>You must have the following software installed on your machine to run Drill:</p>
 
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Software</strong></p></td><td class="confluenceTd"><p><strong>Description</strong></p></td></tr><tr><td class="confluenceTd"><p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" class="external-link" rel="nofollow">Oracle JDK version 7</a></p></td><td class="confluenceTd"><p>A set of programming tools for developing Java applications.</p></td></tr></tbody></table></div>
+<table> <tbody><tr><td> <strong>Software</strong></td><td> <strong>Description</strong></td></tr><tr><td> <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" class="external-link" rel="nofollow">Oracle JDK version 7</a></td><td> A set of programming tools for developing Java applications.</td></tr></tbody></table>
 
 <h3 id="prerequisite-validation">Prerequisite Validation</h3>
 
 <p>Run the following command to verify that the system meets the software
 prerequisite:
-<table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Command </strong></p></td><td class="confluenceTd"><p><strong>Example Output</strong></p></td></tr><tr><td class="confluenceTd"><p><code>java –version</code></p></td><td class="confluenceTd"><p><code>java version &quot;1.7.0_65&quot;</code><br /><code>Java(TM) SE Runtime Environment (build 1.7.0_65-b19)</code><br /><code>Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)</code></p></td></tr></tbody></table></p>
+<table> <tbody><tr><td> <strong>Command </strong></td><td> <strong>Example Output</strong></td></tr><tr><td> <code>java –version</code></td><td> <code>java version &quot;1.7.0_65&quot;</code><br /><code>Java(TM) SE Runtime Environment (build 1.7.0_65-b19)</code><br /><code>Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)</code></td></tr></tbody></table></p>
 
-<h1 id="install-drill">Install Drill</h1>
+<h2 id="install-drill">Install Drill</h2>
 
 <p>You can install Drill on a machine running Linux, Mac OS X, or Windows.  </p>
 
-<h2 id="installing-drill-on-linux">Installing Drill on Linux</h2>
+<h3 id="installing-drill-on-linux">Installing Drill on Linux</h3>
 
 <p>Complete the following steps to install Drill:</p>
 
@@ -182,7 +182,7 @@ prerequisite:
 <li><p>Issue the following command to create a new directory to which you can extract the contents of the Drill <code>tar.gz</code> file:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">sudo mkdir -p /opt/drill
 </code></pre></div></li>
-<li><p>Navigate to the directory where you downloaded the Drill <code>tar.gz</code> file.  </p></li>
+<li><p>Navigate to the directory where you downloaded the Drill <code>tar.gz</code> file.</p></li>
 <li><p>Issue the following command to extract the contents of the Drill <code>tar.gz</code> file:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">sudo tar -xvzf apache-drill-&lt;version&gt;.tar.gz -C /opt/drill
 </code></pre></div></li>
@@ -191,9 +191,9 @@ prerequisite:
 </code></pre></div></li>
 </ol>
 
-<p>At this point, you can <a 

svn commit: r1662344 [7/8] - in /drill/site/trunk/content/drill: ./ blog/2014/12/11/apache-drill-qa-panelist-spotlight/ docs/ docs/2014-q1-drill-report/ docs/advanced-properties/ docs/analyzing-yelp-j

2015-02-25 Thread adi
Modified: drill/site/trunk/content/drill/docs/release-notes/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/release-notes/index.html?rev=1662344&r1=1662343&r2=1662344&view=diff
==
--- drill/site/trunk/content/drill/docs/release-notes/index.html (original)
+++ drill/site/trunk/content/drill/docs/release-notes/index.html Thu Feb 26 01:16:43 2015
@@ -80,7 +80,7 @@ Drill has been tested against MapR, Clou
 distributions. There are associated build profiles and JIRAs that can help you
 run Drill against your preferred distribution</p>
 
-<p>Apache Drill 0.7.0 Key Features</p>
+<h3 id="apache-drill-0.7.0-key-features">Apache Drill 0.7.0 Key Features</h3>
 
 <ul>
 <li><p>No more dependency on UDP/Multicast - Making it possible for Drill to work well in the following scenarios:</p>
@@ -104,7 +104,7 @@ run Drill against your preferred distrib
 <li><p>Stability improvements in ODBC and JDBC drivers</p></li>
 </ul>
 
-<p>Apache Drill 0.7.0 Key Notes and Limitations</p>
+<h3 id="apache-drill-0.7.0-key-notes-and-limitations">Apache Drill 0.7.0 Key Notes and Limitations</h3>
 
 <ul>
 <li>The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.</li>
@@ -123,18 +123,18 @@ against Apache Hadoop. Drill has been te
 Hortonworks Hadoop distributions. There are associated build profiles and
 JIRAs that can help you run Drill against your preferred distribution.</p>
 
-<p>Apache Drill 0.6.0 Key Features</p>
+<h3 id="apache-drill-0.6.0-key-features">Apache Drill 0.6.0 Key Features</h3>
 
 <p>This release is primarily a bug fix release, with <a href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&amp;vers%0Aion=12327472">more than 30 JIRAs closed</a>, but there are some notable features:</p>
 
 <ul>
-<li>Direct ANSI SQL access to MongoDB, using the latest <a href="/confluence/display/DRILL/MongoDB+Plugin+for+Apache+Drill">MongoDB Plugin for Apache Drill</a></li>
+<li>Direct ANSI SQL access to MongoDB, using the latest <a href="/drill/docs/mongodb-plugin-for-apache-drill">MongoDB Plugin for Apache Drill</a></li>
 <li>Filesystem query performance improvements with partition pruning</li>
 <li>Ability to use the file system as a persistent store for query profiles and diagnostic information</li>
 <li>Window function support (alpha)</li>
 </ul>
 
-<p>Apache Drill 0.6.0 Key Notes and Limitations</p>
+<h3 id="apache-drill-0.6.0-key-notes-and-limitations">Apache Drill 0.6.0 Key Notes and Limitations</h3>
 
 <ul>
 <li>The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.</li>
@@ -157,7 +157,7 @@ against Apache Hadoop. Drill has been te
 Hortonworks Hadoop distributions. There are associated build profiles and
 JIRAs that can help you run Drill against your preferred distribution.</p>
 
-<p>Apache Drill 0.5.0 Key Notes and Limitations</p>
+<h3 id="apache-drill-0.5.0-key-notes-and-limitations">Apache Drill 0.5.0 Key Notes and Limitations</h3>
 
 <ul>
 <li>The current release supports in memory and beyond memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.</li>
@@ -191,7 +191,7 @@ MapR, Cloudera and Hortonworks Hadoop di
 build profiles or JIRAs that can help you run against your preferred
 distribution.</p>
 
-<p>Some Key Notes &amp; Limitations</p>
+<h3 id="some-key-notes-amp;-limitations">Some Key Notes &amp; Limitations</h3>
 
 <ul>
 <li>The current release supports in memory and beyond memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.</li>
@@ -241,7 +241,7 @@ will be correct in a future milestone re
 <li>Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against</li>
 </ul>
 
-<p>UDFs</p>
+<h3 id="udfs">UDFs</h3>
 
 <ul>
 <li>Drill currently supports simple and aggregate functions using scalar, repeated and</li>

Modified: drill/site/trunk/content/drill/docs/repeated-count-function/index.html
URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/repeated-count-function/index.html?rev=1662344&r1=1662343&r2=1662344&view=diff
==
--- drill/site/trunk/content/drill/docs/repeated-count-function/index.html (original)
+++ drill/site/trunk/content/drill/docs/repeated-count-function/index.html Thu Feb 26 01:16:43 2015
@@ -94,7 +94,7 @@ the count to be grouped by other columns
 this example).</p>
 
 <p>For another example of this function, see the following lesson in the Apache
-Drill Tutorial for Hadoop: <a href="/conf%0Aluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types">Lesson 3: Run Queries on Complex Data Types</a>.</p>
+Drill Tutorial for Hadoop: <a 

[03/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread adi
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/005-query-info-skema.md
--
diff --git a/_docs/query/005-query-info-skema.md b/_docs/query/005-query-info-skema.md
new file mode 100644
index 000..1ad0008
--- /dev/null
+++ b/_docs/query/005-query-info-skema.md
@@ -0,0 +1,109 @@
+---
+title: Querying the INFORMATION SCHEMA
+parent: Query Data
+---
+When you are using Drill to connect to multiple data sources, you need a
+simple mechanism to discover what each data source contains. The information
+schema is an ANSI standard set of metadata tables that you can query to return
+information about all of your Drill data sources (or schemas). Data sources
+may be databases or file systems; they are all known as schemas in this
+context. You can query the following INFORMATION_SCHEMA tables:
+
+  * SCHEMATA
+  * CATALOGS
+  * TABLES
+  * COLUMNS 
+  * VIEWS
+
+## SCHEMATA
+
+The SCHEMATA table contains the CATALOG_NAME and SCHEMA_NAME columns. To allow
+maximum flexibility inside BI tools, the only catalog that Drill supports is
+`DRILL`.
+
+0: jdbc:drill:zk=local> select CATALOG_NAME, SCHEMA_NAME as all_my_data_sources from INFORMATION_SCHEMA.SCHEMATA order by SCHEMA_NAME;
++--+-+
+| CATALOG_NAME | all_my_data_sources |
++--+-+
+| DRILL| INFORMATION_SCHEMA  |
+| DRILL| cp.default  |
+| DRILL| dfs.default |
+| DRILL| dfs.root|
+| DRILL| dfs.tmp |
+| DRILL| HiveTest.SalesDB|
+| DRILL| maprfs.logs |
+| DRILL| sys |
++--+-+
+
+The INFORMATION_SCHEMA name and associated keywords are case-sensitive. You
+can also return a list of schemas by running the SHOW DATABASES command:
+
+0: jdbc:drill:zk=local> show databases;
++-+
+| SCHEMA_NAME |
++-+
+| dfs.default |
+| dfs.root|
+| dfs.tmp |
+...
+
+## CATALOGS
+
+The CATALOGS table returns only one row, with the hardcoded DRILL catalog name
+and description.
+
+## TABLES
+
+The TABLES table returns the table name and type for each table or view in
+your databases. (Type means TABLE or VIEW.) Note that Drill does not return
+files available for querying in file-based data sources. Instead, use SHOW
+FILES to explore these data sources.
+
+## COLUMNS
+
+The COLUMNS table returns the column name and other metadata (such as the data
+type) for each column in each table or view.
+
+## VIEWS
+
+The VIEWS table returns the name and definition for each view in your
+databases. Note that file schemas are the canonical repository for views in
+Drill. Depending on how you create a view, the view may only be displayed in Drill
+after it has been used.
+
+## Useful Queries
+
+Run an ``INFORMATION_SCHEMA.`TABLES` `` query to view all of the tables and views
+within a database. TABLES is a reserved word in Drill and requires back ticks
+(`).
+
+For example, the following query identifies all of the tables and views that
+Drill can access:
+
+SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
+FROM INFORMATION_SCHEMA.`TABLES`
+ORDER BY TABLE_NAME DESC;
+
+TABLE_SCHEMA          TABLE_NAME             TABLE_TYPE
+
+HiveTest.CustomersDB  Customers              TABLE
+HiveTest.SalesDB      Orders                 TABLE
+HiveTest.SalesDB      OrderLines             TABLE
+HiveTest.SalesDB      USOrders               VIEW
+dfs.default           CustomerSocialProfile  VIEW
+
+
+**Note:** Currently, Drill only supports querying Drill views; Hive views are
+not yet supported.
+
+You can run a similar query to identify columns in tables and the data types
+of those columns:
+
+SELECT COLUMN_NAME, DATA_TYPE 
+FROM INFORMATION_SCHEMA.COLUMNS 
+WHERE TABLE_NAME = 'Orders' AND TABLE_SCHEMA = 'HiveTest.SalesDB' AND COLUMN_NAME LIKE '%Total';
++-++
+| COLUMN_NAME | DATA_TYPE  |
++-++
+| OrderTotal  | Decimal|
++-++
+

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/006-query-sys-tbl.md
--
diff --git a/_docs/query/006-query-sys-tbl.md b/_docs/query/006-query-sys-tbl.md
new file mode 100644
index 000..9b853ec
--- /dev/null
+++ b/_docs/query/006-query-sys-tbl.md
@@ -0,0 +1,159 @@
+---
+title: Querying System Tables
+parent: Query Data
+---
+Drill has a sys database that contains system tables. You can query the system
+tables for information 

[05/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread adi
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/ngram_plugin2.png
--
diff --git a/_docs/img/ngram_plugin2.png b/_docs/img/ngram_plugin2.png
new file mode 100644
index 000..60d432d
Binary files /dev/null and b/_docs/img/ngram_plugin2.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/settings.png
--
diff --git a/_docs/img/settings.png b/_docs/img/settings.png
new file mode 100644
index 000..dcff0d9
Binary files /dev/null and b/_docs/img/settings.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/img/student_hive.png
--
diff --git a/_docs/img/student_hive.png b/_docs/img/student_hive.png
new file mode 100644
index 000..7e22b88
Binary files /dev/null and b/_docs/img/student_hive.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/install/001-drill-in-10.md
--
diff --git a/_docs/install/001-drill-in-10.md b/_docs/install/001-drill-in-10.md
new file mode 100644
index 000..13d2410
--- /dev/null
+++ b/_docs/install/001-drill-in-10.md
@@ -0,0 +1,365 @@
+---
+title: Apache Drill in 10 Minutes
+parent: Install Drill
+---
+* Objective
+* A Few Bits About Apache Drill
+* Process Overview
+* Install Drill
+  * Installing Drill on Linux
+  * Installing Drill on Mac OS X
+  * Installing Drill on Windows 
+* Start Drill 
+* Query Sample Data 
+* Summary 
+* Next Steps
+* More Information
+
+## Objective
+
+Use Apache Drill to query sample data in 10 minutes. For simplicity, you’ll
+run Drill in _embedded_ mode rather than _distributed_ mode to try out Drill
+without having to perform any setup tasks.
+
+## A Few Bits About Apache Drill
+
+Drill is a powerful, clustered MPP (Massively Parallel Processing) query
+engine for Hadoop that can process petabytes of data quickly. Drill is useful
+for short, interactive ad-hoc queries on large-scale data sets. Drill is
+capable of querying nested data in formats like JSON and Parquet and
+performing dynamic schema discovery. Drill does not require a centralized
+metadata repository.
+
+### **_Dynamic schema discovery_**
+
+Drill does not require schema or type specification for data in order to start
+the query execution process. Drill starts data processing in record-batches
+and discovers the schema during processing. Self-describing data formats such
+as Parquet, JSON, AVRO, and NoSQL databases have schema specified as part of
+the data itself, which Drill leverages dynamically at query time. Because
+schema can change over the course of a Drill query, all Drill operators are
+designed to reconfigure themselves when schemas change.
+
+### **_Flexible data model_**
+
+Drill allows access to nested data attributes, just like SQL columns, and
+provides intuitive extensions to easily operate on them. From an architectural
+point of view, Drill provides a flexible hierarchical columnar data model that
+can represent complex, highly dynamic and evolving data models. Drill allows
+for efficient processing of these models without the need to flatten or
+materialize them at design time or at execution time. Relational data in Drill
+is treated as a special or simplified case of complex/multi-structured data.
+
+### **_De-centralized metadata_**
+
+Drill does not have a centralized metadata requirement. You do not need to
+create and manage tables and views in a metadata repository, or rely on a
+database administrator group for such a function. Drill metadata is derived
+from the storage plugins that correspond to data sources. Storage plugins
+provide a spectrum of metadata ranging from full metadata (Hive), partial
+metadata (HBase), or no central metadata (files). De-centralized metadata
+means that Drill is NOT tied to a single Hive repository. You can query
+multiple Hive repositories at once and then combine the data with information
+from HBase tables or with a file in a distributed file system. You can also
+use SQL DDL syntax to create metadata within Drill, which gets organized just
+like a traditional database. Drill metadata is accessible through the ANSI
+standard INFORMATION_SCHEMA database.
+
+### **_Extensibility_**
+
+Drill provides an extensible architecture at all layers, including the storage
+plugin, query, query optimization/execution, and client API layers. You can
+customize any layer for the specific needs of an organization or you can
+extend the layer to a broader array of use cases. Drill provides a built-in
+classpath scanning and plugin concept to add additional storage plugins,
+functions, and operators with minimal configuration.
+
+## Process Overview
+
+Download the Apache Drill archive and extract the contents to a directory on
+your machine. The Apache Drill archive 

[09/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread adi
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/001-aol.md
--
diff --git a/_docs/drill-docs/datasets/001-aol.md 
b/_docs/drill-docs/datasets/001-aol.md
deleted file mode 100644
index 472f52f..000
--- a/_docs/drill-docs/datasets/001-aol.md
+++ /dev/null
@@ -1,47 +0,0 @@
----
-title: AOL Search
-parent: Sample Datasets
----
-## Quick Stats
-
-The [AOL Search dataset](http://en.wikipedia.org/wiki/AOL_search_data_leak) is
-a collection of real query log data that is based on real users.
-
-## The Data Source
-
-The dataset consists of 20M Web queries from 650k users over a period of three
-months, 440MB in total and available [for
-download](http://zola.di.unipi.it/smalltext/datasets.html). The format used in
-the dataset is:
-
-AnonID, Query, QueryTime, ItemRank, ClickURL
-
-... with:
-
-  * AnonID, an anonymous user ID number.
-  * Query, the query issued by the user, case shifted with most punctuation 
removed.
-  * QueryTime, the time at which the query was submitted for search.
-  * ItemRank, if the user clicked on a search result, the rank of the item on 
which they clicked is listed.
-  * ClickURL, if the user clicked on a search 
result, the domain portion of the URL in the clicked result is listed.
-
-Each line in the data represents one of two types of events:
-
-  * A query that was NOT followed by the user clicking on a result item.
-  * A click through on an item in the result list returned from a query.
-
-In the first case (query only) there is data in only the first three columns,
-in the second case (click through), there is data in all five columns. For
-click through events, the query that preceded the click through is included.
-Note that if a user clicked on more than one result in the list returned from
-a single query, there will be TWO lines in the data to represent the two
-events.
-
-## The Queries
-
-Interesting queries, for example
-
-  * Users querying for topic X
-  * Users that click on the first (second, third) ranked item
-  * TOP 10 domains searched
-  * TOP 10 domains clicked at
-

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/002-enron.md
--
diff --git a/_docs/drill-docs/datasets/002-enron.md 
b/_docs/drill-docs/datasets/002-enron.md
deleted file mode 100644
index 2ddbef6..000
--- a/_docs/drill-docs/datasets/002-enron.md
+++ /dev/null
@@ -1,21 +0,0 @@
----
-title: Enron Emails
-parent: Sample Datasets
----
-## Quick Stats
-
-The [Enron Email dataset](http://www.cs.cmu.edu/~enron/) contains data from
-about 150 users, mostly senior management of Enron.
-
-## The Data Source
-
-Totalling some 500,000 messages, the [raw
-data](http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz) (2009 version of
-the dataset; ~423MB) is available for download as well as a [MySQL
-dump](ftp://ftp.isi.edu/sims/philpot/data/enron-mysqldump.sql.gz) (~177MB).
-
-## The Queries
-
-Interesting queries, for example
-
-  * Via [Query Dataset for Email 
Search](https://dbappserv.cis.upenn.edu/spell/)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/003-wikipedia.md
--
diff --git a/_docs/drill-docs/datasets/003-wikipedia.md 
b/_docs/drill-docs/datasets/003-wikipedia.md
deleted file mode 100644
index 99e6e24..000
--- a/_docs/drill-docs/datasets/003-wikipedia.md
+++ /dev/null
@@ -1,105 +0,0 @@
----
-title: Wikipedia Edit History
-parent: Sample Datasets
----
-# Quick Stats
-
-The Wikipedia Edit History is a public dump of the website made available by
-the wikipedia foundation. You can find details
-[here](http://en.wikipedia.org/wiki/Wikipedia:Database_download). The dumps
-are made available as SQL or XML dumps. You can find the entire schema drawn
-together in this great [diagram](http://upload.wikimedia.org/wikipedia/commons
-/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2193px-
-MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png).
-
-# Approach
-
-The _main_ distribution files are:
-
-  * Current Pages: As of January 2013 this SQL dump was 9.0GB in its 
compressed format.
-  * Complete Archive: This is what we actually want, but at a size of multiple 
terabytes, it clearly exceeds the storage available at home.
-
-To have some real historic data, it is recommended to download a _Special
-Export_ using this
-[link](http://en.wikipedia.org/w/index.php?title=Special:Export). Using this
-tool you can generate a category-specific XML dump and configure various export
-options. There are some limits, such as a maximum of 1000 revisions per export,
-but otherwise this should work out just fine.
-
-![](../../img/Overview.png)
-
-The entities used in the query use cases.
-
-# Use Cases
-
-## Select Change Volume Based on 

[07/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread adi
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/query/query-fs/001-query-json.md
--
diff --git a/_docs/drill-docs/query/query-fs/001-query-json.md 
b/_docs/drill-docs/query/query-fs/001-query-json.md
deleted file mode 100644
index 048903b..000
--- a/_docs/drill-docs/query/query-fs/001-query-json.md
+++ /dev/null
@@ -1,41 +0,0 @@
----
-title: Querying JSON Files
-parent: Querying a File System
----
-Your Drill installation includes a sample JSON file located in Drill's
-classpath. The sample JSON file, `employee.json`, contains fictitious employee
-data. Use SQL syntax to query the sample `JSON` file.
-
-To view the data in the `employee.json` file, submit the following SQL query
-to Drill:
-
-``0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`;``
-
-The query returns the following results:
-
-**Example of partial output**
-
-+-++++-+---+
-| employee_id | full_name  | first_name | last_name  | position_id | position_ |
-+-++++-+---+
-| 1101| Steve Eurich | Steve  | Eurich | 16  | Store T |
-| 1102| Mary Pierson | Mary   | Pierson| 16  | Store T |
-| 1103| Leo Jones  | Leo| Jones  | 16  | Store Tem |
-| 1104| Nancy Beatty | Nancy  | Beatty | 16  | Store T |
-| 1105| Clara McNight | Clara  | McNight| 16  | Store  |
-| 1106| Marcella Isaacs | Marcella   | Isaacs | 17  | Stor |
-| 1107| Charlotte Yonce | Charlotte  | Yonce  | 17  | Stor |
-| 1108| Benjamin Foster | Benjamin   | Foster | 17  | Stor |
-| 1109| John Reed  | John   | Reed   | 17  | Store Per |
-| 1110| Lynn Kwiatkowski | Lynn   | Kwiatkowski | 17  | St |
-| 1111| Donald Vann | Donald | Vann   | 17  | Store Pe |
-| 1112| William Smith | William| Smith  | 17  | Store  |
-| 1113| Amy Hensley | Amy| Hensley| 17  | Store Pe |
-| 1114| Judy Owens | Judy   | Owens  | 17  | Store Per |
-| 1115| Frederick Castillo | Frederick  | Castillo   | 17  | S |
-| 1116| Phil Munoz | Phil   | Munoz  | 17  | Store Per |
-| 1117| Lori Lightfoot | Lori   | Lightfoot  | 17  | Store |
-...
-+-++++-+---+
-1,155 rows selected (0.762 seconds)
-0: jdbc:drill:zk=local>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/query/query-fs/002-query-parquet.md
--
diff --git a/_docs/drill-docs/query/query-fs/002-query-parquet.md 
b/_docs/drill-docs/query/query-fs/002-query-parquet.md
deleted file mode 100644
index 9b4e874..000
--- a/_docs/drill-docs/query/query-fs/002-query-parquet.md
+++ /dev/null
@@ -1,99 +0,0 @@
----
-title: Querying Parquet Files
-parent: Querying a File System
----
-Your Drill installation includes a `sample-date` directory with Parquet files
-that you can query. Use SQL syntax to query the `region.parquet` and
-`nation.parquet` files in the `sample-data` directory.
-
-**Note:** Your Drill installation location may differ from the examples used 
here. The examples assume that Drill was installed in embedded mode on your 
machine following the [Apache Drill in 10 Minutes 
](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes)tutorial.
 If you installed Drill in distributed mode, or your `sample-data` directory 
differs from the location used in the examples, make sure to change the 
`sample-data` directory to the correct location before you run the queries.
-
- Region File
-
-If you followed the Apache Drill in 10 Minutes instructions to install Drill
-in embedded mode, the path to the parquet file varies between operating
-systems.
-
-To view the data in the `region.parquet` file, issue the query appropriate for
-your operating system:
-
-  * Linux  
-``SELECT * FROM dfs.`/opt/drill/apache-drill-0.4.0-incubating/sample-
-data/region.parquet`; ``
-
-   * Mac OS X  
-``SELECT * FROM dfs.`/Users/max/drill/apache-drill-0.4.0-incubating/sample-
-data/region.parquet`;``
-
-   * Windows  
-``SELECT * FROM dfs.`C:\drill\apache-drill-0.4.0-incubating\sample-
-data\region.parquet`;``
-
-The query returns the following results:
-
-+++
-|   EXPR$0   |   EXPR$1   |
-+++
-| AFRICA | lar deposits. blithely final packages cajole. regular 
waters ar |
-| 

[12/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread bridgetb
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/arch/001-core-mod.md
--
diff --git a/_docs/arch/001-core-mod.md b/_docs/arch/001-core-mod.md
new file mode 100644
index 000..17fa18d
--- /dev/null
+++ b/_docs/arch/001-core-mod.md
@@ -0,0 +1,29 @@
+---
+title: Core Modules within a Drillbit
+parent: Architectural Overview
+---
+The following image represents components within each Drillbit:
+
+![drill query flow]({{ site.baseurl }}/docs/img/DrillbitModules.png)
+
+The following list describes the key components of a Drillbit:
+
+  * **RPC end point**: Drill exposes a low overhead protobuf-based RPC 
protocol to communicate with its clients. Additionally, C++ and Java API 
layers are available for client applications to interact with Drill. 
Clients can communicate to a specific Drillbit directly or go through a 
ZooKeeper quorum to discover the available Drillbits before submitting queries. 
It is recommended that the clients always go through ZooKeeper to shield 
clients from the intricacies of cluster management, such as the addition or 
removal of nodes. 
+
+  * **SQL parser**: Drill uses Optiq, the open source framework, to parse 
incoming queries. The output of the parser component is a language agnostic, 
computer-friendly logical plan that represents the query. 
+  * **Storage plugin interfaces**: Drill serves as a query layer on top of 
several data sources. Storage plugins in Drill represent the abstractions that 
Drill uses to interact with the data sources. Storage plugins provide Drill 
with the following information:
+* Metadata available in the source
+* Interfaces for Drill to read from and write to data sources
+* Location of data and a set of optimization rules to help with efficient 
and faster execution of Drill queries on a specific data source 
+
+In the context of Hadoop, Drill provides storage plugins for files and
+HBase/M7. Drill also integrates with Hive as a storage plugin, since Hive
+provides a metadata abstraction layer on top of files and HBase/M7 and
+provides libraries (SerDes and UDFs) to read and operate on these sources.
+
+When users query files and HBase/M7 with Drill, they can do it directly or 
go
+through Hive if they have metadata defined there. Drill integration with Hive
+is only for metadata. Drill does not invoke the Hive execution engine for any
+requests.
+
+  * **Distributed cache**: Drill uses a distributed cache to manage metadata 
(not the data) and configuration information across various nodes. Sample 
metadata information that is stored in the cache includes query plan fragments, 
intermediate state of the query execution, and statistics. Drill uses 
Infinispan as its cache technology.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/arch/002-arch-hilite.md
--
diff --git a/_docs/arch/002-arch-hilite.md b/_docs/arch/002-arch-hilite.md
new file mode 100644
index 000..5ac51bc
--- /dev/null
+++ b/_docs/arch/002-arch-hilite.md
@@ -0,0 +1,10 @@
+---
+title: Architectural Highlights
+parent: Architectural Overview
+---
+The goal for Drill is to bring the **SQL Ecosystem** and **Performance** of
+the relational systems to **Hadoop scale** data **WITHOUT** compromising on
+the **Flexibility** of Hadoop/NoSQL systems. There are several core
+architectural elements in Apache Drill that make it a highly flexible and
+efficient query engine.
+

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/arch/arch-hilite/001-flexibility.md
--
diff --git a/_docs/arch/arch-hilite/001-flexibility.md 
b/_docs/arch/arch-hilite/001-flexibility.md
new file mode 100644
index 000..0b5c5e3
--- /dev/null
+++ b/_docs/arch/arch-hilite/001-flexibility.md
@@ -0,0 +1,78 @@
+---
+title: Flexibility
+parent: Architectural Highlights
+---
+The following features contribute to Drill's flexible architecture:
+
+**_Dynamic schema discovery_**
+
+Drill does not require schema or type specification for the data in order to
+start the query execution process. Instead, Drill starts processing the data
+in units called record-batches and discovers the schema on the fly during
+processing. Self-describing data formats such as Parquet, JSON, AVRO, and
+NoSQL databases have schema specified as part of the data itself, which Drill
+leverages dynamically at query time. Schema can change over the course of a
+Drill query, so all of the Drill operators are designed to reconfigure
+themselves when such schema changing events occur.
+
+**_Flexible data model_**
+
+Drill is purpose-built from the ground up for complex/multi-structured data
+commonly seen in Hadoop/NoSQL applications such as social/mobile, clickstream,
+logs, and sensor-equipped IoT. From a user point of view, Drill allows 

[11/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread bridgetb
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/design/005-value.md
--
diff --git a/_docs/design/005-value.md b/_docs/design/005-value.md
new file mode 100644
index 000..828376a
--- /dev/null
+++ b/_docs/design/005-value.md
@@ -0,0 +1,163 @@
+---
+title: Value Vectors
+parent: Design Docs
+---
+This document defines the data structures required for passing sequences of
+columnar data between 
[Operators](https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=id.iip15ful18mm).
+
+## Goals
+
+### Support Operators Written in Multiple Languages
+
+ValueVectors should support operators written in C/C++/Assembly. To support
+this, the underlying ByteBuffer will not require modification when passed
+through the JNI interface. The ValueVector will be considered immutable once
+constructed. Endianness has not yet been considered.
+
+### Access
+
+Reading a random element from a ValueVector must be a constant time operation.
+To accommodate this, elements are identified by their offset from the start of
+the buffer. Repeated, nullable, and variable width ValueVectors utilize an
+additional fixed width value vector to index each element. Write access is not
+supported once the ValueVector has been constructed by the RecordBatch.
+
+### Efficient Subsets of Value Vectors
+
+When an operator returns a subset of values from a ValueVector, it should
+reuse the original ValueVector. To accomplish this, a level of indirection is
+introduced to skip over certain values in the vector. This level of
+indirection is a sequence of offsets which reference an offset in the original
+ValueVector and the count of subsequent values which are to be included in the
+subset.
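
The offset-and-count indirection described above can be sketched in a few lines. This is illustrative Python with invented names, not Drill's actual Java implementation:

```python
# A "subset" of a ValueVector is a list of (offset, count) ranges that
# reference the original vector, so no values are copied.
original = [10, 20, 30, 40, 50, 60]

# Each pair selects `count` consecutive values starting at `offset`.
subset_ranges = [(1, 2), (4, 1)]

def materialize(vector, ranges):
    """Resolve the indirection, yielding the selected values in order."""
    out = []
    for offset, count in ranges:
        out.extend(vector[offset:offset + count])
    return out

print(materialize(original, subset_ranges))  # [20, 30, 50]
```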
+
+### Pooled Allocation
+
+ValueVectors utilize one or more buffers under the covers. These buffers will
+be drawn from a pool. Value vectors are themselves created and destroyed as a
+schema changes during the course of record iteration.
+
+### Homogenous Value Types
+
+Each value in a Value Vector is of the same type. The [Record 
Batch](https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=kix.s2xuoqnr8obe)
 implementation is responsible for
+creating a new Value Vector any time there is a change in schema.
+
+## Definitions
+
+Data Types
+
+The canonical source for value type definitions is the [Drill
+Datatypes](http://bit.ly/15JO9bC) document. The individual types are listed
+under the ‘Basic Data Types’ tab, while the value vector types can be found
+under the ‘Value Vectors’ tab.
+
+Operators
+
+An operator is responsible for transforming a stream of fields. It operates on
+Record Batches or constant values.
+
+Record Batch
+
+A set of field values for some range of records. The batch may be composed of
+Value Vectors, in which case each batch consists of exactly one schema.
+
+Value Vector
+
+The value vector is composed of one or more contiguous buffers: one that
+stores a sequence of values, and zero or more that store any metadata
+associated with the ValueVector.
+
+## Data Structure
+
+A ValueVector stores values in a ByteBuf, which is a contiguous region of
+memory. Additional levels of indirection are used to support variable value
+widths, nullable values, repeated values and selection vectors. These levels
+of indirection are primarily lookup tables which consist of one or more fixed
+width ValueVectors which may be combined (e.g. for nullable, variable width
+values). A fixed width ValueVector of non-nullable, non-repeatable values does
+not require an indirect lookup; elements can be accessed directly by
+multiplying position by stride.
+
+Fixed Width Values
+
+Fixed width ValueVectors simply contain a packed sequence of values. Random
+access is supported by accessing element n at ByteBuf[0] + Index * Stride,
+where Index is 0-based. The following illustrates the underlying buffer of
+INT4 values [1 .. 6]:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value1.png)
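
A minimal sketch of that addressing rule, assuming little-endian INT4 values packed back to back (illustrative Python, not Drill's ByteBuf code):

```python
import struct

STRIDE = 4  # width of an INT4 element in bytes

# The packed buffer of six INT4 values [1 .. 6], as in the figure above.
buf = b"".join(struct.pack("<i", v) for v in range(1, 7))

def element(buffer, index, stride=STRIDE):
    """Random access: element n starts at byte offset index * stride."""
    return struct.unpack_from("<i", buffer, index * stride)[0]

print([element(buf, i) for i in range(6)])  # [1, 2, 3, 4, 5, 6]
```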
+
+Nullable Values
+
+Nullable values are represented by a vector of bit values. Each bit in the
+vector corresponds to an element in the ValueVector. If the bit is not set,
+the value is NULL. Otherwise the value is retrieved from the underlying
+buffer. The following illustrates a NullableValueVector of INT4 values 2, 3
+and 6:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value2.png)
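
The bit-vector lookup can be sketched as follows (illustrative Python; the layout of the underlying buffers is simplified):

```python
# One bit per element; a cleared bit means the element is NULL.
# Values 2, 3 and 6 are present, matching the figure above.
values = [0, 2, 3, 0, 0, 6]   # underlying INT4 buffer (slots exist for NULLs)
bits = 0b100110               # bit i set => element i is non-NULL

def get(index):
    """Consult the null bitmap before reading the value buffer."""
    if (bits >> index) & 1:
        return values[index]
    return None

print([get(i) for i in range(6)])  # [None, 2, 3, None, None, 6]
```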
+  
+### Repeated Values
+
+A repeated ValueVector is used for elements which can contain multiple values
+(e.g. a JSON array). A table of offset and count pairs is used to represent
+each repeated element in the ValueVector. A count of zero means the element
+has no values (note the offset field is unused in this case). The following
+illustrates three fields; one with two values, one with no values, and one
+with a single value:
+
+![drill query flow]({{ site.baseurl 
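
The offset-and-count table for repeated values can be sketched as (illustrative Python, with invented names):

```python
# Three repeated fields -- one with two values, one with no values, and
# one with a single value, as described above. The offset of an empty
# field is unused, so any value would do there.
data = [7, 8, 9]                  # flat buffer of all repeated values
table = [(0, 2), (0, 0), (2, 1)]  # one (offset, count) pair per field

def field(i):
    offset, count = table[i]
    return data[offset:offset + count]  # count == 0 => no values

print([field(i) for i in range(3)])  # [[7, 8], [], [9]]
```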

[02/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread bridgetb
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/003-select.md
--
diff --git a/_docs/sql-ref/cmd-summary/003-select.md 
b/_docs/sql-ref/cmd-summary/003-select.md
new file mode 100644
index 000..4a4
--- /dev/null
+++ b/_docs/sql-ref/cmd-summary/003-select.md
@@ -0,0 +1,85 @@
+---
+title: SELECT Statements
+parent: SQL Commands Summary
+---
+Drill supports the following ANSI standard clauses in the SELECT statement:
+
+  * WITH clause
+  * SELECT list
+  * FROM clause
+  * WHERE clause
+  * GROUP BY clause
+  * HAVING clause
+  * ORDER BY clause (with an optional LIMIT clause)
+
+You can use the same SELECT syntax in the following commands:
+
+  * CREATE TABLE AS (CTAS)
+  * CREATE VIEW
+
+INSERT INTO SELECT is not yet supported.
+
+## Column Aliases
+
+You can use named column aliases in the SELECT list to provide meaningful
+names for regular columns and computed columns, such as the results of
+aggregate functions. See the section on running queries for examples.
+
+You cannot reference column aliases in the following clauses:
+
+  * WHERE
+  * GROUP BY
+  * HAVING
+
+Because Drill works with schema-less data sources, you cannot use positional
+aliases (1, 2, etc.) to refer to SELECT list columns, except in the ORDER BY
+clause.
+
+## UNION ALL Set Operator
+
+Drill supports the UNION ALL set operator to combine two result sets. The
+distinct UNION operator is not yet supported.
+
+The EXCEPT, EXCEPT ALL, INTERSECT, and INTERSECT ALL operators are not yet
+supported.
+
+## Joins
+
+Drill supports ANSI standard joins in the FROM and WHERE clauses:
+
+  * Inner joins
+  * Left, full, and right outer joins
+
+The following types of join syntax are supported:
+
+Join type| Syntax  
+---|---  
+Join condition in WHERE clause|FROM table1, table2 WHERE 
table1.col1=table2.col1  
+USING join in FROM clause|FROM table1 JOIN table2 USING(col1, ...)  
+ON join in FROM clause|FROM table1 JOIN table2 ON table1.col1=table2.col1  
+NATURAL JOIN in FROM clause|FROM table1 NATURAL JOIN table2  
+
+Cross-joins are not yet supported. You must specify a join condition when more
+than one table is listed in the FROM clause.
+
+Non-equijoins are supported if the join also contains an equality condition on
+the same two tables as part of a conjunction:
+
+table1.col1 = table2.col1 AND table1.c2 < table2.c2
+
+This restriction applies to both inner and outer joins.
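
As a quick sanity check of that pattern, the same shape of condition runs below against SQLite as a stand-in engine (not Drill); the table names and data are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, c2 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, c2 INTEGER);
    INSERT INTO table1 VALUES (1, 5), (2, 9);
    INSERT INTO table2 VALUES (1, 7), (2, 3);
""")

# Non-equijoin (c2 comparison) conjoined with an equality condition
# on the same two tables, as the restriction above requires.
rows = conn.execute("""
    SELECT t1.col1, t1.c2, t2.c2
    FROM table1 t1 JOIN table2 t2
      ON t1.col1 = t2.col1 AND t1.c2 < t2.c2
""").fetchall()
print(rows)  # [(1, 5, 7)]
```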
+
+## Subqueries
+
+You can use the following subquery operators in Drill queries. These operators
+all return Boolean results.
+
+  * ALL
+  * ANY
+  * EXISTS
+  * IN
+  * SOME
+
+In general, correlated subqueries are supported. EXISTS and NOT EXISTS
+subqueries that do not contain a correlation join are not yet supported.
+

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/004-show-files.md
--
diff --git a/_docs/sql-ref/cmd-summary/004-show-files.md 
b/_docs/sql-ref/cmd-summary/004-show-files.md
new file mode 100644
index 000..1fcf395
--- /dev/null
+++ b/_docs/sql-ref/cmd-summary/004-show-files.md
@@ -0,0 +1,65 @@
+---
+title: SHOW FILES Command
+parent: SQL Commands Summary
+---
+The SHOW FILES command provides a quick report of the file systems that are
+visible to Drill for query purposes. This command is unique to Apache Drill.
+
+## Syntax
+
+The SHOW FILES command supports the following syntax.
+
+SHOW FILES [ FROM filesystem.directory_name | IN filesystem.directory_name 
];
+
+The FROM or IN clause is required if you do not specify a default file system
+first. You can do this with the USE command. FROM and IN are synonyms.
+
+The directory name is optional. (If the directory name is a Drill reserved
+word, you must use back ticks around the name.)
+
+The command returns standard Linux `stat` information for each file or
+directory, such as permissions, owner, and group values. This information is
+not specific to Drill.
+
+## Examples
+
+The following example returns information about directories and files in the
+local (`dfs`) file system.
+
+   0: jdbc:drill:> use dfs;
+
+   +++
+   | ok |  summary   |
+   +++
+   | true   | Default schema changed to 'dfs' |
+   +++
+   1 row selected (0.318 seconds)
+
+   0: jdbc:drill:> show files;
+   
++-+++++-++--+
+   |name| isDirectory |   isFile   |   length   |   owner|   
group| permissions | accessTime | modificationTime |
+   
++-+++++-++--+
+   | user   | true| false  | 1  | mapr   

[06/13] drill git commit: DRILL-2315: Confluence conversion plus fixes

2015-02-25 Thread bridgetb
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/sql-ref/nested/001-flatten.md
--
diff --git a/_docs/drill-docs/sql-ref/nested/001-flatten.md 
b/_docs/drill-docs/sql-ref/nested/001-flatten.md
deleted file mode 100644
index 124db91..000
--- a/_docs/drill-docs/sql-ref/nested/001-flatten.md
+++ /dev/null
@@ -1,89 +0,0 @@
----
-title: FLATTEN Function
-parent: Nested Data Functions
----
-The FLATTEN function is useful for flexible exploration of repeated data.
-FLATTEN separates the elements in a repeated field into individual records. To
-maintain the association between each flattened value and the other fields in
-the record, all of the other columns are copied into each new record. A very
-simple example would turn this data (one record):
-
-{
-  x : 5,
-  y : a string,
-  z : [ 1,2,3]
-}
-
-into three distinct records:
-
-select flatten(z) from table;
-| x   | y  | z |
-+-++---+
-| 5   | a string | 1 |
-| 5   | a string | 2 |
-| 5   | a string | 3 |
-
-The function takes a single argument, which must be an array (the `z` column
-in this example).
-
-  
-
-For a more interesting example, consider the JSON data in the publicly
-available [Yelp](https://www.yelp.com/dataset_challenge/dataset) data set. The
-first query below returns three columns from the
-`yelp_academic_dataset_business.json` file: `name`, `hours`, and `categories`.
-The query is restricted to distinct rows where the name is `zpizza`. The
-query returns only one row that meets those criteria; however, note that this
-row contains an array of four categories:
-
-0: jdbc:drill:zk=local> select distinct name, hours, categories 
-from dfs.yelp.`yelp_academic_dataset_business.json` 
-where name ='zpizza';
-++++
-|name|   hours| categories |
-++++
-| zpizza | 
{Tuesday:{close:22:00,open:10:00},Friday:{close:23:00,open:10:00},Monday:{close:22:00,open:10:00},Wednesday:{close:22:00,open:10:00},Thursday:{close:22:00,open:10:00},Sunday:{close:22:00,open:10:00},Saturday:{close:23:00,open:10:00}}
 | [Gluten-Free,Pizza,Vegan,Restaurants] |
-
-The FLATTEN function can operate on this single row and return multiple rows,
-one for each category:
-
-0: jdbc:drill:zk=local> select distinct name, flatten(categories) as 
categories 
-from dfs.yelp.`yelp_academic_dataset_business.json` 
-where name ='zpizza' order by 2;
-++-+
-|name| categories  |
-++-+
-| zpizza | Gluten-Free |
-| zpizza | Pizza   |
-| zpizza | Restaurants |
-| zpizza | Vegan   |
-++-+
-4 rows selected (2.797 seconds)
-
-Having used the FLATTEN function to break down arrays into distinct rows, you
-can run queries that do deeper analysis on the flattened result set. For
-example, you can use FLATTEN in a subquery, then apply WHERE clause
-constraints or aggregate functions to the results in the outer query.
-
-The following query uses the same data file as the previous query to flatten
-the categories array, then run a COUNT function on the flattened result:
-
-select celltbl.catl, count(celltbl.catl) catcount 
-from (select flatten(categories) catl 
-from dfs.yelp.`yelp_academic_dataset_business.json`) celltbl 
-group by celltbl.catl 
-order by count(celltbl.catl) desc limit 5;
- 
-+---++
-|catl   |  catcount  |
-+---++
-| Restaurants   | 14303  |
-| Shopping  | 6428   |
-| Food  | 5209   |
-| Beauty  Spas | 3421   |
-| Nightlife | 2870   |
-+---++
-
-A common use case for FLATTEN is its use in conjunction with the
-[KVGEN](/confluence/display/DRILL/KVGEN+Function) function.
-

http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/sql-ref/nested/002-kvgen.md
--
diff --git a/_docs/drill-docs/sql-ref/nested/002-kvgen.md 
b/_docs/drill-docs/sql-ref/nested/002-kvgen.md
deleted file mode 100644
index a27a781..000
--- a/_docs/drill-docs/sql-ref/nested/002-kvgen.md
+++ /dev/null
@@ -1,150 +0,0 @@
----
-title: KVGEN Function
-parent: Nested Data Functions
----
-KVGEN stands for _key-value generation_. This function is useful when complex
-data files contain arbitrary maps that consist of relatively unknown column
-names. Instead of having to specify columns in the map to access the data, you
-can use KVGEN to return a list of the keys that exist in the map. KVGEN turns
-a map with a wide set of columns into an 

Git Push Summary

2015-02-25 Thread adi
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master [created] 23f82db9f