http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/003-select.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/cmd-summary/003-select.md b/_docs/sql-ref/cmd-summary/003-select.md new file mode 100644 index 0000000..00004a4 --- /dev/null +++ b/_docs/sql-ref/cmd-summary/003-select.md @@ -0,0 +1,85 @@
+---
+title: "SELECT Statements"
+parent: "SQL Commands Summary"
+---
+Drill supports the following ANSI standard clauses in the SELECT statement:
+
+ * WITH clause
+ * SELECT list
+ * FROM clause
+ * WHERE clause
+ * GROUP BY clause
+ * HAVING clause
+ * ORDER BY clause (with an optional LIMIT clause)
+
+You can use the same SELECT syntax in the following commands:
+
+ * CREATE TABLE AS (CTAS)
+ * CREATE VIEW
+
+INSERT INTO SELECT is not yet supported.
+
+## Column Aliases
+
+You can use named column aliases in the SELECT list to provide meaningful
+names for regular columns and computed columns, such as the results of
+aggregate functions. See the section on running queries for examples.
+
+You cannot reference column aliases in the following clauses:
+
+ * WHERE
+ * GROUP BY
+ * HAVING
+
+Because Drill works with schema-less data sources, you cannot use positional
+aliases (1, 2, etc.) to refer to SELECT list columns, except in the ORDER BY
+clause.
+
+## UNION ALL Set Operator
+
+Drill supports the UNION ALL set operator to combine two result sets. The
+distinct UNION operator is not yet supported.
+
+The EXCEPT, EXCEPT ALL, INTERSECT, and INTERSECT ALL operators are not yet
+supported.
+
+## Joins
+
+Drill supports ANSI standard joins in the FROM and WHERE clauses:
+
+ * Inner joins
+ * Left, full, and right outer joins
+
+The following types of join syntax are supported:
+
+Join type| Syntax
+---|---
+Join condition in WHERE clause|FROM table1, table2 WHERE table1.col1=table2.col1
+USING join in FROM clause|FROM table1 JOIN table2 USING(col1, ...)
+ON join in FROM clause|FROM table1 JOIN table2 ON table1.col1=table2.col1
+NATURAL JOIN in FROM clause|FROM table1 NATURAL JOIN table2
+
+Cross-joins are not yet supported. You must specify a join condition when more
+than one table is listed in the FROM clause.
+
+Non-equijoins are supported if the join also contains an equality condition on
+the same two tables as part of a conjunction:
+
+    table1.col1 = table2.col1 AND table1.c2 < table2.c2
+
+This restriction applies to both inner and outer joins.
+
+## Subqueries
+
+You can use the following subquery operators in Drill queries. These operators
+all return Boolean results.
+
+ * ALL
+ * ANY
+ * EXISTS
+ * IN
+ * SOME
+
+In general, correlated subqueries are supported. EXISTS and NOT EXISTS
+subqueries that do not contain a correlation join are not yet supported.
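The equality-plus-inequality join rule described above can be sketched with a runnable example. The snippet below uses SQLite via Python's `sqlite3` rather than Drill, and the table contents are made up for illustration; the shape of the ON clause is the point:

```python
import sqlite3

# Illustration of a non-equijoin condition combined, in a conjunction,
# with an equality condition on the same two tables (SQLite, not Drill).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (col1 INTEGER, c2 INTEGER)")
cur.execute("CREATE TABLE table2 (col1 INTEGER, c2 INTEGER)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 5), (2, 9)])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 7), (2, 3)])

rows = cur.execute(
    "SELECT table1.col1, table1.c2, table2.c2 "
    "FROM table1 JOIN table2 "
    "ON table1.col1 = table2.col1 AND table1.c2 < table2.c2"
).fetchall()
print(rows)  # only the col1=1 pair also satisfies the c2 inequality
```

Only the row pair that passes both the equality and the inequality survives the join.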
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/004-show-files.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/cmd-summary/004-show-files.md b/_docs/sql-ref/cmd-summary/004-show-files.md new file mode 100644 index 0000000..1fcf395 --- /dev/null +++ b/_docs/sql-ref/cmd-summary/004-show-files.md @@ -0,0 +1,65 @@
+---
+title: "SHOW FILES Command"
+parent: "SQL Commands Summary"
+---
+The SHOW FILES command provides a quick report of the file systems that are
+visible to Drill for query purposes. This command is unique to Apache Drill.
+
+## Syntax
+
+The SHOW FILES command supports the following syntax.
+
+    SHOW FILES [ FROM filesystem.directory_name | IN filesystem.directory_name ];
+
+The FROM or IN clause is required if you do not specify a default file system
+first. You can do this with the USE command. FROM and IN are synonyms.
+
+The directory name is optional. (If the directory name is a Drill reserved
+word, you must use backticks around the name.)
+
+The command returns standard Linux `stat` information for each file or
+directory, such as permissions, owner, and group values. This information is
+not specific to Drill.
+
+## Examples
+
+The following example returns information about directories and files in the
+local (`dfs`) file system.
+ + 0: jdbc:drill:> use dfs; + + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'dfs' | + +------------+------------+ + 1 row selected (0.318 seconds) + + 0: jdbc:drill:> show files; + +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+ + | name | isDirectory | isFile | length | owner | group | permissions | accessTime | modificationTime | + +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+ + | user | true | false | 1 | mapr | mapr | rwxr-xr-x | 2014-07-30 21:37:06.0 | 2014-07-31 22:15:53.193 | + | backup.tgz | false | true | 36272 | root | root | rw-r--r-- | 2014-07-31 22:09:13.0 | 2014-07-31 22:09:13.211 | + | JSON | true | false | 1 | root | root | rwxr-xr-x | 2014-07-31 15:22:42.0 | 2014-08-04 15:43:07.083 | + | scripts | true | false | 3 | root | root | rwxr-xr-x | 2014-07-31 22:10:51.0 | 2014-08-04 18:23:09.236 | + | temp | true | false | 2 | root | root | rwxr-xr-x | 2014-08-01 20:07:37.0 | 2014-08-01 20:09:42.595 | + | hbase | true | false | 10 | mapr | mapr | rwxr-xr-x | 2014-07-30 21:36:08.0 | 2014-08-04 18:31:13.778 | + | tables | true | false | 0 | root | root | rwxrwxrwx | 2014-07-31 22:14:35.0 | 2014-08-04 15:42:43.415 | + | CSV | true | false | 4 | root | root | rwxrwxrwx | 2014-07-31 17:34:53.0 | 2014-08-04 + ... 
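The columns in this report are ordinary file metadata. As a rough illustration of what SHOW FILES gathers per entry, the sketch below lists comparable fields for a local directory in plain Python; it is not Drill code, and the field names simply mirror the column headings above:

```python
import os
import stat
import datetime

def show_files(path="."):
    """Return SHOW FILES-like metadata rows for a local directory."""
    rows = []
    for entry in os.scandir(path):
        st = entry.stat()
        rows.append({
            "name": entry.name,
            "isDirectory": entry.is_dir(),
            "isFile": entry.is_file(),
            "length": st.st_size,
            "permissions": stat.filemode(st.st_mode),
            "modificationTime": datetime.datetime.fromtimestamp(
                st.st_mtime).isoformat(sep=" "),
        })
    return rows

for row in show_files("."):
    print(row)
```

Each dictionary corresponds to one row of the SHOW FILES report.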
+
+The following example shows the files in a specific directory in the `dfs`
+file system:
+
+    0: jdbc:drill:> show files in dfs.CSV;
+
+    +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+    | name | isDirectory | isFile | length | owner | group | permissions | accessTime | modificationTime |
+    +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+    | customers.csv | false | true | 62011 | root | root | rw-r--r-- | 2014-08-04 18:30:39.0 | 2014-08-04 18:30:39.314 |
+    | products.csv.small | false | true | 34972 | root | root | rw-r--r-- | 2014-07-31 23:58:42.0 | 2014-07-31 23:59:16.849 |
+    | products.csv | false | true | 34972 | root | root | rw-r--r-- | 2014-08-01 06:39:34.0 | 2014-08-04 15:58:09.325 |
+    | products.csv.bad | false | true | 62307 | root | root | rw-r--r-- | 2014-08-04 15:58:02.0 | 2014-08-04 15:58:02.612 |
+    +------------+-------------+------------+------------+------------+------------+-------------+------------+------------------+
+    4 rows selected (0.165 seconds)
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/data-types/001-date.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/data-types/001-date.md b/_docs/sql-ref/data-types/001-date.md new file mode 100644 index 0000000..6340e35 --- /dev/null +++ b/_docs/sql-ref/data-types/001-date.md @@ -0,0 +1,148 @@
+---
+title: "Supported Date/Time Data Type Formats"
+parent: "Data Types"
+---
+You must use supported `date` and `time` formats when you `SELECT` date and
+time literals or when you `CAST()` from `VARCHAR` to `date` and `time` data
+types.
+Apache Drill currently supports specific formats for the following
+`date` and `time` data types:
+
+ * Date
+ * Timestamp
+ * Time
+ * Interval
+ * Interval Year
+ * Interval Day
+ * Literal
+
+The following query provides an example of how to `SELECT` a few of the
+supported date and time formats as literals:
+
+    select date '2008-2-23', timestamp '2008-1-23 14:24:23', time '10:20:30' from dfs.`/tmp/input.json`;
+
+The following query provides an example where `VARCHAR` data in a file is
+`CAST()` to supported `date` and `time` formats:
+
+    select cast(col_A as date), cast(col_B as timestamp), cast(col_C as time) from dfs.`/tmp/dates.json`;
+
+`Date`, `timestamp`, and `time` data types store values in UTC. Currently,
+Apache Drill does not support `timestamp` with time zone.
+
+## Date
+
+Drill supports the `date` data type in the following format:
+
+    YYYY-MM-DD (year-month-date)
+
+The following table provides some examples for the `date` data type:
+
+ | Use | Example |
+ | --- | ------- |
+ | Literal | `` select date '2008-2-23' from dfs.`/tmp/input.json`; `` |
+ | `JSON` input | `{"date_col" : "2008-2-23"}` |
+ | `CAST` from `VARCHAR` | `` select CAST(date_col as date) as CAST_DATE from dfs.`/tmp/input.json`; `` |
+
+## Timestamp
+
+Drill supports the `timestamp` data type in the following format:
+
+    yyyy-MM-dd HH:mm:ss.SSS (year-month-date hour:minute:sec.milliseconds)
+
+The following table provides some examples for the `timestamp` data type:
+
+<table>
+ <tbody>
+ <tr>
+ <th>Use</th>
+ <th>Example</th>
+ </tr>
+ <tr>
+ <td valign="top">Literal</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select timestamp '2008-2-23 10:20:30.345', timestamp '2008-2-23 10:20:30' from dfs.`/tmp/input.json`;</span></code>
+ </td></tr>
+ <tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"timestamp_col": "2008-2-23 15:20:30.345"}<br /></span><span style="color: 
rgb(0,0,0);">{"timestamp_col": "2008-2-23 10:20:30"}</span></code> <span style="color: rgb(0,0,0);">The fractional millisecond component is optional.</span></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(timestamp_col as timestamp) from dfs.`/tmp/input.json`; </span></code></td>
+ </tr>
+ </tbody>
+ </table>
+
+## Time
+
+Drill supports the `time` data type in the following format:
+
+    HH:mm:ss.SSS (hour:minute:sec.milliseconds)
+
+The following table provides some examples for the `time` data type:
+
+<table><tbody><tr>
+ <th>Use</th>
+ <th>Example</th>
+ </tr>
+ <tr>
+ <td valign="top">Literal</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select time '15:20:30', time '10:20:30.123' from dfs.`/tmp/input.json`;</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"time_col" : "10:20:30.999"}<br /></span><span style="color: rgb(0,0,0);">{"time_col": "10:20:30"}</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(time_col as time) from dfs.`/tmp/input.json`;</span></code></td>
+</tr></tbody>
+</table>
+
+## Interval
+
+Drill supports the `interval year` and `interval day` data types.
+
+### Interval Year
+
+The `interval year` data type stores time duration in years and months.
Drill
+supports the `interval year` data type in the following format:
+
+    P [qty] Y [qty] M
+
+The following table provides examples for the `interval year` data type:
+
+<table><tbody><tr>
+<th>Use</th>
+<th>Example</th></tr>
+ <tr>
+ <td valign="top">Literals</td>
+ <td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1-2' year to month from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1' year from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '13' month from dfs.`/tmp/input.json`;</span></code></td></tr><tr>
+ <td colspan="1" valign="top"><code>JSON</code> Input</td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"col" : "P1Y2M"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P-1Y2M"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P-1Y-2M"}<br /></span><span style="color: rgb(0,0,0);">{"col": "P10M"}<br /></span><span style="color: rgb(0,0,0);">{"col": "P5Y"}</span></code></td>
+ </tr>
+ <tr>
+ <td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td>
+ <td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval year) from dfs.`/tmp/input.json`;</span></code></td>
+ </tr>
+ </tbody></table>
+
+### Interval Day
+
+The `interval day` data type stores time duration in days, hours, minutes, and
+seconds. You do not need to specify all fields in a given interval.
Drill
+supports the `interval day` data type in the following format:
+
+    P [qty] D T [qty] H [qty] M [qty] S
+
+The following table provides examples for the `interval day` data type:
+
+<table><tbody><tr><th>Use</th><th>Example</th></tr><tr><td valign="top">Literal</td><td valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' day from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10' hour from dfs.`/tmp/input.json`;<br /></span><span style="color: rgb(0,0,0);">select interval '10.999' second from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code>JSON</code> Input</td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">{"col" : "P1DT10H20M30S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P1DT10H20M30.123S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "P1D"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10H"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10.10S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT20S"}<br /></span><span style="color: rgb(0,0,0);">{"col" : "PT10H10S"}</span></code></td></tr><tr><td colspan="1" valign="top"><code>CAST</code> from <code>VARCHAR</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select cast(col as interval day) from dfs.`/tmp/input.json`;</span></code></td></tr></tbody></table>
+
+## Literal
+
+The following table provides a list of `date/time` literals that Drill
+supports with examples of each:
+
+<table><tbody><tr><th>Format</th><th colspan="1">Interpretation</th><th>Example</th></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '1 10:20:30.123' day to 
second</span></code></td><td colspan="1" valign="top"><code>1 day, 10 hours, 20 minutes, 30 seconds, and 123 thousandths of a second</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10:20:30.123' day to second from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '1 10' day to hour</span></code></td><td colspan="1" valign="top"><code>1 day 10 hours</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '1 10' day to hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10' day</span></code></td><td colspan="1" valign="top"><code>10 days</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10' day from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10' hour</span></code></td><td colspan="1" valign="top"><code>10 hours</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10' hour from dfs.`/tmp/input.json`;</span></code></td></tr><tr><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">interval '10.999' second</span></code></td><td colspan="1" valign="top"><code>10.999 seconds</code></td><td colspan="1" valign="top"><code><span style="color: rgb(0,0,0);">select interval '10.999' second from dfs.`/tmp/input.json`; </span></code></td></tr></tbody></table>
+
+
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/001-flatten.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/nested/001-flatten.md b/_docs/sql-ref/nested/001-flatten.md new file mode 100644 index 0000000..2769000 --- /dev/null +++ b/_docs/sql-ref/nested/001-flatten.md @@ -0,0 +1,89 @@
+---
+title: "FLATTEN Function"
+parent: "Nested Data Functions"
+---
+The FLATTEN function is useful for flexible exploration of repeated data.
+FLATTEN separates the elements in a repeated field into individual records. To
+maintain the association between each flattened value and the other fields in
+the record, all of the other columns are copied into each new record. A very
+simple example would turn this data (one record):
+
+    {
+      "x" : 5,
+      "y" : "a string",
+      "z" : [ 1,2,3]
+    }
+
+into three distinct records:
+
+    select x, y, flatten(z) from table;
+    | x | y | z |
+    +-------------+----------------+-----------+
+    | 5 | "a string" | 1 |
+    | 5 | "a string" | 2 |
+    | 5 | "a string" | 3 |
+
+The function takes a single argument, which must be an array (the `z` column
+in this example).
+
+
+
+For a more interesting example, consider the JSON data in the publicly
+available [Yelp](https://www.yelp.com/dataset_challenge/dataset) data set. The
+first query below returns three columns from the
+`yelp_academic_dataset_business.json` file: `name`, `hours`, and `categories`.
+The query is restricted to distinct rows where the name is `zpizza`.
The +query returns only one row that meets those criteria; however, note that this +row contains an array of four categories: + + 0: jdbc:drill:zk=local> select distinct name, hours, categories + from dfs.yelp.`yelp_academic_dataset_business.json` + where name ='zpizza'; + +------------+------------+------------+ + | name | hours | categories | + +------------+------------+------------+ + | zpizza | {"Tuesday":{"close":"22:00","open":"10:00"},"Friday":{"close":"23:00","open":"10:00"},"Monday":{"close":"22:00","open":"10:00"},"Wednesday":{"close":"22:00","open":"10:00"},"Thursday":{"close":"22:00","open":"10:00"},"Sunday":{"close":"22:00","open":"10:00"},"Saturday":{"close":"23:00","open":"10:00"}} | ["Gluten-Free","Pizza","Vegan","Restaurants"] | + +The FLATTEN function can operate on this single row and return multiple rows, +one for each category: + + 0: jdbc:drill:zk=local> select distinct name, flatten(categories) as categories + from dfs.yelp.`yelp_academic_dataset_business.json` + where name ='zpizza' order by 2; + +------------+-------------+ + | name | categories | + +------------+-------------+ + | zpizza | Gluten-Free | + | zpizza | Pizza | + | zpizza | Restaurants | + | zpizza | Vegan | + +------------+-------------+ + 4 rows selected (2.797 seconds) + +Having used the FLATTEN function to break down arrays into distinct rows, you +can run queries that do deeper analysis on the flattened result set. For +example, you can use FLATTEN in a subquery, then apply WHERE clause +constraints or aggregate functions to the results in the outer query. 
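The row-multiplying behavior described above can be sketched in a few lines of plain Python; this illustrates the semantics only and is not Drill code:

```python
def flatten_records(records, array_col):
    """One output record per array element; all other columns are copied."""
    out = []
    for rec in records:
        for value in rec[array_col]:
            new_rec = dict(rec)       # copy the non-array columns
            new_rec[array_col] = value  # replace the array with one element
            out.append(new_rec)
    return out

data = [{"x": 5, "y": "a string", "z": [1, 2, 3]}]
for row in flatten_records(data, "z"):
    print(row)
# {'x': 5, 'y': 'a string', 'z': 1}
# {'x': 5, 'y': 'a string', 'z': 2}
# {'x': 5, 'y': 'a string', 'z': 3}
```

A record with an n-element array becomes n records, which is exactly what makes the subsequent WHERE and aggregate queries possible.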
+
+The following query uses the same data file as the previous query to flatten
+the categories array, then runs a COUNT function on the flattened result:
+
+    select celltbl.catl, count(celltbl.catl) catcount
+    from (select flatten(categories) catl
+    from dfs.yelp.`yelp_academic_dataset_business.json`) celltbl
+    group by celltbl.catl
+    order by count(celltbl.catl) desc limit 5;
+
+    +---------------+------------+
+    | catl | catcount |
+    +---------------+------------+
+    | Restaurants | 14303 |
+    | Shopping | 6428 |
+    | Food | 5209 |
+    | Beauty & Spas | 3421 |
+    | Nightlife | 2870 |
+    +---------------+------------+
+
+A common use case for FLATTEN is its use in conjunction with the
+[KVGEN](/drill/docs/kvgen-function) function.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/002-kvgen.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/nested/002-kvgen.md b/_docs/sql-ref/nested/002-kvgen.md new file mode 100644 index 0000000..f619864 --- /dev/null +++ b/_docs/sql-ref/nested/002-kvgen.md @@ -0,0 +1,150 @@
+---
+title: "KVGEN Function"
+parent: "Nested Data Functions"
+---
+KVGEN stands for _key-value generation_. This function is useful when complex
+data files contain arbitrary maps that consist of relatively "unknown" column
+names. Instead of having to specify columns in the map to access the data, you
+can use KVGEN to return a list of the keys that exist in the map. KVGEN turns
+a map with a wide set of columns into an array of key-value pairs.
+
+In turn, you can write analytic queries that return a subset of the generated
+keys or constrain the keys in some way. For example, you can use the
+[FLATTEN](/drill/docs/flatten-function) function to break the
+array down into multiple distinct rows and further query those rows.
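The map-to-array transformation that KVGEN performs can be sketched in plain Python (an illustration of the semantics, not Drill code):

```python
def kvgen(mapping):
    """Turn a map into an array of {"key": ..., "value": ...} pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

print(kvgen({"a": "valA", "b": "valB"}))
# [{'key': 'a', 'value': 'valA'}, {'key': 'b', 'value': 'valB'}]
```

Because every record produces pairs with the same two fields, `key` and `value`, the output has a uniform shape regardless of which column names appeared in the original map.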
+ + + +For example, assume that a JSON file contains this data: + + {"a": "valA", "b": "valB"} + {"c": "valC", "d": "valD"} + + +KVGEN would operate on this data to generate: + + [{"key": "a", "value": "valA"}, {"key": "b", "value": "valB"}] + [{"key": "c", "value": "valC"}, {"key": "d", "value": "valD"}] + +Applying the [FLATTEN](/drill/docs/flatten-function) function to +this data would return: + + {"key": "a", "value": "valA"} + {"key": "b", "value": "valB"} + {"key": "c", "value": "valC"} + {"key": "d", "value": "valD"} + +Assume that a JSON file called `kvgendata.json` includes multiple records that +look like this one: + + { + "rownum": 1, + "bigintegercol": { + "int_1": 1, + "int_2": 2, + "int_3": 3 + }, + "varcharcol": { + "varchar_1": "abc", + "varchar_2": "def", + "varchar_3": "xyz" + }, + "boolcol": { + "boolean_1": true, + "boolean_2": false, + "boolean_3": true + }, + "float8col": { + "f8_1": 1.1, + "f8_2": 2.2 + }, + "complex": [ + { + "col1": 3 + }, + { + "col2": 2, + "col3": 1 + }, + { + "col1": 7 + } + ] + } + + { + "rownum": 3, + "bigintegercol": { + "int_1": 1, + "int_3": 3 + }, + "varcharcol": { + "varchar_1": "abcde", + "varchar_2": null, + "varchar_3": "xyz", + "varchar_4": "xyz2" + }, + "boolcol": { + "boolean_1": true, + "boolean_2": false + }, + "float8col": { + "f8_1": 1.1, + "f8_3": 6.6 + }, + "complex": [ + { + "col1": 2, + "col3": 1 + } + ] + } + ... 
+ + +A SELECT * query against this specific record returns the following row: + + 0: jdbc:drill:zk=local> select * from dfs.yelp.`kvgendata.json` where rownum=1; + + +------------+---------------+------------+------------+------------+------------+ + | rownum | bigintegercol | varcharcol | boolcol | float8col | complex | + +------------+---------------+------------+------------+------------+------------+ + | 1 | {"int_1":1,"int_2":2,"int_3":3} | {"varchar_1":"abc","varchar_2":"def","varchar_3":"xyz"} | {"boolean_1":true,"boolean_2":false,"boolean_3":true} | {"f8_1":1.1,"f8_2":2.2} | [{"col1":3},{"col2":2,"col3":1},{"col1":7}] | + +------------+---------------+------------+------------+------------+------------+ + 1 row selected (0.122 seconds) + +You can use the KVGEN function to turn the maps in this data into key-value +pairs. For example: + + 0: jdbc:drill:zk=local> select kvgen(varcharcol) from dfs.yelp.`kvgendata.json`; + +------------+ + | EXPR$0 | + +------------+ + | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"},{"key":"varchar_3","value":"xyz"}] | + | [{"key":"varchar_1","value":"abcd"}] | + | [{"key":"varchar_1","value":"abcde"},{"key":"varchar_3","value":"xyz"},{"key":"varchar_4","value":"xyz2"}] | + | [{"key":"varchar_1","value":"abc"},{"key":"varchar_2","value":"def"}] | + +------------+ + 4 rows selected (0.091 seconds) + +Now you can apply the FLATTEN function to break out the key-value pairs into +distinct rows: + + 0: jdbc:drill:zk=local> select flatten(kvgen(varcharcol)) from dfs.yelp.`kvgendata.json`; + +------------+ + | EXPR$0 | + +------------+ + | {"key":"varchar_1","value":"abc"} | + | {"key":"varchar_2","value":"def"} | + | {"key":"varchar_3","value":"xyz"} | + | {"key":"varchar_1","value":"abcd"} | + | {"key":"varchar_1","value":"abcde"} | + | {"key":"varchar_3","value":"xyz"} | + | {"key":"varchar_4","value":"xyz2"} | + | {"key":"varchar_1","value":"abc"} | + | {"key":"varchar_2","value":"def"} | + +------------+ + 
9 rows selected (0.151 seconds) + +See the description of [FLATTEN](/drill/docs/flatten-function) +for an example of a query against the flattened data. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/nested/003-repeated-cnt.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/nested/003-repeated-cnt.md b/_docs/sql-ref/nested/003-repeated-cnt.md new file mode 100644 index 0000000..2b332b3 --- /dev/null +++ b/_docs/sql-ref/nested/003-repeated-cnt.md @@ -0,0 +1,33 @@ +--- +title: "REPEATED_COUNT Function" +parent: "Nested Data Functions" +--- +This function counts the values in an array. The following example returns the +counts for the `categories` array in the `yelp_academic_dataset_business.json` +file. The counts are restricted to rows that contain the string `pizza`. + + SELECT name, REPEATED_COUNT(categories) + FROM dfs.yelp.`yelp_academic_dataset_business.json` + WHERE name LIKE '%pizza%'; + + +---------------+------------+ + | name | EXPR$1 | + +---------------+------------+ + | Villapizza | 2 | + | zpizza | 4 | + | zpizza | 4 | + | Luckys pizza | 2 | + | Zpizza | 2 | + | S2pizzabar | 4 | + | Dominos pizza | 5 | + +---------------+------------+ + + 7 rows selected (2.03 seconds) + +The function requires a single argument, which must be an array. Note that +this function is not a standard SQL aggregate function and does not require +the count to be grouped by other columns in the select list (such as `name` in +this example). + +For another example of this function, see the following lesson in the Apache +Drill Tutorial for Hadoop: [Lesson 3: Run Queries on Complex Data Types](/drill/docs/lession-3-run-queries-on-complex-data-types/). 
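Because REPEATED_COUNT is evaluated per record rather than as a grouped aggregate, its behavior can be sketched in plain Python (illustration only, not Drill code; the sample rows echo the Yelp data above):

```python
def repeated_count(record, array_col):
    """Number of elements in the record's array column."""
    return len(record[array_col])

rows = [
    {"name": "zpizza",
     "categories": ["Gluten-Free", "Pizza", "Vegan", "Restaurants"]},
    {"name": "Villapizza", "categories": ["Pizza", "Restaurants"]},
]
for rec in rows:
    print(rec["name"], repeated_count(rec, "categories"))
# zpizza 4
# Villapizza 2
```

Each input row yields exactly one output row, with the count computed from that row's own array.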
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/001-install-sandbox.md ---------------------------------------------------------------------- diff --git a/_docs/tutorial/001-install-sandbox.md b/_docs/tutorial/001-install-sandbox.md new file mode 100644 index 0000000..26360ff --- /dev/null +++ b/_docs/tutorial/001-install-sandbox.md @@ -0,0 +1,33 @@
+---
+title: "Installing the Apache Drill Sandbox"
+parent: "Apache Drill Tutorial"
+---
+## Prerequisites
+
+The MapR Sandbox with Apache Drill runs on VMware Player and VirtualBox, free
+desktop applications that you can use to run a virtual machine on a Windows,
+Mac, or Linux PC. Before you install the MapR Sandbox with Apache Drill,
+verify that the host system meets the following prerequisites:
+
+ * VMware Player or VirtualBox is installed.
+ * At least 20 GB free hard disk space, at least 4 physical cores, and 8 GB of RAM are available. Performance increases with more RAM and free hard disk space.
+ * The host uses one of the following 64-bit x86 architectures:
+   * A 1.3 GHz or faster AMD CPU with segment-limit support in long mode
+   * A 1.3 GHz or faster Intel CPU with VT-x support
+ * If you have an Intel CPU with VT-x support, verify that VT-x support is enabled in the host system BIOS. The BIOS settings that must be enabled for VT-x support vary depending on the system vendor. See the VMware knowledge base article at <http://kb.vmware.com/kb/1003944> for information about how to determine if VT-x support is enabled.
+
+### VM Player Downloads
+
+For Linux, Mac, or Windows, download the free [VMware Player](https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0) or
+[VirtualBox](https://www.virtualbox.org/wiki/Downloads). Optionally, you can
+purchase [VMware Fusion](http://www.vmware.com/products/fusion/) for Mac.
+
+### VM Player Installation
+
+The following list provides links to the virtual machine installation
+instructions:
+
+ * To install the VMware Player, see the [VMware documentation](http://www.vmware.com/support/pubs/player_pubs.html). Use of VMware Player is subject to the VMware Player end user license terms. VMware does not provide support for VMware Player. For self-help resources, see the [VMware Player FAQ](http://www.vmware.com/products/player/faqs.html).
+ * To install VirtualBox, see the [Oracle VM VirtualBox User Manual](http://dlc.sun.com.edgesuite.net/virtualbox/4.3.4/UserManual.pdf). By downloading VirtualBox, you agree to the terms and conditions of the respective license.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/002-get2kno-sb.md ---------------------------------------------------------------------- diff --git a/_docs/tutorial/002-get2kno-sb.md b/_docs/tutorial/002-get2kno-sb.md new file mode 100644 index 0000000..9b11b9d --- /dev/null +++ b/_docs/tutorial/002-get2kno-sb.md @@ -0,0 +1,232 @@
+---
+title: "Getting to Know the Drill Sandbox"
+parent: "Apache Drill Tutorial"
+---
+This section describes the configuration of the Apache Drill system that you
+have installed and introduces the overall use case for the tutorial.
+
+# Storage Plugins Overview
+
+The Hadoop cluster within the sandbox is set up with MapR-FS, MapR-DB, and
+Hive, which all serve as data sources for Drill in this tutorial. Before you
+can run queries against these data sources, Drill requires each one to be
+configured as a storage plugin. A storage plugin defines how Drill connects to
+a data source and provides interfaces for reading and writing data and for
+retrieving metadata. Each storage plugin also exposes
+optimization rules for Drill to leverage for efficient query execution.
+
+Take a look at the pre-configured storage plugins by opening the Drill Web UI.
+
+Feel free to skip this section and jump directly to the queries: [Lesson 1:
+Learn About the Data
+Set](/drill/docs/lession-1-learn-about-the-data-set)
+
+ * Launch a web browser and go to: `http://<IP address of the sandbox>:8047`
+ * Go to the Storage tab
+ * Open the configured storage plugins one at a time by clicking Update
+ * You will see the following plugins configured.
+
+## maprdb
+
+A storage plugin configuration for MapR-DB in the sandbox. Drill uses a single
+storage plugin for connecting to HBase as well as MapR-DB, which is an
+enterprise-grade in-Hadoop NoSQL database. In addition to the following brief
+example, see [Registering HBase](/drill/docs/registering-hbase) for more
+information on how to configure Drill to query HBase.
+
+    {
+      "type" : "hbase",
+      "enabled" : true,
+      "config" : {
+        "hbase.table.namespace.mappings" : "*:/tables"
+      }
+    }
+
+## dfs
+
+This is a storage plugin configuration for the MapR file system (MapR-FS) in
+the sandbox. The connection attribute indicates the type of distributed file
+system: in this case, MapR-FS. Drill can work with any distributed file
+system, including HDFS, S3, and so on.
+
+The configuration also includes a set of workspaces; each one represents a
+location in MapR-FS:
+
+ * root: access to the root file system location
+ * clicks: access to nested JSON log data
+ * logs: access to flat (non-nested) JSON log data in the logs directory and its subdirectories
+ * views: a workspace for creating views
+
+A workspace in Drill is a location where users can easily access a specific
+set of data and collaborate with each other by sharing artifacts. Users can
+create as many workspaces as they need within Drill.
+
+Each workspace can also be configured as "writable" or not, which indicates
+whether users can write data to this location and defines the storage format
+in which the data will be written (parquet, csv, json).
These attributes
+become relevant when you explore SQL commands, especially CREATE TABLE
+AS (CTAS) and CREATE VIEW.
+
+Drill can query files and directories directly and can detect the file formats
+based on the file extension or the first few bits of data within the file.
+However, additional information around formats is required for Drill, such as
+delimiters for text files, which are specified in the "formats" section below.
+
+    {
+      "type": "file",
+      "enabled": true,
+      "connection": "maprfs:///",
+      "workspaces": {
+        "root": {
+          "location": "/mapr/demo.mapr.com/data",
+          "writable": false,
+          "storageformat": null
+        },
+        "clicks": {
+          "location": "/mapr/demo.mapr.com/data/nested",
+          "writable": true,
+          "storageformat": "parquet"
+        },
+        "logs": {
+          "location": "/mapr/demo.mapr.com/data/flat",
+          "writable": true,
+          "storageformat": "parquet"
+        },
+        "views": {
+          "location": "/mapr/demo.mapr.com/data/views",
+          "writable": true,
+          "storageformat": "parquet"
+        }
+      },
+      "formats": {
+        "psv": {
+          "type": "text",
+          "extensions": [
+            "tbl"
+          ],
+          "delimiter": "|"
+        },
+        "csv": {
+          "type": "text",
+          "extensions": [
+            "csv"
+          ],
+          "delimiter": ","
+        },
+        "tsv": {
+          "type": "text",
+          "extensions": [
+            "tsv"
+          ],
+          "delimiter": "\t"
+        },
+        "parquet": {
+          "type": "parquet"
+        },
+        "json": {
+          "type": "json"
+        }
+      }
+    }
+
+## hive
+
+A storage plugin configuration for a Hive data warehouse within the sandbox.
+Drill connects to the Hive metastore by using the configured metastore thrift
+URI. Metadata for Hive tables is automatically available for users to query.
+
+    {
+      "type": "hive",
+      "enabled": true,
+      "configProps": {
+        "hive.metastore.uris": "thrift://localhost:9083",
+        "hive.metastore.sasl.enabled": "false"
+      }
+    }
+
+# Client Application Interfaces
+
+Drill also provides application interfaces for client tools to connect to
+Drill and access data. The interfaces include the following.
+
+### ODBC/JDBC drivers
+
+Drill provides ODBC/JDBC drivers to connect from BI tools such as Tableau,
+MicroStrategy, SQuirreL, and Jaspersoft; refer to [Using ODBC to Access Apache
+Drill from BI Tools](/drill/docs/odbc-jdbc-interfaces/using-odbc-to-access-apache-drill-from-bi-tools)
+and [Using JDBC to Access Apache Drill](/drill/docs/odbc-jdbc-interfaces#using-jdbc-to-access-apache-drill-from-squirrel)
+to learn more.
+
+### SQLLine
+
+SQLLine is a JDBC application that comes packaged with Drill. To start working
+with it, use the command line on the demo cluster to log in as root, then
+enter `sqlline`. Use `mapr` as the login password. For example:
+
+    $ ssh root@localhost -p 2222
+    Password:
+    Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+    Welcome to your Mapr Demo virtual machine.
+    [root@maprdemo ~]# sqlline
+    sqlline version 1.1.6
+    0: jdbc:drill:>
+
+### Drill Web UI
+
+The Drill Web UI is a simple user interface for configuring and managing
+Apache Drill. This UI can be launched from any of the nodes in the Drill
+cluster. The configuration for Drill includes setting up storage plugins that
+represent the data sources on which Drill performs queries. The sandbox comes
+with storage plugins configured for Hive, HBase, the MapR file system, and the
+local file system.
+
+Users and developers can get the necessary information for tuning and
+performing diagnostics on queries, such as the list of queries executed in a
+session and detailed query plan profiles for each.
+
+Detailed configuration and management of Drill is out of scope for this
+tutorial.
+
+The Web interface for Apache Drill also provides a query UI where users can
+submit queries to Drill and observe results. Here is a screen shot of the Web
+UI for Apache Drill:
+
+![drill query flow]({{ site.baseurl }}/docs/img/DrillWebUI.png)
+
+### REST API
+
+Drill provides a simple REST API for users to query data as well as manage
+the system.
The Web UI leverages the REST API to talk to Drill.
+
+This tutorial introduces sample queries that you can run by using SQLLine.
+Note that you can run the queries just as easily by launching the Drill Web
+UI. No additional installation or configuration is required.
+
+# Use Case Overview
+
+As you run through the queries in this tutorial, put yourself in the shoes of
+an analyst with basic SQL skills. Let us imagine that the analyst works for an
+emerging online retail business that accepts purchases from its customers
+through both an established web-based interface and a new mobile application.
+
+The analyst is data-driven and operates mostly on the business side with
+little or no interaction with the IT department. Recently the central IT team
+has implemented a Hadoop-based infrastructure to reduce the cost of the legacy
+database system, and most of the DWH/ETL workload is now handled by
+Hadoop/Hive. The master customer profile information and product catalog are
+managed in MapR-DB, which is a NoSQL database. The IT team has also started
+acquiring clickstream data that comes from web and mobile applications. This
+data is stored in Hadoop as JSON files.
+
+The analyst has a number of data sources to explore, but exploring them in
+isolation is not the way to go. There are some potentially very interesting
+analytical connections between these data sources. For example, it would be
+good to be able to analyze customer records in the clickstream data and tie
+them to the master customer data in MapR-DB.
+
+The analyst decides to explore these various data sources by using Apache
+Drill. Think about the flexibility and analytic capability of Apache Drill as
+you work through the tutorial.
+
+# What's Next
+
+Start running queries by going to [Lesson 1: Learn About the Data
+Set](/drill/docs/lession-1-learn-about-the-data-set).
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/003-lesson1.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/003-lesson1.md b/_docs/tutorial/003-lesson1.md
new file mode 100644
index 0000000..119d67f
--- /dev/null
+++ b/_docs/tutorial/003-lesson1.md
@@ -0,0 +1,396 @@
+---
+title: "Lesson 1: Learn about the Data Set"
+parent: "Apache Drill Tutorial"
+---
+## Goal
+
+This lesson is simply about discovering what data is available, in what
+format, using simple SQL SELECT statements. Drill is capable of analyzing data
+without prior knowledge or definition of its schema. This means that you can
+start querying data immediately (and even as it changes), regardless of its
+format.
+
+The data set for the tutorial consists of:
+
+ * Transactional data: stored as a Hive table
+ * Product catalog and master customer data: stored as MapR-DB tables
+ * Clickstream and logs data: stored in the MapR file system as JSON files
+
+## Queries in This Lesson
+
+This lesson consists of select * queries on each data source.
+
+## Before You Begin
+
+### Start sqlline
+
+If sqlline is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter `sqlline`:
+
+    $ ssh root@10.250.0.6
+    Password:
+    Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
+    Welcome to your Mapr Demo virtual machine.
+    [root@maprdemo ~]# sqlline
+    sqlline version 1.1.6
+    0: jdbc:drill:>
+
+You can run queries from this prompt to complete the tutorial. To exit from
+`sqlline`, type:
+
+    0: jdbc:drill:> !quit
+
+Although this tutorial demonstrates the queries using SQLLine, you can also
+execute them using the Drill Web UI.
+ +### List the available workspaces and databases: + + 0: jdbc:drill:> show databases; + +-------------+ + | SCHEMA_NAME | + +-------------+ + | hive.default | + | dfs.default | + | dfs.logs | + | dfs.root | + | dfs.views | + | dfs.clicks | + | dfs.data | + | dfs.tmp | + | sys | + | maprdb | + | cp.default | + | INFORMATION_SCHEMA | + +-------------+ + 12 rows selected + +Note that this command exposes all the metadata available from the storage +plugins configured with Drill as a set of schemas. This includes the Hive and +MapR-DB databases as well as the workspaces configured in the file system. As +you run queries in the tutorial, you will switch among these schemas by +submitting the USE command. This behavior resembles the ability to use +different database schemas (namespaces) in a relational database system. + +## Query Hive Tables + +The orders table is a six-column Hive table defined in the Hive metastore. +This is a Hive external table pointing to the data stored in flat files on the +MapR file system. The orders table contains 122,000 rows. + +### Set the schema to hive: + + 0: jdbc:drill:> use hive; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'hive' | + +------------+------------+ + +You will run the USE command throughout this tutorial. The USE command sets +the schema for the current session. 
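+
+As an alternative to USE, you can fully qualify a table with its schema name
+directly in the FROM clause. For example, the following query (a sketch; the
+rows returned depend on the sandbox data) reads the Hive orders table without
+changing the session schema:
+
+    0: jdbc:drill:> select * from hive.orders limit 2;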
+ +### Describe the table: + +You can use the DESCRIBE command to show the columns and data types for a Hive +table: + + 0: jdbc:drill:> describe orders; + +-------------+------------+-------------+ + | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | + +-------------+------------+-------------+ + | order_id | BIGINT | YES | + | month | VARCHAR | YES | + | cust_id | BIGINT | YES | + | state | VARCHAR | YES | + | prod_id | BIGINT | YES | + | order_total | INTEGER | YES | + +-------------+------------+-------------+ + +The DESCRIBE command returns complete schema information for Hive tables based +on the metadata available in the Hive metastore. + +### Select 5 rows from the orders table: + + 0: jdbc:drill:> select * from orders limit 5; + +------------+------------+------------+------------+------------+-------------+ + | order_id | month | cust_id | state | prod_id | order_total | + +------------+------------+------------+------------+------------+-------------+ + | 67212 | June | 10001 | ca | 909 | 13 | + | 70302 | June | 10004 | ga | 420 | 11 | + | 69090 | June | 10011 | fl | 44 | 76 | + | 68834 | June | 10012 | ar | 0 | 81 | + | 71220 | June | 10018 | az | 411 | 24 | + +------------+------------+------------+------------+------------+-------------+ + +Because orders is a Hive table, you can query the data in the same way that +you would query the columns in a relational database table. Note the use of +the standard LIMIT clause, which limits the result set to the specified number +of rows. You can use LIMIT with or without an ORDER BY clause. + +Drill provides seamless integration with Hive by allowing queries on Hive +tables defined in the metastore with no extra configuration. Note that Hive is +not a prerequisite for Drill, but simply serves as a storage plugin or data +source for Drill. Drill also lets users query all Hive file formats (including +custom serdes). Additionally, any UDFs defined in Hive can be leveraged as +part of Drill queries. 
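+
+The LIMIT clause mentioned above also combines with ORDER BY to return the top
+rows rather than an arbitrary sample. A sketch against the same orders table
+(the exact rows returned depend on the sandbox data):
+
+    0: jdbc:drill:> select * from orders order by order_total desc limit 5;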
+
+Because Drill has its own low-latency SQL query execution engine, you can
+query Hive tables with high performance and support for interactive and ad-hoc
+data exploration.
+
+## Query MapR-DB and HBase Tables
+
+The customers and products tables are MapR-DB tables. MapR-DB is an enterprise
+in-Hadoop NoSQL database. It exposes the HBase API to support application
+development. Every MapR-DB table has a row_key, in addition to one or more
+column families. Each column family contains one or more specific columns. The
+row_key value is a primary key that uniquely identifies each row.
+
+Drill allows direct queries on MapR-DB and HBase tables. Unlike other SQL on
+Hadoop options, Drill requires no overlay schema definitions in Hive to work
+with this data. Think about a MapR-DB or HBase table with thousands of
+columns, such as a time-series database, and the pain of having to manage
+duplicate schemas for it in Hive!
+
+### Products Table
+
+The products table has two column families.
+
+Column Family | Columns
+---|---
+details | name, category
+pricing | price
+
+The products table contains 965 rows.
+
+### Customers Table
+
+The customers table has three column families.
+
+Column Family | Columns
+---|---
+address | state
+loyalty | agg_rev, membership
+personal | age, gender
+
+The customers table contains 993 rows.
+
+### Set the workspace to maprdb:
+
+    0: jdbc:drill:> use maprdb;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'maprdb' |
+    +------------+------------+
+
+### Describe the tables:
+
+    0: jdbc:drill:> describe customers;
+    +-------------+------------+-------------+
+    | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+    +-------------+------------+-------------+
+    | row_key | ANY | NO |
+    | address | (VARCHAR(1), ANY) MAP | NO |
+    | loyalty | (VARCHAR(1), ANY) MAP | NO |
+    | personal | (VARCHAR(1), ANY) MAP | NO |
+    +-------------+------------+-------------+
+
+    0: jdbc:drill:> describe products;
+    +-------------+------------+-------------+
+    | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+    +-------------+------------+-------------+
+    | row_key | ANY | NO |
+    | details | (VARCHAR(1), ANY) MAP | NO |
+    | pricing | (VARCHAR(1), ANY) MAP | NO |
+    +-------------+------------+-------------+
+
+Unlike the Hive example, the DESCRIBE command does not return the full schema
+up to the column level.
Wide-column NoSQL databases such as MapR-DB and HBase
+can be schema-less by design; every row has its own set of column name-value
+pairs in a given column family, and the column value can be of any data type,
+as determined by the application inserting the data.
+
+A "MAP" complex type in Drill represents this variable column name-value
+structure, and "ANY" represents the fact that the column value can be of any
+data type. Observe the row_key, which is also simply bytes and has the type
+ANY.
+
+### Select 5 rows from the products table:
+
+    0: jdbc:drill:> select * from products limit 5;
+    +------------+------------+------------+
+    | row_key | details | pricing |
+    +------------+------------+------------+
+    | [B@a1a3e25 | {"category":"bGFwdG9w","name":"IlNvbnkgbm90ZWJvb2si"} | {"price":"OTU5"} |
+    | [B@103a43af | {"category":"RW52ZWxvcGVz","name":"IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw=="} | {"price":"MT |
+    | [B@61319e7b | {"category":"U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==","name":"MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs"} | {"price" |
+    | [B@9bcf17 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk4"} | {"price":"Mw=="} |
+    | [B@7538ef50 | {"category":"TGFiZWxz","name":"QXZlcnkgNDk="} | {"price":"Mw=="} |
+
+Given that Drill requires no up-front schema definitions indicating data
+types, the query returns the raw byte arrays for column values, just as they
+are stored in MapR-DB (or HBase). Observe that the column families (details
+and pricing) have the map data type and appear as JSON strings.
+
+In Lesson 2, you will use CAST functions to return typed data for each column.
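+
+Even though the column values are raw bytes at this stage, aggregate queries
+work as expected. As a quick check, you can confirm the row count stated for
+the products table earlier (a sketch; assumes the maprdb workspace is still
+set):
+
+    0: jdbc:drill:> select count(*) from products;
+
+The result should be 965, matching the table description above.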
+
+### Select 5 rows from the customers table:
+
+    0: jdbc:drill:> select * from customers limit 5;
+    +------------+------------+------------+------------+
+    | row_key | address | loyalty | personal |
+    +------------+------------+------------+------------+
+    | [B@284bae62 | {"state":"Imt5Ig=="} | {"agg_rev":"IjEwMDEtMzAwMCI=","membership":"ImJhc2ljIg=="} | {"age":"IjI2LTM1Ig==","gender":"Ik1B |
+    | [B@7ffa4523 | {"state":"ImNhIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImdvbGQi"} | {"age":"IjI2LTM1Ig==","gender":"IkZFTUFMRSI= |
+    | [B@7d13e79 | {"state":"Im9rIg=="} | {"agg_rev":"IjUwMS0xMDAwIg==","membership":"InNpbHZlciI="} | {"age":"IjI2LTM1Ig==","gender":"IkZFT |
+    | [B@3a5c7df1 | {"state":"ImtzIg=="} | {"agg_rev":"IjMwMDEtMTAwMDAwIg==","membership":"ImdvbGQi"} | {"age":"IjUxLTEwMCI=","gender":"IkZF |
+    | [B@e507726 | {"state":"Im5qIg=="} | {"agg_rev":"IjAtMTAwIg==","membership":"ImJhc2ljIg=="} | {"age":"IjIxLTI1Ig==","gender":"Ik1BTEUi" |
+    +------------+------------+------------+------------+
+
+Again the table returns byte data that needs to be cast to readable data
+types.
+
+## Query the File System
+
+Along with querying a data source with full schemas (such as Hive) and partial
+schemas (such as MapR-DB and HBase), Drill offers the unique capability to
+perform SQL queries directly on the file system. The file system could be a
+local file system, or a distributed file system such as MapR-FS, HDFS, or S3.
+
+In the context of Drill, a file or a directory is synonymous with a relational
+database "table." Therefore, you can perform SQL operations directly on files
+and directories without the need for up-front schema definitions or schema
+management for any model changes. The schema is discovered on the fly based on
+the query. Drill supports queries on a variety of file formats including text,
+CSV, Parquet, and JSON in the 0.5 release.
+
+In this example, the clickstream data coming from the mobile/web applications
+is in JSON format.
The JSON files have the following structure:
+
+    {"trans_id":31920,"date":"2014-04-26","time":"12:17:12","user_info":{"cust_id":22526,"device":"IOS5","state":"il"},"trans_info":{"prod_id":[174,2],"purch_flag":"false"}}
+    {"trans_id":31026,"date":"2014-04-20","time":"13:50:29","user_info":{"cust_id":16368,"device":"AOS4.2","state":"nc"},"trans_info":{"prod_id":[],"purch_flag":"false"}}
+    {"trans_id":33848,"date":"2014-04-10","time":"04:44:42","user_info":{"cust_id":21449,"device":"IOS6","state":"oh"},"trans_info":{"prod_id":[582],"purch_flag":"false"}}
+
+The clicks.json and clicks.campaign.json files contain metadata as part of the
+data itself (referred to as "self-describing" data). Also note that the data
+elements are complex, or nested. The initial queries below do not show how to
+unpack the nested data, but they show that easy access to the data requires no
+setup beyond the definition of a workspace.
+
+### Query nested clickstream data
+
+#### Set the workspace to dfs.clicks:
+
+    0: jdbc:drill:> use dfs.clicks;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'dfs.clicks' |
+    +------------+------------+
+
+In this case, setting the workspace is a mechanism for making queries easier
+to write. When you specify a file system workspace, you can shorten references
+to files in the FROM clause of your queries. Instead of having to provide the
+complete path to a file, you can provide the path relative to a directory
+location specified in the workspace. For example:
+
+    "location": "/mapr/demo.mapr.com/data/nested"
+
+Any file or directory that you want to query in this path can be referenced
+relative to this path. The clicks directory referred to in the following query
+is directly below the nested directory.
+
+#### Select 2 rows from the clicks.json file:
+
+    0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
+    +------------+------------+------------+------------+------------+
+    | trans_id | date | time | user_info | trans_info |
+    +------------+------------+------------+------------+------------+
+    | 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} |
+    | 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} |
+    +------------+------------+------------+------------+------------+
+    2 rows selected
+
+Note that the FROM clause reference points to a specific file. Drill expands
+the traditional concept of a "table reference" in a standard SQL FROM clause
+to refer to a file in a local or distributed file system.
+
+The only special requirement is the use of backticks to enclose the file
+path. This is necessary whenever the file path contains Drill reserved words
+or characters.
+
+#### Select 2 rows from the campaign.json file:
+
+    0: jdbc:drill:> select * from `clicks/clicks.campaign.json` limit 2;
+    +------------+------------+------------+------------+------------+------------+
+    | trans_id | date | time | user_info | ad_info | trans_info |
+    +------------+------------+------------+------------+------------+------------+
+    | 35232 | 2014-05-10 | 00:13:03 | {"cust_id":18520,"device":"AOS4.3","state":"tx"} | {"camp_id":"null"} | {"prod_id":[7,7],"purch_flag":"true"} |
+    | 31995 | 2014-05-22 | 16:06:38 | {"cust_id":17182,"device":"IOS6","state":"fl"} | {"camp_id":"null"} | {"prod_id":[],"purch_flag":"false"} |
+    +------------+------------+------------+------------+------------+------------+
+    2 rows selected
+
+Notice that with a select * query, any complex data types such as maps and
+arrays return as JSON strings. You will see how to unpack this data using
+various SQL functions and operators in the next lesson.
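+
+A workspace-relative path is only a convenience. The same file can also be
+queried through the dfs schema by its complete path, which is useful when no
+workspace points at the data. A sketch using the sandbox path shown earlier:
+
+    0: jdbc:drill:> select * from dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` limit 2;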
+
+## Query Logs Data
+
+Unlike the previous example where we performed queries against clicks data in
+one file, logs data is stored as partitioned directories on the file system.
+The logs directory has three subdirectories:
+
+ * 2012
+ * 2013
+ * 2014
+
+Each of these year directories fans out to a set of numbered month
+directories, and each month directory contains a JSON file with log records
+for that month. The total number of records in all log files is 48000.
+
+The files in the logs directory and its subdirectories are JSON files. There
+are many of these files, but you can use Drill to query them all as a single
+data source, or to query a subset of the files.
+
+#### Set the workspace to dfs.logs:
+
+    0: jdbc:drill:> use dfs.logs;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'dfs.logs' |
+    +------------+------------+
+
+#### Select 2 rows from the logs directory:
+
+    0: jdbc:drill:> select * from logs limit 2;
+    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+    | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_fl |
+    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+    | 2014 | 8 | 24181 | 08/02/2014 | 09:23:52 | 0 | IOS5 | il | 2 | wait | 128 | false |
+    | 2014 | 8 | 24195 | 08/02/2014 | 07:58:19 | 243 | IOS5 | mo | 6 | hmm | 107 | false |
+    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+----------+
+
+Note that this is flat JSON data. The dfs.logs workspace location property
+points to a directory that contains the logs directory, making the FROM clause
+reference for this query very simple.
You do not have to refer to the complete
+directory path on the file system.
+
+The column names dir0 and dir1 are special Drill variables that identify
+subdirectories below the logs directory. In Lesson 3, you will do more complex
+queries that leverage these dynamic variables.
+
+#### Find the total number of rows in the logs directory (all files):
+
+    0: jdbc:drill:> select count(*) from logs;
+    +------------+
+    | EXPR$0 |
+    +------------+
+    | 48000 |
+    +------------+
+
+This query traverses all of the files in the logs directory and its
+subdirectories to return the total number of rows in those files.
+
+# What's Next
+
+Go to [Lesson 2: Run Queries with ANSI
+SQL](/drill/docs/lession-2-run-queries-with-ansi-sql).
+
+
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/tutorial/004-lesson2.md
----------------------------------------------------------------------
diff --git a/_docs/tutorial/004-lesson2.md b/_docs/tutorial/004-lesson2.md
new file mode 100644
index 0000000..73c4329
--- /dev/null
+++ b/_docs/tutorial/004-lesson2.md
@@ -0,0 +1,388 @@
+---
+title: "Lesson 2: Run Queries with ANSI SQL"
+parent: "Apache Drill Tutorial"
+---
+## Goal
+
+This lesson shows how to do some standard SQL analysis in Apache Drill: for
+example, summarizing data by using simple aggregate functions and connecting
+data sources by using joins. Note that Apache Drill provides ANSI SQL support,
+not a "SQL-like" interface.
+
+## Queries in This Lesson
+
+Now that you know what the data sources look like in their raw form, using
+select * queries, try running some simple but more useful queries on each data
+source. These queries demonstrate how Drill supports ANSI SQL constructs and
+also how you can combine data from different data sources in a single SELECT
+statement.
+
+ * Show an aggregate query on a single file or table. Use GROUP BY, WHERE, HAVING, and ORDER BY clauses.
+ * Perform joins between Hive, MapR-DB, and file system data sources.
+ * Use table and column aliases.
+ * Create a Drill view.
+
+## Aggregation
+
+### Set the schema to hive:
+
+    0: jdbc:drill:> use hive;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'hive' |
+    +------------+------------+
+    1 row selected
+
+### Return sales totals by month:
+
+    0: jdbc:drill:> select `month`, sum(order_total)
+    from orders group by `month` order by 2 desc;
+    +------------+------------+
+    | month | EXPR$1 |
+    +------------+------------+
+    | June | 950481 |
+    | May | 947796 |
+    | March | 836809 |
+    | April | 807291 |
+    | July | 757395 |
+    | October | 676236 |
+    | August | 572269 |
+    | February | 532901 |
+    | September | 373100 |
+    | January | 346536 |
+    +------------+------------+
+
+Drill supports SQL aggregate functions such as SUM, MAX, AVG, and MIN.
+Standard SQL clauses work in the same way in Drill queries as in relational
+database queries.
+
+Note that backticks are required for the `month` column only because "month"
+is a reserved word in SQL.
+
+### Return the top 20 sales totals by month and state:
+
+    0: jdbc:drill:> select `month`, state, sum(order_total) as sales from orders group by `month`, state
+    order by 3 desc limit 20;
+    +------------+------------+------------+
+    | month | state | sales |
+    +------------+------------+------------+
+    | May | ca | 119586 |
+    | June | ca | 116322 |
+    | April | ca | 101363 |
+    | March | ca | 99540 |
+    | July | ca | 90285 |
+    | October | ca | 80090 |
+    | June | tx | 78363 |
+    | May | tx | 77247 |
+    | March | tx | 73815 |
+    | August | ca | 71255 |
+    | April | tx | 68385 |
+    | July | tx | 63858 |
+    | February | ca | 63527 |
+    | June | fl | 62199 |
+    | June | ny | 62052 |
+    | May | fl | 61651 |
+    | May | ny | 59369 |
+    | October | tx | 55076 |
+    | March | fl | 54867 |
+    | March | ny | 52101 |
+    +------------+------------+------------+
+    20 rows selected
+
+Note the alias for the result of the SUM function.
Drill supports column +aliases and table aliases. + +## HAVING Clause + +This query uses the HAVING clause to constrain an aggregate result. + +### Set the workspace to dfs.clicks + + 0: jdbc:drill:> use dfs.clicks; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'dfs.clicks' | + +------------+------------+ + 1 row selected + +### Return total number of clicks for devices that indicate high click-throughs: + + 0: jdbc:drill:> select t.user_info.device, count(*) from `clicks/clicks.json` t + group by t.user_info.device + having count(*) > 1000; + +------------+------------+ + | EXPR$0 | EXPR$1 | + +------------+------------+ + | IOS5 | 11814 | + | AOS4.2 | 5986 | + | IOS6 | 4464 | + | IOS7 | 3135 | + | AOS4.4 | 1562 | + | AOS4.3 | 3039 | + +------------+------------+ + +The aggregate is a count of the records for each different mobile device in +the clickstream data. Only the activity for the devices that registered more +than 1000 transactions qualify for the result set. + +## UNION Operator + +Use the same workspace as before (dfs.clicks). + +### Combine clicks activity from before and after the marketing campaign + + 0: jdbc:drill:> select t.trans_id transaction, t.user_info.cust_id customer from `clicks/clicks.campaign.json` t + union all + select u.trans_id, u.user_info.cust_id from `clicks/clicks.json` u limit 5; + +-------------+------------+ + | transaction | customer | + +-------------+------------+ + | 35232 | 18520 | + | 31995 | 17182 | + | 35760 | 18228 | + | 37090 | 17015 | + | 37838 | 18737 | + +-------------+------------+ + +This UNION ALL query returns rows that exist in two files (and includes any +duplicate rows from those files): `clicks.campaign.json` and `clicks.json`. 
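+
+Because the distinct UNION operator is not yet supported, one way to drop
+duplicate rows is to wrap the UNION ALL in an aliased subquery and apply
+SELECT DISTINCT in the outer query. A sketch, not verified against the
+sandbox:
+
+    0: jdbc:drill:> select distinct tid, cid from
+    (select t.trans_id tid, t.user_info.cust_id cid from `clicks/clicks.campaign.json` t
+    union all
+    select u.trans_id, u.user_info.cust_id from `clicks/clicks.json` u) combined;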
+ +## Subqueries + +### Set the workspace to hive: + + 0: jdbc:drill:> use hive; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'hive' | + +------------+------------+ + +### Compare order totals across states: + + 0: jdbc:drill:> select o1.cust_id, sum(o1.order_total) as ny_sales, + (select sum(o2.order_total) from hive.orders o2 + where o1.cust_id=o2.cust_id and state='ca') as ca_sales + from hive.orders o1 where o1.state='ny' group by o1.cust_id + order by cust_id limit 20; + +------------+------------+------------+ + | cust_id | ny_sales | ca_sales | + +------------+------------+------------+ + | 1001 | 72 | 47 | + | 1002 | 108 | 198 | + | 1003 | 83 | null | + | 1004 | 86 | 210 | + | 1005 | 168 | 153 | + | 1006 | 29 | 326 | + | 1008 | 105 | 168 | + | 1009 | 443 | 127 | + | 1010 | 75 | 18 | + | 1012 | 110 | null | + | 1013 | 19 | null | + | 1014 | 106 | 162 | + | 1015 | 220 | 153 | + | 1016 | 85 | 159 | + | 1017 | 82 | 56 | + | 1019 | 37 | 196 | + | 1020 | 193 | 165 | + | 1022 | 124 | null | + | 1023 | 166 | 149 | + | 1024 | 233 | null | + +------------+------------+------------+ + +This example demonstrates Drill support for correlated subqueries. This query +uses a subquery in the select list and correlates the result of the subquery +with the outer query, using the cust_id column reference. The subquery returns +the sum of order totals for California, and the outer query returns the +equivalent sum, for the same cust_id, for New York. + +The result set is sorted by the cust_id and presents the sales totals side by +side for easy comparison. Null values indicate customer IDs that did not +register any sales in that state. 
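+
+Drill also supports the IN subquery operator, which offers a simpler way to
+ask a related question: which New York customers also placed orders in
+California? A sketch, assuming the hive schema set in the previous step:
+
+    0: jdbc:drill:> select distinct cust_id from orders
+    where state='ny' and cust_id in
+    (select cust_id from orders where state='ca');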
+
+## CAST Function
+
+### Use the maprdb workspace:
+
+    0: jdbc:drill:> use maprdb;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'maprdb' |
+    +------------+------------+
+    1 row selected
+
+### Return customer data with appropriate data types:
+
+    0: jdbc:drill:> select cast(row_key as int) as cust_id, cast(t.personal.name as varchar(20)) as name,
+    cast(t.personal.gender as varchar(10)) as gender, cast(t.personal.age as varchar(10)) as age,
+    cast(t.address.state as varchar(4)) as state, cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+    cast(t.loyalty.membership as varchar(20)) as membership
+    from customers t limit 5;
+    +------------+------------+------------+------------+------------+------------+------------+
+    | cust_id | name | gender | age | state | agg_rev | membership |
+    +------------+------------+------------+------------+------------+------------+------------+
+    | 10001 | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
+    | 10005 | "Brittany Park" | "MALE" | "26-35" | "in" | 230.00 | "silver" |
+    | 10006 | "Rose Lokey" | "MALE" | "26-35" | "ca" | 250.00 | "silver" |
+    | 10007 | "James Fowler" | "FEMALE" | "51-100" | "me" | 263.00 | "silver" |
+    | 10010 | "Guillermo Koehler" | "OTHER" | "51-100" | "mn" | 202.00 | "silver" |
+    +------------+------------+------------+------------+------------+------------+------------+
+    5 rows selected
+
+Note the following features of this query:
+
+ * The CAST function is required for every column in the table. This function returns the MapR-DB/HBase binary data as readable integers and strings. Alternatively, you can use CONVERT_TO/CONVERT_FROM functions to decode the columns. CONVERT_TO and CONVERT_FROM are more efficient than CAST in most cases.
+ * The row_key column functions as the primary key of the table (a customer ID in this case).
+ * The table alias t is required; otherwise the column family names would be parsed as table names and the query would return an error.
+
+### Remove the quotes from the strings:
+
+You can use the regexp_replace function to remove the quotes around the
+strings in the query results. For example, to return a state name va instead
+of "va":
+
+    0: jdbc:drill:> select cast(row_key as int), regexp_replace(cast(t.address.state as varchar(10)),'"','')
+    from customers t limit 1;
+    +------------+------------+
+    | EXPR$0 | EXPR$1 |
+    +------------+------------+
+    | 10001 | va |
+    +------------+------------+
+    1 row selected
+
+## CREATE VIEW Command
+
+    0: jdbc:drill:> use dfs.views;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | Default schema changed to 'dfs.views' |
+    +------------+------------+
+
+### Use a mutable workspace:
+
+A mutable (or writable) workspace is a workspace that is enabled for "write"
+operations. This attribute is part of the storage plugin configuration. You
+can create Drill views and tables in mutable workspaces.
+
+### Create a view on a MapR-DB table:
+
+    0: jdbc:drill:> create or replace view custview as select cast(row_key as int) as cust_id,
+    cast(t.personal.name as varchar(20)) as name,
+    cast(t.personal.gender as varchar(10)) as gender,
+    cast(t.personal.age as varchar(10)) as age,
+    cast(t.address.state as varchar(4)) as state,
+    cast(t.loyalty.agg_rev as dec(7,2)) as agg_rev,
+    cast(t.loyalty.membership as varchar(20)) as membership
+    from maprdb.customers t;
+    +------------+------------+
+    | ok | summary |
+    +------------+------------+
+    | true | View 'custview' replaced successfully in 'dfs.views' schema |
+    +------------+------------+
+    1 row selected
+
+Drill provides CREATE OR REPLACE VIEW syntax similar to relational databases
+to create views. Use the OR REPLACE option to make it easier to update the
+view later without having to remove it first.
Note that the FROM clause in this example must refer to maprdb.customers;
the MapR-DB tables are not directly visible to the dfs.views workspace.

Unlike views in a traditional database, which are typically DBA- or
developer-driven operations, file system-based views in Drill are very
lightweight. A view is simply a special file with a specific extension
(.drill). You can store views in your local file system or point to a
specific workspace, and you can specify any query against any Drill data
source in the body of the CREATE VIEW statement.

Drill provides a decentralized metadata model: Drill can query metadata
defined in data sources such as Hive, HBase, and the file system, and it
also supports the creation of metadata in the file system.

### Query data from the view:

    0: jdbc:drill:> select * from custview limit 1;
    +------------+------------+------------+------------+------------+------------+------------+
    |  cust_id   |    name    |   gender   |    age     |   state    |  agg_rev   | membership |
    +------------+------------+------------+------------+------------+------------+------------+
    | 10001      | "Corrine Mecham" | "FEMALE" | "15-20" | "va" | 197.00 | "silver" |
    +------------+------------+------------+------------+------------+------------+------------+

Once users have explored the data directly from the file system, views
provide a way to feed the data into downstream tools such as Tableau and
MicroStrategy for analysis and visualization. To these tools, a view appears
simply as a "table" with selectable "columns" in it.

## Query Across Data Sources

Continue using `dfs.views` for this query.
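If your session has switched to another workspace in the meantime, you can either switch back with USE (as with maprdb earlier) or reference the view by a fully qualified name. A sketch, assuming custview exists in dfs.views as created above:

```sql
-- Option 1: switch back to the views workspace so custview
-- resolves without a prefix.
use dfs.views;

-- Option 2: reference the view with its fully qualified name,
-- which works from any workspace.
select * from dfs.views.custview limit 1;
```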
### Join the customers view and the orders table:

    0: jdbc:drill:> select membership, sum(order_total) as sales from hive.orders, custview
    where orders.cust_id=custview.cust_id
    group by membership order by 2;
    +------------+------------+
    | membership |   sales    |
    +------------+------------+
    | "basic"    | 380665     |
    | "silver"   | 708438     |
    | "gold"     | 2787682    |
    +------------+------------+
    3 rows selected

In this query, we read data from a MapR-DB table (represented by custview)
and combine it with the order information in Hive. When running
cross-data-source queries such as this one, you must use fully qualified
table and view names. For example, the orders table is prefixed by "hive,"
which is the storage plugin name registered with Drill. No prefix is needed
for "custview" because we explicitly switched to the dfs.views workspace
where custview is stored.

Note: If the results of any of your queries appear to be truncated because
the rows are wide, set the maximum width of the display to 10000:

    0: jdbc:drill:> !set maxwidth 10000

Do not use a semicolon for this SET command.

### Join the customers, orders, and clickstream data:

    0: jdbc:drill:> select custview.membership, sum(orders.order_total) as sales from hive.orders, custview,
    dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
    where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id
    group by custview.membership order by 2;
    +------------+------------+
    | membership |   sales    |
    +------------+------------+
    | "basic"    | 372866     |
    | "silver"   | 728424     |
    | "gold"     | 7050198    |
    +------------+------------+
    3 rows selected

This three-way join selects from three different data sources in one query:

 * the hive.orders table
 * custview (a view of the HBase customers table)
 * the clicks.json file

The join column for both sets of join conditions is the cust_id column. The
views workspace is used for this query so that custview can be accessed. The
The +hive.orders table is also visible to the query. + +However, note that the JSON file is not directly visible from the views +workspace, so the query specifies the full path to the file: + + dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` + + +## What's Next + +Go to [Lesson 3: Run Queries on Complex Data Types](/drill/docs/lession-3-run-queries-on-complex-data-types). + + +