[3/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits

2015-03-02 Thread bridgetb
DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2a34ac89
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2a34ac89
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2a34ac89

Branch: refs/heads/gh-pages-master
Commit: 2a34ac8931326f30b34986868f4c4e5ad61fec59
Parents: d959a21
Author: Kristine Hahn kh...@maprtech.com
Authored: Wed Feb 25 18:31:56 2015 -0800
Committer: Bridget Bevens bbev...@maprtech.com
Committed: Mon Mar 2 14:18:23 2015 -0800

--
 _docs/009-datasources.md|  27 ++
 _docs/010-dev-custom-func.md|  37 ++
 _docs/011-manage.md |  14 +
 _docs/012-develop.md|   9 +
 _docs/013-rn.md | 191 
 _docs/014-contribute.md |   9 +
 _docs/015-sample-ds.md  |  10 +
 _docs/016-design.md |  13 +
 _docs/018-progress.md   |   8 +
 _docs/019-bylaws.md | 170 
 _docs/connect/005-reg-hive.md   |   7 +-
 _docs/connect/007-mongo-plugin.md   |   6 +-
 _docs/data-sources/001-hive-types.md| 188 
 _docs/data-sources/002-hive-udf.md  |  39 ++
 _docs/data-sources/003-parquet-ref.md   | 287 
 _docs/data-sources/004-json-ref.md  | 432 +++
 _docs/dev-custom-fcn/002-dev-aggregate.md   |   2 +-
 _docs/img/Untitled.png  | Bin 39796 -> 0 bytes
 _docs/img/json-workaround.png   | Bin 0 -> 20786 bytes
 _docs/install/001-drill-in-10.md|   2 +-
 _docs/interfaces/001-odbc-win.md|   3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md |   2 +-
 .../interfaces/odbc-win/004-tableau-examples.md |   6 +-
 _docs/manage/002-start-stop.md  |   2 +-
 _docs/manage/003-ports.md   |   2 +-
 _docs/manage/conf/002-startup-opt.md|   2 +-
 _docs/manage/conf/003-plan-exec.md  |   3 +-
 _docs/manage/conf/004-persist-conf.md   |   2 +-
 _docs/query/001-get-started.md  |  75 
 _docs/query/001-query-fs.md |  35 --
 _docs/query/002-query-fs.md |  35 ++
 _docs/query/002-query-hbase.md  | 151 ---
 _docs/query/003-query-complex.md|  56 ---
 _docs/query/003-query-hbase.md  | 151 +++
 _docs/query/004-query-complex.md|  56 +++
 _docs/query/004-query-hive.md   |  45 --
 _docs/query/005-query-hive.md   |  45 ++
 _docs/query/005-query-info-skema.md | 109 -
 _docs/query/006-query-info-skema.md | 109 +
 _docs/query/006-query-sys-tbl.md| 159 ---
 _docs/query/007-query-sys-tbl.md| 159 +++
 _docs/query/get-started/001-lesson1-connect.md  |  88 
 _docs/query/get-started/002-lesson2-download.md | 103 +
 _docs/query/get-started/003-lesson3-plugin.md   | 142 ++
 _docs/sql-ref/003-functions.md  |  19 +-
 _docs/sql-ref/005-cmd-summary.md|   2 +-
 _docs/sql-ref/006-reserved-wds.md   |   2 +-
 _docs/sql-ref/data-types/001-date.md|   4 +-
 _docs/tutorial/005-lesson3.md   |   2 +-
 49 files changed, 2433 insertions(+), 587 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/009-datasources.md
--
diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md
new file mode 100644
index 000..3f3d431
--- /dev/null
+++ b/_docs/009-datasources.md
@@ -0,0 +1,27 @@
+---
+title: Data Sources and File Formats
+---
+Drill supports many data sources, including these key ones:
+
+* HBase
+* Hive
+* MapR-DB
+* File system
+
+. . .
+
+Drill supports the following input formats for data:
+
+* CSV (Comma-Separated-Values)
+* TSV (Tab-Separated-Values)
+* PSV (Pipe-Separated-Values)
+* Parquet
+* JSON
+
+You set the input format for data read from a data source in the 
workspace portion of the [storage 
plugin](/drill/docs/storage-plugin-registration) definition, as shown in the 
sketch below. The default input format in Drill is Parquet. 
+
+You change the [sys.options table](/drill/docs/planning-and-execution-options) 
to set the output format of Drill data, as shown below. The default storage 
format for Drill CREATE TABLE AS (CTAS) statements is Parquet.
+
+
+ 
+
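Two quick sketches of the settings described above. A workspace can pin its input format with `defaultInputFormat` in the storage plugin JSON (a sketch; the workspace name and path are placeholders, not from this commit):

    {
      "workspaces": {
        "json_files": {
          "location": "/users/max/drilldata",
          "writable": false,
          "defaultInputFormat": "json"
        }
      }
    }

The CTAS output format is controlled per session through the `store.format` option:

    ALTER SESSION SET `store.format` = 'json';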

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/010-dev-custom-func.md

[1/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits

2015-03-02 Thread bridgetb
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master d959a2100 -> 2a34ac893


http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/query/get-started/003-lesson3-plugin.md
--
diff --git a/_docs/query/get-started/003-lesson3-plugin.md 
b/_docs/query/get-started/003-lesson3-plugin.md
new file mode 100644
index 000..9aab881
--- /dev/null
+++ b/_docs/query/get-started/003-lesson3-plugin.md
@@ -0,0 +1,142 @@
+---
+title: "Lesson 3: Create a Storage Plugin"
+parent: Getting Started Tutorial
+---
+The Drill default storage plugins support common file formats. If you need
+support for some other file format, create a custom storage plugin. You can 
also create a storage plugin to simplify querying files that have long path 
names. A workspace name replaces the long path name.
+
+This lesson covers how to create and use a storage plugin to simplify queries. 
First,
+you create the storage plugin in the Drill Web UI. Next, you connect through
+the plugin to query a single file, then a directory, and finally multiple
+files in a directory.
+
+## Create a Storage Plugin
+
+You can create a storage plugin using the Apache Drill Web UI.
+
+  1. Create an `ngram` directory on your file system.
+  2. Copy `googlebooks-eng-all-5gram-20120701-zo.tsv` to the `ngram` directory.
+  3. Open the Drill Web UI by navigating to http://localhost:8047/storage.   
+ SQLLine must still be running for the Drill Web UI to open.
+  4. In New Storage Plugin, type `myplugin`.  
+ ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)
+  5. Click **Create**.  
+ The Configuration screen appears.
+  6. Replace null with the following storage plugin definition. On the 
location line, use the path to your `ngram` directory instead of the 
drilluser path, and give your workspace an arbitrary name, for example, ngram:
+  
+    {
+      "type": "file",
+      "enabled": true,
+      "connection": "file:///",
+      "workspaces": {
+        "ngram": {
+          "location": "/Users/drilluser/ngram",
+          "writable": false,
+          "defaultInputFormat": null
+        }
+      },
+      "formats": {
+        "tsv": {
+          "type": "text",
+          "extensions": [
+            "tsv"
+          ],
+          "delimiter": "\t"
+        }
+      }
+    }
+
+  7. Click **Create**.  
+ The success message appears briefly.
+  8. Click **Back**.  
+ The new plugin appears in Enabled Storage Plugins.  
+ ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png) 
+  9. Go back to the SQLLine prompt in the CLI, and list the storage plugins. 
Press RETURN in the CLI to get a prompt if necessary.
+
+Your custom plugin appears in the list and has two workspaces: the `ngram`
+workspace that you defined and a default workspace.
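A quick way to verify step 9 (a sketch; the exact listing depends on your configuration):

    0: jdbc:drill:zk=local> SHOW DATABASES;

The result set should now include `myplugin.ngram` and `myplugin.default`.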
+
+## Connect to and Query a File
+
+When querying the same data source repeatedly, avoiding long path names is
+important. This exercise demonstrates how to simplify the query. Instead of
+using the full path to the Ngram file, you use dot notation in the FROM
+clause.
+
+``workspace name.`location```
+
+This syntax assumes you connected to a storage plugin that defines the
+location of the data. To query the data source while you are _not_ connected to
+that storage plugin, include the plugin name:
+
+``plugin name.workspace name.`location```
+
+This exercise shows how to query Ngram data when you are, and when you are
+not, connected to `myplugin`.
+
+  1. Connect to the ngram file through the custom storage plugin.  
+ `USE myplugin;`
+  2. Get data about the phrase "Zoological Journal of the Linnean" where it 
appears more than 250 times a year in the books that Google scans. In the FROM 
clause, instead of using the full path to the file as you did in the last 
exercise, connect to the data using the storage plugin workspace name ngram.
+  
+ SELECT COLUMNS[0], 
+        COLUMNS[1], 
+        COLUMNS[2] 
+ FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.tsv` 
+ WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
+   AND (columns[2] > 250)) 
+ LIMIT 10;
+
+ The output consists of 5 rows of data.  
+  3. Switch to the `dfs` storage plugin.
+  
+ 0: jdbc:drill:zk=local> USE dfs;
+ +------------+---------------------------------+
+ |     ok     |             summary             |
+ +------------+---------------------------------+
+ | true       | Default schema changed to 'dfs' |
+ +------------+---------------------------------+
+ 1 row selected (0.019 seconds)
+  4. Query the TSV file again. Because you switched to `dfs`, Drill does not 
know the location of the file. To provide the information to Drill, preface the 
file name with the storage plugin and workspace names in the FROM clause.  
+  
+ SELECT COLUMNS[0], 
+        COLUMNS[1], 
+        COLUMNS[2] 
+ FROM 

[2/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits

2015-03-02 Thread bridgetb
http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/manage/002-start-stop.md
--
diff --git a/_docs/manage/002-start-stop.md b/_docs/manage/002-start-stop.md
index 76a76f4..d37f840 100644
--- a/_docs/manage/002-start-stop.md
+++ b/_docs/manage/002-start-stop.md
@@ -28,7 +28,7 @@ can indicate the schema name when you invoke SQLLine.
 To start SQLLine, issue the appropriate command for your Drill installation
 type:
 
-<table ><tbody><tr><td valign="top"><strong>Drill Install Type</strong></td><td valign="top"><strong>Example</strong></td><td valign="top"><strong>Command</strong></td></tr><tr><td valign="top">Embedded</td><td valign="top">Drill installed locally (embedded mode);Hive with embedded metastore</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:<code>$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin</code><span> </span>Once you are in the prompt, you can issue<code> USE &lt;schema&gt; </code>or you can use absolute notation: <code>schema.table.column.</code>To connect to a schema directly, issue the command with the schema name:<code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=local -n admin -p admin</code></td></tr><tr><td valign="top">Distributed</td><td valign="top">Drill installed in distributed mode;Hive with remote metastore;HBase</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:<code>$ bin/sqlline -u jdbc:drill:zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code>Once you are in the prompt, you can issue<code> USE &lt;schema&gt; </code>or you can use absolute notation: <code>schema.table.column.</code>To connect to a schema directly, issue the command with the schema name:<code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code></td></tr></tbody></table>
+<table ><tbody><tr><td valign="top"><strong>Drill Install Type</strong></td><td valign="top"><strong>Example</strong></td><td valign="top"><strong>Command</strong></td></tr><tr><td valign="top">Embedded</td><td valign="top">Drill installed locally (embedded mode);Hive with embedded metastore</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:<code>$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin</code><span> </span>Once you are in the prompt, you can issue<code> USE &lt;schema&gt; </code>or you can use absolute notation: <code>schema.table.column.</code>To connect to a schema directly, issue the command with the schema name:<code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=local -n admin -p admin</code></td></tr><tr><td valign="top">Distributed</td><td valign="top">Drill installed in distributed mode;Hive with remote metastore;HBase</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:<code>$ bin/sqlline -u jdbc:drill:zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code>Once you are in the prompt, you can issue<code> USE &lt;schema&gt; </code>or you can use absolute notation: <code>schema.table.column.</code>To connect to a schema directly, issue the command with the schema name:<code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code></td></tr></tbody></table></div>
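Pulled out of the table above for readability, the two connection commands, with placeholders in angle brackets:

    $ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin
    $ bin/sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port> -n admin -p admin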
   
 When SQLLine starts, the system displays the following prompt:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/manage/003-ports.md
--
diff --git a/_docs/manage/003-ports.md b/_docs/manage/003-ports.md
index df1d362..c72beff 100644
--- a/_docs/manage/003-ports.md
+++ b/_docs/manage/003-ports.md
@@ -5,5 +5,5 @@ parent: Manage Drill
 The following table provides a list of the ports that Drill uses, the port
 type, and a description of how Drill uses the port:
 
-<table ><tbody><tr><th>Port</th><th colspan="1">Type</th><th>Description</th></tr><tr><td valign="top">8047</td><td valign="top" colspan="1">TCP</td><td valign="top">Needed for <span style="color: rgb(34,34,34);">the Drill Web UI.</span><span style="color: rgb(34,34,34);"> </span></td></tr><tr><td valign="top">31010</td><td valign="top" colspan="1">TCP</td><td valign="top">User port address. Used between nodes in a Drill cluster. <br />Needed for an external client, such as Tableau, to connect into the<br />cluster nodes. Also needed for the Drill Web UI.</td></tr><tr><td valign="top">31011</td><td valign="top" colspan="1">TCP</td><td valign="top">Control port address. Used between nodes in a Drill cluster. <br />Needed for multi-node installation of Apache Drill.</td></tr><tr><td valign="top" colspan="1">31012</td><td valign="top" colspan="1">TCP</td><td valign="top" colspan="1">Data port address. Used between nodes in a Drill cluster. <br />Needed for multi-node installation of Apache 

[2/2] drill git commit: DRILL-2336 plugin updates

2015-03-02 Thread bridgetb
DRILL-2336 plugin updates


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/0119fdde
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/0119fdde
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/0119fdde

Branch: refs/heads/gh-pages-master
Commit: 0119fdde5ebb1a4921822ed213039bdbbbec4e71
Parents: 2a34ac8
Author: Kristine Hahn kh...@maprtech.com
Authored: Mon Mar 2 17:25:46 2015 -0800
Committer: Bridget Bevens bbev...@maprtech.com
Committed: Mon Mar 2 17:53:58 2015 -0800

--
 _docs/005-connect.md|  25 +-
 _docs/connect/001-plugin-reg.md |  43 ++--
 _docs/connect/002-plugin-conf.md| 123 ++
 _docs/connect/002-workspaces.md |  74 --
 _docs/connect/003-reg-fs.md |  64 -
 _docs/connect/003-workspaces.md |  74 ++
 _docs/connect/004-reg-fs.md |  64 +
 _docs/connect/004-reg-hbase.md  |  32 ---
 _docs/connect/005-reg-hbase.md  |  34 +++
 _docs/connect/005-reg-hive.md   |  86 ---
 _docs/connect/006-default-frmt.md   |  60 -
 _docs/connect/006-reg-hive.md   |  83 +++
 _docs/connect/007-default-frmt.md   |  60 +
 _docs/connect/007-mongo-plugin.md   | 167 -
 _docs/connect/008-mapr-db-plugin.md |  31 ---
 _docs/connect/008-mongo-plugin.md   | 167 +
 _docs/connect/009-mapr-db-plugin.md |  30 +++
 _docs/img/StoragePluginConfig.png   | Bin 20403 -> 0 bytes
 _docs/img/data-sources-schemachg.png| Bin 0 -> 8071 bytes
 _docs/img/datasources-json-bracket.png  | Bin 0 -> 30129 bytes
 _docs/img/datasources-json.png  | Bin 0 -> 16364 bytes
 _docs/img/get2kno_plugin.png| Bin 0 -> 55794 bytes
 _docs/img/json-workaround.png   | Bin 20786 -> 27547 bytes
 _docs/img/plugin-default.png| Bin 0 -> 56412 bytes
 _docs/install/001-drill-in-10.md|   4 +-
 _docs/sql-ref/data-types/001-date.md|   8 +-
 _docs/tutorial/002-get2kno-sb.md| 241 ++-
 _docs/tutorial/003-lesson1.md   |  44 ++--
 _docs/tutorial/005-lesson3.md   | 100 
 .../install-sandbox/001-install-mapr-vm.md  |   2 +-
 .../install-sandbox/002-install-mapr-vb.md  |   2 +-
 31 files changed, 808 insertions(+), 810 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/drill/blob/0119fdde/_docs/005-connect.md
--
diff --git a/_docs/005-connect.md b/_docs/005-connect.md
index b48d200..3c60b2d 100644
--- a/_docs/005-connect.md
+++ b/_docs/005-connect.md
@@ -1,24 +1,24 @@
 ---
-title: Connect to Data Sources
+title: Connect to a Data Source
 ---
-Apache Drill serves as a query layer that connects to data sources through
-storage plugins. Drill uses the storage plugins to interact with data sources.
-You can think of a storage plugin as a connection between Drill and a data
-source.
+A storage plugin is an interface for connecting to a data source to read and 
write data. Apache Drill connects to a data source, such as a file on the file 
system or a Hive metastore, through a storage plugin. When you execute a query, 
Drill gets the plugin name you provide in the FROM clause of your query, as in 
the example below. 
 
+In addition to the connection string, the storage plugin configures the 
workspace and file formats for reading and writing data, as described in 
subsequent sections. 
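For example, in the following query the leading `dfs` is the storage plugin name that Drill reads from the FROM clause (the file path is illustrative):

    SELECT * FROM dfs.`/Users/drilluser/ngram/googlebooks-eng-all-5gram-20120701-zo.tsv` LIMIT 5;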
+
+## Storage Plugin Internals
 The following image represents the storage plugin layer between Drill and a
 data source:
 
 ![drill query flow]({{ site.baseurl }}/docs/img/storageplugin.png)
 
-Storage plugins provide the following information to Drill:
+A storage plugin provides the following information to Drill:
 
   * Metadata available in the underlying data source
   * Location of data
   * Interfaces that Drill can use to read from and write to data sources
  * A set of storage plugin optimization rules that assist with efficient 
execution of Drill queries, such as pushdowns, statistics, and partition 
awareness
 
-Storage plugins perform scanner and writer functions, and inform the metadata
+A storage plugin performs scanner and writer functions, and informs the 
metadata
 repository of any known metadata, such as:
 
   * Schema
@@ -27,15 +27,6 @@ repository of any known metadata, such as:
   * Secondary indices
   * Number of blocks
 
-Storage plugins inform the execution engine of any native capabilities, such
+A storage plugin informs the execution engine of any native capabilities, such
 as predicate 

[1/2] drill git commit: DRILL-2336 plugin updates

2015-03-02 Thread bridgetb
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master 2a34ac893 -> 0119fdde5


http://git-wip-us.apache.org/repos/asf/drill/blob/0119fdde/_docs/tutorial/003-lesson1.md
--
diff --git a/_docs/tutorial/003-lesson1.md b/_docs/tutorial/003-lesson1.md
index 119d67f..577ede3 100644
--- a/_docs/tutorial/003-lesson1.md
+++ b/_docs/tutorial/003-lesson1.md
@@ -22,26 +22,17 @@ This lesson consists of select * queries on each data 
source.
 
 ## Before You Begin
 
-### Start sqlline
+### Start SQLLine
 
-If sqlline is not already started, use a Terminal or Command window to log
-into the demo VM as root, then enter `sqlline`:
+If SQLLine is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter `sqlline`, as described in [Getting to 
Know the Sandbox](/docs/getting-to-know-the-drill-sandbox):
 
-$ ssh root@10.250.0.6
-Password:
-Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
-Welcome to your Mapr Demo virtual machine.
-[root@maprdemo ~]# sqlline
-sqlline version 1.1.6
-0: jdbc:drill:
-
-You can run queries from this prompt to complete the tutorial. To exit from
-`sqlline`, type:
+You can run queries from the `sqlline` prompt to complete the tutorial. To 
exit from
+SQLLine, type:
 
 0: jdbc:drill:> !quit
 
-Note that though this tutorial demonstrates the queries using SQLLine, you can
-also execute queries using the Drill Web UI.
+Examples in this tutorial use SQLLine. You can also execute queries using the 
Drill Web UI.
 
 ### List the available workspaces and databases:
 
@@ -55,7 +46,6 @@ also execute queries using the Drill Web UI.
 | dfs.root|
 | dfs.views   |
 | dfs.clicks  |
-| dfs.data|
 | dfs.tmp |
 | sys |
 | maprdb  |
@@ -64,9 +54,9 @@ also execute queries using the Drill Web UI.
 +-+
 12 rows selected
 
-Note that this command exposes all the metadata available from the storage
-plugins configured with Drill as a set of schemas. This includes the Hive and
-MapR-DB databases as well as the workspaces configured in the file system. As
+This command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas, including the Hive and
+MapR-DB databases and the workspaces configured in the file system. As
 you run queries in the tutorial, you will switch among these schemas by
 submitting the USE command. This behavior resembles the ability to use
 different database schemas (namespaces) in a relational database system.
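For example, switching to the Hive schema (the output shape follows the other SQLLine examples in this commit):

    0: jdbc:drill:> USE hive;
    +------------+----------------------------------+
    |     ok     |             summary              |
    +------------+----------------------------------+
    | true       | Default schema changed to 'hive' |
    +------------+----------------------------------+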
@@ -113,13 +103,13 @@ on the metadata available in the Hive metastore.
 
 0: jdbc:drill:> select * from orders limit 5;
 
 +------------+------------+------------+------------+------------+-------------+
-| order_id | month | cust_id | state | prod_id | order_total |
+|  order_id  |   month    |  cust_id   |   state    |  prod_id   | order_total |
 +------------+------------+------------+------------+------------+-------------+
-| 67212 | June | 10001 | ca | 909 | 13 |
-| 70302 | June | 10004 | ga | 420 | 11 |
-| 69090 | June | 10011 | fl | 44 | 76 |
-| 68834 | June | 10012 | ar | 0 | 81 |
-| 71220 | June | 10018 | az | 411 | 24 |
+| 67212      | June       | 10001      | ca         | 909        | 13          |
+| 70302      | June       | 10004      | ga         | 420        | 11          |
+| 69090      | June       | 10011      | fl         | 44         | 76          |
+| 68834      | June       | 10012      | ar         | 0          | 81          |
+| 71220      | June       | 10018      | az         | 411        | 24          |
 +------------+------------+------------+------------+------------+-------------+
 
 Because orders is a Hive table, you can query the data in the same way that
@@ -256,7 +246,7 @@ a relational database “table.” Therefore, you can 
perform SQL operations
 directly on files and directories without the need for up-front schema
 definitions or schema management for any model changes. The schema is
 discovered on the fly based on the query. Drill supports queries on a variety
-of file formats including text, CSV, Parquet, and JSON in the 0.5 release.
+of file formats including text, CSV, Parquet, and JSON.
 
 In this example, the clickstream data coming from the mobile/web applications
 is in JSON format. The JSON files have the following structure:
@@ -285,7 +275,7 @@ setup beyond the definition of a workspace.
 
 In this case, setting the workspace is a mechanism for making queries easier
 to write. When you specify a file system workspace, you can shorten references
-to files in the FROM clause of your queries. Instead of having to provide the
+to files in your queries. Instead of having to provide the
 complete path to a file, you can provide the path relative to a directory
 location specified in the workspace. For 

drill git commit: DRILL-2338: Fix Decimal38/Decimal28 vector's get() to copy the scale and precision into the holder

2015-03-02 Thread mehant
Repository: drill
Updated Branches:
  refs/heads/master 3442215fd -> a84f7b9e8


DRILL-2338: Fix Decimal38/Decimal28 vector's get() to copy the scale and 
precision into the holder


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a84f7b9e
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a84f7b9e
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a84f7b9e

Branch: refs/heads/master
Commit: a84f7b9e88b1827e6b4da8cdd25c6d4f12dcdadc
Parents: 3442215
Author: Mehant Baid meha...@gmail.com
Authored: Fri Feb 27 19:21:51 2015 -0800
Committer: Mehant Baid meha...@gmail.com
Committed: Mon Mar 2 11:13:27 2015 -0800

--
 .../codegen/templates/FixedValueVectors.java| 12 ++
 .../physical/impl/writer/TestParquetWriter.java | 25 
 2 files changed, 27 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/drill/blob/a84f7b9e/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
--
diff --git a/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java 
b/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
index b5011e6..6cea8c8 100644
--- a/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
+++ b/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
@@ -394,17 +394,8 @@ public final class ${minor.class}Vector extends BaseDataValueVector implements F
 <#elseif (minor.class == "Decimal28Sparse") || (minor.class == "Decimal38Sparse") || (minor.class == "Decimal28Dense") || (minor.class == "Decimal38Dense")>
 
 public void get(int index, ${minor.class}Holder holder) {
-
 holder.start = index * ${type.width};
-
 holder.buffer = data;
-
-    /* The buffer within the value vector is little endian.
-     * For the dense representation though, we use big endian
-     * byte ordering (internally). This is because we shift bits to the right and
-     * big endian ordering makes sense for this purpose.  So we have to deal with
-     * the sign bit for the two representation in a slightly different fashion
-     */
 holder.scale = getField().getScale();
 holder.precision = getField().getPrecision();
 }
@@ -412,8 +403,9 @@ public final class ${minor.class}Vector extends BaseDataValueVector implements F
 public void get(int index, Nullable${minor.class}Holder holder) {
 holder.isSet = 1;
 holder.start = index * ${type.width};
-
 holder.buffer = data;
+holder.scale = getField().getScale();
+holder.precision = getField().getPrecision();
 }
 
   @Override
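Why the copied scale matters, as a minimal standalone sketch (hypothetical types, not the Drill holder API): a decimal vector stores unscaled digits, so a reader handed a holder without the scale cannot reconstruct the value.

    import java.math.BigDecimal;
    import java.math.BigInteger;

    class DecimalHolderSketch {
        // Fields mirroring what the patched get() now fills in.
        BigInteger unscaledDigits; // e.g. 120, as stored in the vector buffer
        int scale;                 // copied from the field metadata by get()
        int precision;

        BigDecimal value() {
            // 120 with scale 2 reads back as 1.20; with the scale left at 0,
            // the same digits would wrongly read back as 120.
            return new BigDecimal(unscaledDigits, scale);
        }
    }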

http://git-wip-us.apache.org/repos/asf/drill/blob/a84f7b9e/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
--
diff --git 
a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
 
b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
index 7298f28..76328c6 100644
--- 
a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
+++ 
b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
@@ -21,6 +21,7 @@ import static org.junit.Assert.assertEquals;
 
 import java.io.UnsupportedEncodingException;
 import java.lang.reflect.Array;
+import java.math.BigDecimal;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.HashMap;
@@ -360,6 +361,30 @@ public class TestParquetWriter extends BaseTestQuery {
     compareParquetReadersColumnar("wr_returning_customer_sk", "dfs.`/tmp/web_returns`");
   }
 
+  @Test
+  public void testWriteDecimal() throws Exception {
+    String outputTable = "decimal_test";
+    Path path = new Path("/tmp/" + outputTable);
+    if (fs.exists(path)) {
+      fs.delete(path, true);
+    }
+    String ctas = String.format("use dfs.tmp; " +
+        "create table %s as select " +
+        "cast('1.2' as decimal(38, 2)) col1, cast('1.2' as decimal(28, 2)) col2 " +
+        "from cp.`employee.json` limit 1", outputTable);
+
+    test(ctas);
+
+    BigDecimal result = new BigDecimal("1.20");
+
+    testBuilder()
+        .unOrdered()
+        .sqlQuery(String.format("select col1, col2 from %s", outputTable))
+        .baselineColumns("col1", "col2")
+        .baselineValues(result, result)
+        .go();
+  }
+
   public void runTestAndValidate(String selection, String validationSelection, 
String inputTable, String outputFile) throws Exception {
 
     Path path = new Path("/tmp/" + outputFile);



drill git commit: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison. Add a planner option to enable/disable this feature.

2015-03-02 Thread jni
Repository: drill
Updated Branches:
  refs/heads/master 9c0738d94 -> 3442215fd


DRILL-2236: Optimize hash inner join by swapping inputs based on row count 
comparison. Add a planner option to enable/disable this feature.

Revise code based on review comments.


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/3442215f
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/3442215f
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/3442215f

Branch: refs/heads/master
Commit: 3442215fd91e700f659bc055cd7c05b623bc59b3
Parents: 9c0738d
Author: Jinfeng Ni j...@maprtech.com
Authored: Thu Jan 29 13:24:28 2015 -0800
Committer: Jinfeng Ni j...@maprtech.com
Committed: Mon Mar 2 10:03:31 2015 -0800

--
 .../exec/planner/physical/HashJoinPrel.java | 54 +
 .../drill/exec/planner/physical/JoinPrel.java   |  4 +-
 .../exec/planner/physical/MergeJoinPrel.java|  2 +-
 .../exec/planner/physical/PlannerSettings.java  | 11 +++
 .../physical/explain/NumberingRelWriter.java|  7 ++
 .../physical/visitor/SwapHashJoinVisitor.java   | 79 
 .../planner/sql/handlers/DefaultSqlHandler.java | 13 +++-
 .../server/options/SystemOptionManager.java |  2 +
 8 files changed, 154 insertions(+), 18 deletions(-)
--
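The new planner option can be toggled per session; a sketch of the toggle (the option is registered through SystemOptionManager/PlannerSettings in this commit, but the exact name `planner.enable_hashjoin_swap` is an assumption here):

    ALTER SESSION SET `planner.enable_hashjoin_swap` = false;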


http://git-wip-us.apache.org/repos/asf/drill/blob/3442215f/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
--
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
index a3c42de..f63057f 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
@@ -20,6 +20,7 @@ package org.apache.drill.exec.planner.physical;
 import java.io.IOException;
 import java.util.List;
 
+import net.hydromatic.optiq.runtime.FlatLists;
 import org.apache.drill.common.expression.FieldReference;
 import org.apache.drill.common.logical.data.JoinCondition;
 import org.apache.drill.exec.ExecConstants;
@@ -46,18 +47,24 @@ import com.google.common.collect.Lists;
 
 public class HashJoinPrel  extends JoinPrel {
 
+  private boolean swapped = false;
+
   public HashJoinPrel(RelOptCluster cluster, RelTraitSet traits, RelNode left, 
RelNode right, RexNode condition,
-  JoinRelType joinType) throws InvalidRelException {
-super(cluster, traits, left, right, condition, joinType);
+  JoinRelType joinType) throws InvalidRelException {
+this(cluster, traits, left, right, condition, joinType, false);
+  }
 
+  public HashJoinPrel(RelOptCluster cluster, RelTraitSet traits, RelNode left, 
RelNode right, RexNode condition,
+  JoinRelType joinType, boolean swapped) throws InvalidRelException {
+super(cluster, traits, left, right, condition, joinType);
+this.swapped = swapped;
 RelOptUtil.splitJoinCondition(left, right, condition, leftKeys, rightKeys);
   }
 
-
   @Override
   public JoinRelBase copy(RelTraitSet traitSet, RexNode conditionExpr, RelNode 
left, RelNode right, JoinRelType joinType, boolean semiJoinDone) {
 try {
-  return new HashJoinPrel(this.getCluster(), traitSet, left, right, 
conditionExpr, joinType);
+  return new HashJoinPrel(this.getCluster(), traitSet, left, right, 
conditionExpr, joinType, this.swapped);
 }catch (InvalidRelException e) {
   throw new AssertionError(e);
 }
@@ -100,11 +107,32 @@ public class HashJoinPrel  extends JoinPrel {
 
   @Override
   public PhysicalOperator getPhysicalOperator(PhysicalPlanCreator creator) 
throws IOException {
+// Depending on whether the left/right is swapped for hash inner join, 
pass in different
+// combinations of parameters.
+if (! swapped) {
+  return getHashJoinPop(creator, left, right, leftKeys, rightKeys);
+} else {
+  return getHashJoinPop(creator, right, left, rightKeys, leftKeys);
+}
+  }
+
+  @Override
+  public SelectionVectorMode[] getSupportedEncodings() {
+return SelectionVectorMode.DEFAULT;
+  }
+
+  @Override
+  public SelectionVectorMode getEncoding() {
+return SelectionVectorMode.NONE;
+  }
+
+  private PhysicalOperator getHashJoinPop(PhysicalPlanCreator creator, RelNode left, RelNode right,
+      List<Integer> leftKeys, List<Integer> rightKeys) throws IOException {
     final List<String> fields = getRowType().getFieldNames();
     assert isUnique(fields);
-    final int leftCount = left.getRowType().getFieldCount();
-    final List<String> leftFields = fields.subList(0, leftCount);
-    final List<String> rightFields =