Merge branch 'gh-pages-master' into gh-pages
Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/fbc18c48
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/fbc18c48
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/fbc18c48

Branch: refs/heads/gh-pages
Commit: fbc18c480ffd6a2ccb878a4beb3584c8d3d0b64e
Parents: 2856ae4 feaa579
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Tue Mar 17 14:02:01 2015 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Tue Mar 17 14:02:01 2015 -0700

----------------------------------------------------------------------
 _docs/005-connect.md | 27 +-
 _docs/008-sql-ref.md | 4 +-
 _docs/009-datasources.md | 25 +
 _docs/009-dev-custom-func.md | 37 --
 _docs/010-dev-custom-func.md | 37 ++
 _docs/010-manage.md | 14 -
 _docs/011-develop.md | 9 -
 _docs/011-manage.md | 14 +
 _docs/012-develop.md | 9 +
 _docs/012-rn.md | 191 -------
 _docs/013-contribute.md | 9 -
 _docs/013-rn.md | 191 +++++++
 _docs/014-contribute.md | 9 +
 _docs/014-sample-ds.md | 10 -
 _docs/015-design.md | 13 -
 _docs/015-sample-ds.md | 10 +
 _docs/016-design.md | 13 +
 _docs/016-progress.md | 8 -
 _docs/018-bylaws.md | 170 -------
 _docs/018-progress.md | 8 +
 _docs/019-bylaws.md | 170 +++++++
 _docs/connect/001-plugin-reg.md | 43 +-
 _docs/connect/002-plugin-conf.md | 130 +++++
 _docs/connect/002-workspaces.md | 74 ---
 _docs/connect/003-reg-fs.md | 64 ---
 _docs/connect/003-workspaces.md | 74 +++
 _docs/connect/004-reg-fs.md | 64 +++
 _docs/connect/004-reg-hbase.md | 32 --
 _docs/connect/005-reg-hbase.md | 34 ++
 _docs/connect/005-reg-hive.md | 83 ---
 _docs/connect/006-default-frmt.md | 60 ---
 _docs/connect/006-reg-hive.md | 82 +++
 _docs/connect/007-default-frmt.md | 69 +++
 _docs/connect/007-mongo-plugin.md | 167 ------
 _docs/connect/008-mapr-db-plugin.md | 31 --
 _docs/connect/008-mongo-plugin.md | 167 ++++++
 _docs/connect/009-mapr-db-plugin.md | 30 ++
 _docs/contribute/001-guidelines.md | 3 +-
 _docs/data-sources/001-hive-types.md | 180 +++++++
 _docs/data-sources/002-hive-udf.md | 40 ++
 _docs/data-sources/003-parquet-ref.md | 269 ++++++++++
 _docs/data-sources/004-json-ref.md | 504 +++++++++++++++++++
 _docs/img/Hbase_Browse.png | Bin 147495 -> 148451 bytes
 _docs/img/StoragePluginConfig.png | Bin 20403 -> 0 bytes
 _docs/img/Untitled.png | Bin 39796 -> 0 bytes
 _docs/img/connect-plugin.png | Bin 0 -> 24774 bytes
 _docs/img/data-sources-schemachg.png | Bin 0 -> 8071 bytes
 _docs/img/datasources-json-bracket.png | Bin 0 -> 30129 bytes
 _docs/img/datasources-json.png | Bin 0 -> 16364 bytes
 _docs/img/get2kno_plugin.png | Bin 0 -> 55794 bytes
 _docs/img/json-workaround.png | Bin 0 -> 27547 bytes
 _docs/img/plugin-default.png | Bin 0 -> 56412 bytes
 _docs/install/001-drill-in-10.md | 2 +-
 _docs/interfaces/001-odbc-win.md | 3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md | 2 +-
 .../interfaces/odbc-win/004-tableau-examples.md | 6 +-
 _docs/manage/002-start-stop.md | 2 +-
 _docs/manage/003-ports.md | 2 +-
 _docs/manage/conf/002-startup-opt.md | 3 +-
 _docs/manage/conf/003-plan-exec.md | 3 +-
 _docs/manage/conf/004-persist-conf.md | 2 +-
 _docs/query/001-get-started.md | 75 +++
 _docs/query/001-query-fs.md | 35 --
 _docs/query/002-query-fs.md | 35 ++
 _docs/query/002-query-hbase.md | 151 ------
 _docs/query/003-query-complex.md | 56 ---
 _docs/query/003-query-hbase.md | 151 ++++++
 _docs/query/004-query-complex.md | 56 +++
 _docs/query/004-query-hive.md | 45 --
 _docs/query/005-query-hive.md | 45 ++
 _docs/query/005-query-info-skema.md | 109 ----
 _docs/query/006-query-info-skema.md | 109 ++++
 _docs/query/006-query-sys-tbl.md | 159 ------
 _docs/query/007-query-sys-tbl.md | 159 ++++++
 _docs/query/get-started/001-lesson1-connect.md | 88 ++++
 _docs/query/get-started/002-lesson2-download.md | 103 ++++
 _docs/query/get-started/003-lesson3-plugin.md | 142 ++++++
 _docs/sql-ref/001-data-types.md | 215 +++--
 _docs/sql-ref/002-lexical-structure.md | 145 ++++++
 _docs/sql-ref/002-operators.md | 70 ---
 _docs/sql-ref/003-functions.md | 185 -------
 _docs/sql-ref/003-operators.md | 70 +++
 _docs/sql-ref/004-functions.md | 186 +++++++
 _docs/sql-ref/004-nest-functions.md | 10 -
 _docs/sql-ref/005-cmd-summary.md | 9 -
 _docs/sql-ref/005-nest-functions.md | 10 +
 _docs/sql-ref/006-cmd-summary.md | 9 +
 _docs/sql-ref/006-reserved-wds.md | 16 -
 _docs/sql-ref/007-reserved-wds.md | 16 +
 _docs/sql-ref/data-types/001-date.md | 206 ++----
 .../data-types/002-disparate-data-types.md | 321 ++++++++++++
 _docs/tutorial/002-get2kno-sb.md | 241 +++------
 _docs/tutorial/003-lesson1.md | 44 +-
 _docs/tutorial/005-lesson3.md | 98 ++--
 .../install-sandbox/001-install-mapr-vm.md | 2 +-
 .../install-sandbox/002-install-mapr-vb.md | 2 +-
 96 files changed, 4268 insertions(+), 2308 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/006-reg-hive.md
----------------------------------------------------------------------
diff --cc _docs/connect/006-reg-hive.md
index 0000000,cf9b72a..dfb03dc
mode 000000,100644..100644
--- a/_docs/connect/006-reg-hive.md
+++ b/_docs/connect/006-reg-hive.md
@@@ -1,0 -1,82 +1,82 @@@
+ ---
+ title: "Hive Storage Plugin"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can register a storage plugin instance that connects Drill to a Hive data
+ source that has a remote or embedded metastore service. When you register a
+ storage plugin instance for a Hive data source, provide a unique name for the
+ instance, and identify the type as `hive`. You must also provide the
+ metastore connection information.
+ 
+ Drill supports Hive 1.0. To access Hive tables
+ using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
+ must have the SerDes or InputFormat/OutputFormat `JAR` files in the
+ `<drill_installation_directory>/jars/3rdparty` folder.
+ 
+ ## Hive Remote Metastore
+ 
+ In this configuration, the Hive metastore runs as a separate service outside
+ of Hive.
+ Drill communicates with the Hive metastore through Thrift. The
+ metastore service communicates with the Hive database over JDBC. Point Drill
+ to the Hive metastore service address, and provide the connection parameters
+ in the Drill Web UI to configure the connection.
+ 
+ **Note:** Verify that the Hive metastore service is running before you register the Hive metastore.
+ 
+ To register a remote Hive metastore with Drill, complete the following steps:
+ 
+ 1. Issue the following command to start the Hive metastore service on the system specified in `hive.metastore.uris`:
+ 
+         hive --service metastore
+ 2. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+ 3. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+ 4. In the configuration window, add the Thrift URI and port to `hive.metastore.uris`.
+ 
+     **Example**
+ 
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "hive.metastore.uris": "thrift://<host>:<port>",
+             "hive.metastore.sasl.enabled": "false"
+           }
+         }
+ 5. Click **Enable**.
+ 6. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+ 
+         export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
+ 
+ Once you have configured a storage plugin instance for a Hive data source, you
+ can [query Hive tables](/docs/querying-hive/).
+ 
+ ## Hive Embedded Metastore
+ 
+ In this configuration, the Hive metastore is embedded within the Drill
+ process. Provide the metastore database configuration settings in the Drill
+ Web UI. Before you register Hive, verify that the driver you use to connect to
+ the Hive metastore is in the Drill classpath, located in `/<drill installation
+ directory>/lib/`. If the driver is not there, copy the driver to `/<drill
+ installation directory>/lib` on the Drill node.
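The plugin definitions shown in these docs must be strict JSON before the Drill Web UI will accept them. As a quick illustration (the helper name and the sample host/port are hypothetical, not from the docs above), a pre-flight check that catches common mistakes such as unquoted values or trailing commas:

```python
import json

def check_plugin_config(text):
    """Parse a storage plugin definition, raising ValueError on malformed JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"invalid storage plugin JSON: {e}") from e

# A well-formed Hive plugin definition parses cleanly.
good = '''
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://metastore-host:9083",
    "hive.metastore.sasl.enabled": "false"
  }
}
'''
config = check_plugin_config(good)
```

A definition with a trailing comma, like `{ "type": "hive", }`, fails this check, which is the same reason the Web UI would reject it.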
+ For more information about
+ storage types and configurations, refer to ["Hive Metastore Administration"](https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin).
+ 
+ To register an embedded Hive metastore with Drill, complete the following
+ steps:
+ 
+ 1. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+ 2. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+ 3. In the configuration window, add the database configuration settings.
+ 
+     **Example**
+ 
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>;create=true",
+             "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
+             "fs.default.name": "file:///"
+           }
+         }
+ 4. Click **Enable**.
+ 5. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+ 
- export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
++ export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/007-default-frmt.md
----------------------------------------------------------------------
diff --cc _docs/connect/007-default-frmt.md
index 0000000,9325bdb..fc10c16
mode 000000,100644..100644
--- a/_docs/connect/007-default-frmt.md
+++ b/_docs/connect/007-default-frmt.md
@@@ -1,0 -1,69 +1,69 @@@
+ ---
+ title: "Drill Default Input Format"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can define a default input format to tell Drill what file type exists in a
+ workspace within a file system. Drill determines the file type based on file
+ extensions and magic numbers when searching a workspace.
+ 
+ Magic numbers are file signatures that Drill uses to identify Parquet files.
+ If Drill cannot identify the file type based on file extensions or magic
+ numbers, the query fails.
+ Defining a default input format can prevent queries
+ from failing in situations where Drill cannot determine the file type.
+ 
+ If you incorrectly define the file type in a workspace and Drill cannot
+ determine the file type, the query fails. For example, if the directory for
+ which you have defined a workspace contains JSON files and you defined the
+ default input format as CSV, the query fails against the workspace.
+ 
+ You can define one default input format per workspace. If you do not define a
+ default input format, and Drill cannot detect the file format, the query
+ fails. You can define a default input format for any of the file types that
+ Drill supports. Currently, Drill supports the following types:
+ 
+ * CSV
+ * TSV
+ * PSV
+ * Parquet
+ * JSON
+ 
+ ## Defining a Default Input Format
+ 
+ You define the default input format for a file system workspace through the
+ Drill Web UI. You must have a [defined workspace](/docs/workspaces) before you can define a
+ default input format.
+ 
+ To define a default input format for a workspace, complete the following
+ steps:
+ 
+ 1. Navigate to the Drill Web UI at `<drill_node_ip_address>:8047`. The Drillbit process must be running on the node before you connect to the Drill Web UI.
+ 2. Select **Storage** in the toolbar.
+ 3. Click **Update** next to the file system for which you want to define a default input format for a workspace.
+ 4. In the Configuration area, locate the workspace for which you would like to define the default input format, and change the `defaultInputFormat` attribute to any of the supported file types.
+ 
+ **Example**
+ 
+     {
+       "type": "file",
+       "enabled": true,
+       "connection": "hdfs:///",
+       "workspaces": {
+         "root": {
+           "location": "/drill/testdata",
+           "writable": false,
+           "defaultInputFormat": "csv"
+         },
+         "local" : {
+           "location" : "/max/proddata",
+           "writable" : true,
+           "defaultInputFormat" : "json"
+         }
+       }
+     }
+ 
+ ## Querying Compressed JSON
+ 
+ You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file.
+ 
+     "extensions": [
+       "json",
+       "gz"
-     ]
++    ]

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/manage/conf/002-startup-opt.md
----------------------------------------------------------------------
diff --cc _docs/manage/conf/002-startup-opt.md
index 898a7ba,d1766fb..e0b64bf
--- a/_docs/manage/conf/002-startup-opt.md
+++ b/_docs/manage/conf/002-startup-opt.md
@@@ -46,5 -46,5 +46,4 @@@ override.conf` file located in Drill’
 You may want to configure the following start-up options that control certain
 behaviors in Drill:
 
- <table ><tbody><tr><th >Option</th><th >Default Value</th><th >Description</th></tr><tr><td valign="top" >drill.exec.sys.store.provider</td><td valign="top" >ZooKeeper</td><td valign="top" >Defines the persistent storage (PStore) provider. The PStore holds configuration and profile data. For more information about PStores, see <a href="/docs/persistent-configuration-storage" rel="nofollow">Persistent Configuration Storage</a>.</td></tr><tr><td valign="top" >drill.exec.buffer.size</td><td valign="top" > </td><td valign="top" >Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.</td></tr><tr><td valign="top" >drill.exec.sort.external.directories drill.exec.sort.external.fs</td><td valign="top" > </td><td valign="top" >These options control spooling. The drill.exec.sort.external.directories option tells Drill which directory to use when spooling. The drill.exec.sort.external.fs option tells Drill which file system to use when spooling beyond memory files. <span style="line-height: 1.4285715;background-color: transparent;"> </span>Drill uses a spool and sort operation for beyond memory operations. The sorting operation is designed to spool to a Hadoop file system. The default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back from it) is constrained by the file system. <span style="line-height: 1.4285715;background-color: transparent;"> </span>For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes. Volumes improve performance and stripe data across as many disks as possible.</td></tr><tr><td valign="top" colspan="1" >drill.exec.debug.error_on_leak</td><td valign="top" colspan="1" >True</td><td valign="top" colspan="1" >Determines how Drill behaves when memory leaks occur during a query. By default, this option is enabled so that queries fail when memory leaks occur. If you disable the option, Drill issues a warning when a memory leak occurs and completes the query.</td></tr><tr><td valign="top" colspan="1" >drill.exec.zk.connect</td><td valign="top" colspan="1" >localhost:2181</td><td valign="top" colspan="1" >Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.</td></tr><tr><td valign="top" colspan="1" >drill.exec.cluster-id</td><td valign="top" colspan="1" >my_drillbit_cluster</td><td valign="top" colspan="1" >Identifies the cluster that corresponds with the ZooKeeper quorum indicated. It also provides Drill with the name of the cluster used during UDP multicast. You must change the default cluster-id if there are multiple clusters on the same subnet. If you do not change the ID, the clusters will try to connect to each other to create one cluster.</td></tr></tbody></table>
- 
+ <table ><tbody><tr><th >Option</th><th >Default Value</th><th >Description</th></tr><tr><td valign="top" >drill.exec.sys.store.provider</td><td valign="top" >ZooKeeper</td><td valign="top" >Defines the persistent storage (PStore) provider. The PStore holds configuration and profile data. For more information about PStores, see <a href="/docs/persistent-configuration-storage" rel="nofollow">Persistent Configuration Storage</a>.</td></tr><tr><td valign="top" >drill.exec.buffer.size</td><td valign="top" > </td><td valign="top" >Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.</td></tr><tr><td valign="top" >drill.exec.sort.external.directories drill.exec.sort.external.fs</td><td valign="top" > </td><td valign="top" >These options control spooling. The drill.exec.sort.external.directories option tells Drill which directory to use when spooling. The drill.exec.sort.external.fs option tells Drill which file system to use when spooling beyond memory files. <span style="line-height: 1.4285715;background-color: transparent;"> </span>Drill uses a spool and sort operation for beyond memory operations. The sorting operation is designed to spool to a Hadoop file system. The default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back from it) is constrained by the file system. <span style="line-height: 1.4285715;background-color: transparent;"> </span>For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes. Volumes improve performance and stripe data across as many disks as possible.</td></tr><tr><td valign="top" colspan="1" >drill.exec.debug.error_on_leak</td><td valign="top" colspan="1" >True</td><td valign="top" colspan="1" >Determines how Drill behaves when memory leaks occur during a query. By default, this option is enabled so that queries fail when memory leaks occur. If you disable the option, Drill issues a warning when a memory leak occurs and completes the query.</td></tr><tr><td valign="top" colspan="1" >drill.exec.zk.connect</td><td valign="top" colspan="1" >localhost:2181</td><td valign="top" colspan="1" >Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.</td></tr><tr><td valign="top" colspan="1" >drill.exec.cluster-id</td><td valign="top" colspan="1" >my_drillbit_cluster</td><td valign="top" colspan="1" >Identifies the cluster that corresponds with the ZooKeeper quorum indicated. It also provides Drill with the name of the cluster used during UDP multicast. You must change the default cluster-id if there are multiple clusters on the same subnet. If you do not change the ID, the clusters will try to connect to each other to create one cluster.</td></tr></tbody></table></div>
-
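The default input format behavior described in the 007-default-frmt.md diff above (a known file extension wins, then the workspace's `defaultInputFormat`, otherwise the query fails) can be sketched as follows. This is an illustrative model only, not Drill's actual implementation, and it ignores magic-number detection for Parquet:

```python
# Supported types listed in the doc: extension match first, then the
# workspace defaultInputFormat, otherwise the query fails.
KNOWN_EXTENSIONS = {"csv", "tsv", "psv", "parquet", "json"}

def resolve_input_format(filename, default_input_format=None):
    """Return the format Drill-style resolution would pick for a file."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in KNOWN_EXTENSIONS:
        return ext
    if default_input_format:
        return default_input_format
    # Mirrors the documented behavior: no extension match and no default
    # input format means the query fails.
    raise ValueError(f"cannot determine file type for {filename!r}")

# e.g. resolve_input_format("logs/data.json") -> "json"
# e.g. resolve_input_format("logs/data.dat", "csv") -> "csv"
```

This also illustrates why the doc warns that a wrong `defaultInputFormat` (say, `csv` for a directory of JSON files) makes queries fail: the fallback is applied blindly to any unrecognized file.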