Merge branch 'gh-pages-master' into gh-pages
Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/fbc18c48
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/fbc18c48
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/fbc18c48

Branch: refs/heads/gh-pages
Commit: fbc18c480ffd6a2ccb878a4beb3584c8d3d0b64e
Parents: 2856ae4 feaa579
Author: Bridget Bevens <bbev...@maprtech.com>
Authored: Tue Mar 17 14:02:01 2015 -0700
Committer: Bridget Bevens <bbev...@maprtech.com>
Committed: Tue Mar 17 14:02:01 2015 -0700

----------------------------------------------------------------------
 _docs/005-connect.md | 27 +-
 _docs/008-sql-ref.md | 4 +-
 _docs/009-datasources.md | 25 +
 _docs/009-dev-custom-func.md | 37 --
 _docs/010-dev-custom-func.md | 37 ++
 _docs/010-manage.md | 14 -
 _docs/011-develop.md | 9 -
 _docs/011-manage.md | 14 +
 _docs/012-develop.md | 9 +
 _docs/012-rn.md | 191 -------
 _docs/013-contribute.md | 9 -
 _docs/013-rn.md | 191 +++++++
 _docs/014-contribute.md | 9 +
 _docs/014-sample-ds.md | 10 -
 _docs/015-design.md | 13 -
 _docs/015-sample-ds.md | 10 +
 _docs/016-design.md | 13 +
 _docs/016-progress.md | 8 -
 _docs/018-bylaws.md | 170 -------
 _docs/018-progress.md | 8 +
 _docs/019-bylaws.md | 170 +++++++
 _docs/connect/001-plugin-reg.md | 43 +-
 _docs/connect/002-plugin-conf.md | 130 +++++
 _docs/connect/002-workspaces.md | 74 ---
 _docs/connect/003-reg-fs.md | 64 ---
 _docs/connect/003-workspaces.md | 74 +++
 _docs/connect/004-reg-fs.md | 64 +++
 _docs/connect/004-reg-hbase.md | 32 --
 _docs/connect/005-reg-hbase.md | 34 ++
 _docs/connect/005-reg-hive.md | 83 ---
 _docs/connect/006-default-frmt.md | 60 ---
 _docs/connect/006-reg-hive.md | 82 +++
 _docs/connect/007-default-frmt.md | 69 +++
 _docs/connect/007-mongo-plugin.md | 167 ------
 _docs/connect/008-mapr-db-plugin.md | 31 --
 _docs/connect/008-mongo-plugin.md | 167 ++++++
 _docs/connect/009-mapr-db-plugin.md | 30 ++
 _docs/contribute/001-guidelines.md | 3 +-
 _docs/data-sources/001-hive-types.md | 180 +++++++
 _docs/data-sources/002-hive-udf.md | 40 ++
 _docs/data-sources/003-parquet-ref.md | 269 ++++++++++
 _docs/data-sources/004-json-ref.md | 504 +++++++++++++++++++
 _docs/img/Hbase_Browse.png | Bin 147495 -> 148451 bytes
 _docs/img/StoragePluginConfig.png | Bin 20403 -> 0 bytes
 _docs/img/Untitled.png | Bin 39796 -> 0 bytes
 _docs/img/connect-plugin.png | Bin 0 -> 24774 bytes
 _docs/img/data-sources-schemachg.png | Bin 0 -> 8071 bytes
 _docs/img/datasources-json-bracket.png | Bin 0 -> 30129 bytes
 _docs/img/datasources-json.png | Bin 0 -> 16364 bytes
 _docs/img/get2kno_plugin.png | Bin 0 -> 55794 bytes
 _docs/img/json-workaround.png | Bin 0 -> 27547 bytes
 _docs/img/plugin-default.png | Bin 0 -> 56412 bytes
 _docs/install/001-drill-in-10.md | 2 +-
 _docs/interfaces/001-odbc-win.md | 3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md | 2 +-
 .../interfaces/odbc-win/004-tableau-examples.md | 6 +-
 _docs/manage/002-start-stop.md | 2 +-
 _docs/manage/003-ports.md | 2 +-
 _docs/manage/conf/002-startup-opt.md | 3 +-
 _docs/manage/conf/003-plan-exec.md | 3 +-
 _docs/manage/conf/004-persist-conf.md | 2 +-
 _docs/query/001-get-started.md | 75 +++
 _docs/query/001-query-fs.md | 35 --
 _docs/query/002-query-fs.md | 35 ++
 _docs/query/002-query-hbase.md | 151 ------
 _docs/query/003-query-complex.md | 56 ---
 _docs/query/003-query-hbase.md | 151 ++++++
 _docs/query/004-query-complex.md | 56 +++
 _docs/query/004-query-hive.md | 45 --
 _docs/query/005-query-hive.md | 45 ++
 _docs/query/005-query-info-skema.md | 109 ----
 _docs/query/006-query-info-skema.md | 109 ++++
 _docs/query/006-query-sys-tbl.md | 159 ------
 _docs/query/007-query-sys-tbl.md | 159 ++++++
 _docs/query/get-started/001-lesson1-connect.md | 88 ++++
 _docs/query/get-started/002-lesson2-download.md | 103 ++++
 _docs/query/get-started/003-lesson3-plugin.md | 142 ++++++
 _docs/sql-ref/001-data-types.md | 215 +++--
 _docs/sql-ref/002-lexical-structure.md | 145 ++++++
 _docs/sql-ref/002-operators.md | 70 ---
 _docs/sql-ref/003-functions.md | 185 -------
 _docs/sql-ref/003-operators.md | 70 +++
 _docs/sql-ref/004-functions.md | 186 +++++++
 _docs/sql-ref/004-nest-functions.md | 10 -
 _docs/sql-ref/005-cmd-summary.md | 9 -
 _docs/sql-ref/005-nest-functions.md | 10 +
 _docs/sql-ref/006-cmd-summary.md | 9 +
 _docs/sql-ref/006-reserved-wds.md | 16 -
 _docs/sql-ref/007-reserved-wds.md | 16 +
 _docs/sql-ref/data-types/001-date.md | 206 ++----
 .../data-types/002-disparate-data-types.md | 321 ++++++++++++
 _docs/tutorial/002-get2kno-sb.md | 241 +++------
 _docs/tutorial/003-lesson1.md | 44 +-
 _docs/tutorial/005-lesson3.md | 98 ++--
 .../install-sandbox/001-install-mapr-vm.md | 2 +-
 .../install-sandbox/002-install-mapr-vb.md | 2 +-
 96 files changed, 4268 insertions(+), 2308 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/006-reg-hive.md
----------------------------------------------------------------------
diff --cc _docs/connect/006-reg-hive.md
index 0000000,cf9b72a..dfb03dc
mode 000000,100644..100644
--- a/_docs/connect/006-reg-hive.md
+++ b/_docs/connect/006-reg-hive.md
@@@ -1,0 -1,82 +1,82 @@@
+ ---
+ title: "Hive Storage Plugin"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can register a storage plugin instance that connects Drill to a Hive data
+ source that has a remote or embedded metastore service. When you register a
+ storage plugin instance for a Hive data source, provide a unique name for the
+ instance, and identify the type as `hive`. You must also provide the
+ metastore connection information.
+ 
+ Drill supports Hive 1.0. To access Hive tables
+ using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
+ must have the SerDes or InputFormat/OutputFormat `JAR` files in the
+ `<drill_installation_directory>/jars/3rdparty` folder.
+ 
+ ## Hive Remote Metastore
+ 
+ In this configuration, the Hive metastore runs as a separate service outside
+ of Hive.
+ Drill communicates with the Hive metastore through Thrift. The
+ metastore service communicates with the Hive database over JDBC. Point Drill
+ to the Hive metastore service address, and provide the connection parameters
+ in the Drill Web UI to configure the connection.
+ 
+ **Note:** Verify that the Hive metastore service is running before you register the Hive metastore.
+ 
+ To register a remote Hive metastore with Drill, complete the following steps:
+ 
+ 1. Issue the following command to start the Hive metastore service on the system specified in `hive.metastore.uris`:
+ 
+         hive --service metastore
+ 2. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+ 3. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+ 4. In the configuration window, add the Thrift URI and port to `hive.metastore.uris`.
+ 
+     **Example**
+ 
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "hive.metastore.uris": "thrift://<host>:<port>",
+             "hive.metastore.sasl.enabled": "false"
+           }
+         }
+ 5. Click **Enable**.
+ 6. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+ 
+         export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
+ 
+ Once you have configured a storage plugin instance for a Hive data source, you
+ can [query Hive tables](/docs/querying-hive/).
+ 
+ ## Hive Embedded Metastore
+ 
+ In this configuration, the Hive metastore is embedded within the Drill
+ process. Provide the metastore database configuration settings in the Drill
+ Web UI. Before you register Hive, verify that the driver you use to connect to
+ the Hive metastore is in the Drill classpath, located in `/<drill installation
+ directory>/lib/`. If the driver is not there, copy the driver to `/<drill
+ installation directory>/lib` on the Drill node.
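The plugin definitions shown in these docs must be strict JSON before the Drill Web UI will accept them. As a quick illustration (the helper name and the sample host/port are hypothetical, not from the docs above), a pre-flight check that catches common mistakes such as unquoted values or trailing commas:

```python
import json

def check_plugin_config(text):
    """Parse a storage plugin definition, raising ValueError on malformed JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"invalid storage plugin JSON: {e}") from e

# A well-formed Hive plugin definition parses cleanly.
good = '''
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://metastore-host:9083",
    "hive.metastore.sasl.enabled": "false"
  }
}
'''
config = check_plugin_config(good)
```

A definition with a trailing comma, like `{ "type": "hive", }`, fails this check, which is the same reason the Web UI would reject it.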
+ For more information about
+ storage types and configurations, refer to ["Hive Metastore Administration"](https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin).
+ 
+ To register an embedded Hive metastore with Drill, complete the following
+ steps:
+ 
+ 1. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+ 2. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+ 3. In the configuration window, add the database configuration settings.
+ 
+     **Example**
+ 
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>;create=true",
+             "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
+             "fs.default.name": "file:///"
+           }
+         }
+ 4. Click **Enable**.
+ 5. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+ 
- export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
++ export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/007-default-frmt.md
----------------------------------------------------------------------
diff --cc _docs/connect/007-default-frmt.md
index 0000000,9325bdb..fc10c16
mode 000000,100644..100644
--- a/_docs/connect/007-default-frmt.md
+++ b/_docs/connect/007-default-frmt.md
@@@ -1,0 -1,69 +1,69 @@@
+ ---
+ title: "Drill Default Input Format"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can define a default input format to tell Drill what file type exists in a
+ workspace within a file system. Drill determines the file type based on file
+ extensions and magic numbers when searching a workspace.
+ 
+ Magic numbers are file signatures that Drill uses to identify Parquet files.
+ If Drill cannot identify the file type based on file extensions or magic
+ numbers, the query fails.
+ Defining a default input format can prevent queries
+ from failing in situations where Drill cannot determine the file type.
+ 
+ If you incorrectly define the file type in a workspace and Drill cannot
+ determine the file type, the query fails. For example, if the directory for
+ which you have defined a workspace contains JSON files and you defined the
+ default input format as CSV, the query fails against the workspace.
+ 
+ You can define one default input format per workspace. If you do not define a
+ default input format, and Drill cannot detect the file format, the query
+ fails. You can define a default input format for any of the file types that
+ Drill supports. Currently, Drill supports the following types:
+ 
+ * CSV
+ * TSV
+ * PSV
+ * Parquet
+ * JSON
+ 
+ ## Defining a Default Input Format
+ 
+ You define the default input format for a file system workspace through the
+ Drill Web UI. You must have a [defined workspace](/docs/workspaces) before you can define a
+ default input format.
+ 
+ To define a default input format for a workspace, complete the following
+ steps:
+ 
+ 1. Navigate to the Drill Web UI at `<drill_node_ip_address>:8047`. The Drillbit process must be running on the node before you connect to the Drill Web UI.
+ 2. Select **Storage** in the toolbar.
+ 3. Click **Update** next to the file system for which you want to define a default input format for a workspace.
+ 4. In the Configuration area, locate the workspace for which you would like to define the default input format, and change the `defaultInputFormat` attribute to any of the supported file types.
+ 
+ **Example**
+ 
+     {
+       "type": "file",
+       "enabled": true,
+       "connection": "hdfs:///",
+       "workspaces": {
+         "root": {
+           "location": "/drill/testdata",
+           "writable": false,
+           "defaultInputFormat": "csv"
+         },
+         "local" : {
+           "location" : "/max/proddata",
+           "writable" : true,
+           "defaultInputFormat" : "json"
+         }
+       }
+     }
+ 
+ ## Querying Compressed JSON
+ 
+ You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file.
+ 
+     "extensions": [
+       "json",
+       "gz"
-     ]
++    ]

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/manage/conf/002-startup-opt.md
----------------------------------------------------------------------
diff --cc _docs/manage/conf/002-startup-opt.md
index 898a7ba,d1766fb..e0b64bf
--- a/_docs/manage/conf/002-startup-opt.md
+++ b/_docs/manage/conf/002-startup-opt.md
@@@ -46,5 -46,5 +46,4 @@@ override.conf` file located in Drill’
 You may want to configure the following start-up options that control certain
 behaviors in Drill:
 
- <table ><tbody><tr><th >Option</th><th >Default Value</th><th >Description</th></tr><tr><td valign="top" >drill.exec.sys.store.provider</td><td valign="top" >ZooKeeper</td><td valign="top" >Defines the persistent storage (PStore) provider. The PStore holds configuration and profile data. For more information about PStores, see <a href="/docs/persistent-configuration-storage" rel="nofollow">Persistent Configuration Storage</a>.</td></tr><tr><td valign="top" >drill.exec.buffer.size</td><td valign="top" > </td><td valign="top" >Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.</td></tr><tr><td valign="top" >drill.exec.sort.external.directories drill.exec.sort.external.fs</td><td valign="top" > </td><td valign="top" >These options control spooling. The drill.exec.sort.external.directories option tells Drill which directory to use when spooling. The drill.exec.sort.external.fs option tells Drill which file system to use when spooling beyond memory files. <span style="line-height: 1.4285715;background-color: transparent;"> </span>Drill uses a spool and sort operation for beyond memory operations. The sorting operation is designed to spool to a Hadoop file system. The default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back from it) is constrained by the file system. <span style="line-height: 1.4285715;background-color: transparent;"> </span>For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes. Volumes improve performance and stripe data across as many disks as possible.</td></tr><tr><td valign="top" colspan="1" >drill.exec.debug.error_on_leak</td><td valign="top" colspan="1" >True</td><td valign="top" colspan="1" >Determines how Drill behaves when memory leaks occur during a query. By default, this option is enabled so that queries fail when memory leaks occur. If you disable the option, Drill issues a warning when a memory leak occurs and completes the query.</td></tr><tr><td valign="top" colspan="1" >drill.exec.zk.connect</td><td valign="top" colspan="1" >localhost:2181</td><td valign="top" colspan="1" >Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.</td></tr><tr><td valign="top" colspan="1" >drill.exec.cluster-id</td><td valign="top" colspan="1" >my_drillbit_cluster</td><td valign="top" colspan="1" >Identifies the cluster that corresponds with the ZooKeeper quorum indicated. It also provides Drill with the name of the cluster used during UDP multicast. You must change the default cluster-id if there are multiple clusters on the same subnet. If you do not change the ID, the clusters will try to connect to each other to create one cluster.</td></tr></tbody></table>
- 
+ <table ><tbody><tr><th >Option</th><th >Default Value</th><th >Description</th></tr><tr><td valign="top" >drill.exec.sys.store.provider</td><td valign="top" >ZooKeeper</td><td valign="top" >Defines the persistent storage (PStore) provider. The PStore holds configuration and profile data. For more information about PStores, see <a href="/docs/persistent-configuration-storage" rel="nofollow">Persistent Configuration Storage</a>.</td></tr><tr><td valign="top" >drill.exec.buffer.size</td><td valign="top" > </td><td valign="top" >Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.</td></tr><tr><td valign="top" >drill.exec.sort.external.directories drill.exec.sort.external.fs</td><td valign="top" > </td><td valign="top" >These options control spooling. The drill.exec.sort.external.directories option tells Drill which directory to use when spooling. The drill.exec.sort.external.fs option tells Drill which file system to use when spooling beyond memory files. <span style="line-height: 1.4285715;background-color: transparent;"> </span>Drill uses a spool and sort operation for beyond memory operations. The sorting operation is designed to spool to a Hadoop file system. The default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back from it) is constrained by the file system. <span style="line-height: 1.4285715;background-color: transparent;"> </span>For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes. Volumes improve performance and stripe data across as many disks as possible.</td></tr><tr><td valign="top" colspan="1" >drill.exec.debug.error_on_leak</td><td valign="top" colspan="1" >True</td><td valign="top" colspan="1" >Determines how Drill behaves when memory leaks occur during a query. By default, this option is enabled so that queries fail when memory leaks occur. If you disable the option, Drill issues a warning when a memory leak occurs and completes the query.</td></tr><tr><td valign="top" colspan="1" >drill.exec.zk.connect</td><td valign="top" colspan="1" >localhost:2181</td><td valign="top" colspan="1" >Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.</td></tr><tr><td valign="top" colspan="1" >drill.exec.cluster-id</td><td valign="top" colspan="1" >my_drillbit_cluster</td><td valign="top" colspan="1" >Identifies the cluster that corresponds with the ZooKeeper quorum indicated. It also provides Drill with the name of the cluster used during UDP multicast. You must change the default cluster-id if there are multiple clusters on the same subnet. If you do not change the ID, the clusters will try to connect to each other to create one cluster.</td></tr></tbody></table></div>
-
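The default input format behavior described in the 007-default-frmt.md diff above (a known file extension wins, then the workspace's `defaultInputFormat`, otherwise the query fails) can be sketched as follows. This is an illustrative model only, not Drill's actual implementation, and it ignores magic-number detection for Parquet:

```python
# Supported types listed in the doc: extension match first, then the
# workspace defaultInputFormat, otherwise the query fails.
KNOWN_EXTENSIONS = {"csv", "tsv", "psv", "parquet", "json"}

def resolve_input_format(filename, default_input_format=None):
    """Return the format Drill-style resolution would pick for a file."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in KNOWN_EXTENSIONS:
        return ext
    if default_input_format:
        return default_input_format
    # Mirrors the documented behavior: no extension match and no default
    # input format means the query fails.
    raise ValueError(f"cannot determine file type for {filename!r}")

# e.g. resolve_input_format("logs/data.json") -> "json"
# e.g. resolve_input_format("logs/data.dat", "csv") -> "csv"
```

This also illustrates why the doc warns that a wrong `defaultInputFormat` (say, `csv` for a directory of JSON files) makes queries fail: the fallback is applied blindly to any unrecognized file.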