[3/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits
DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2a34ac89
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2a34ac89
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2a34ac89

Branch: refs/heads/gh-pages-master
Commit: 2a34ac8931326f30b34986868f4c4e5ad61fec59
Parents: d959a21
Author: Kristine Hahn kh...@maprtech.com
Authored: Wed Feb 25 18:31:56 2015 -0800
Committer: Bridget Bevens bbev...@maprtech.com
Committed: Mon Mar 2 14:18:23 2015 -0800
--
 _docs/009-datasources.md                          |  27 ++
 _docs/010-dev-custom-func.md                      |  37 ++
 _docs/011-manage.md                               |  14 +
 _docs/012-develop.md                              |   9 +
 _docs/013-rn.md                                   | 191
 _docs/014-contribute.md                           |   9 +
 _docs/015-sample-ds.md                            |  10 +
 _docs/016-design.md                               |  13 +
 _docs/018-progress.md                             |   8 +
 _docs/019-bylaws.md                               | 170
 _docs/connect/005-reg-hive.md                     |   7 +-
 _docs/connect/007-mongo-plugin.md                 |   6 +-
 _docs/data-sources/001-hive-types.md              | 188
 _docs/data-sources/002-hive-udf.md                |  39 ++
 _docs/data-sources/003-parquet-ref.md             | 287
 _docs/data-sources/004-json-ref.md                | 432 +++
 _docs/dev-custom-fcn/002-dev-aggregate.md         |   2 +-
 _docs/img/Untitled.png                            | Bin 39796 -> 0 bytes
 _docs/img/json-workaround.png                     | Bin 0 -> 20786 bytes
 _docs/install/001-drill-in-10.md                  |   2 +-
 _docs/interfaces/001-odbc-win.md                  |   3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md   |   2 +-
 .../interfaces/odbc-win/004-tableau-examples.md   |   6 +-
 _docs/manage/002-start-stop.md                    |   2 +-
 _docs/manage/003-ports.md                         |   2 +-
 _docs/manage/conf/002-startup-opt.md              |   2 +-
 _docs/manage/conf/003-plan-exec.md                |   3 +-
 _docs/manage/conf/004-persist-conf.md             |   2 +-
 _docs/query/001-get-started.md                    |  75
 _docs/query/001-query-fs.md                       |  35 --
 _docs/query/002-query-fs.md                       |  35 ++
 _docs/query/002-query-hbase.md                    | 151 ---
 _docs/query/003-query-complex.md                  |  56 ---
 _docs/query/003-query-hbase.md                    | 151 +++
 _docs/query/004-query-complex.md                  |  56 +++
 _docs/query/004-query-hive.md                     |  45 --
 _docs/query/005-query-hive.md                    |  45 ++
 _docs/query/005-query-info-skema.md              | 109 -
 _docs/query/006-query-info-skema.md              | 109 +
 _docs/query/006-query-sys-tbl.md                 | 159 ---
 _docs/query/007-query-sys-tbl.md                 | 159 +++
 _docs/query/get-started/001-lesson1-connect.md   |  88
 _docs/query/get-started/002-lesson2-download.md  | 103 +
 _docs/query/get-started/003-lesson3-plugin.md    | 142 ++
 _docs/sql-ref/003-functions.md                   |  19 +-
 _docs/sql-ref/005-cmd-summary.md                 |   2 +-
 _docs/sql-ref/006-reserved-wds.md                |   2 +-
 _docs/sql-ref/data-types/001-date.md             |   4 +-
 _docs/tutorial/005-lesson3.md                    |   2 +-
 49 files changed, 2433 insertions(+), 587 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/009-datasources.md
--
diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md
new file mode 100644
index 000..3f3d431
--- /dev/null
+++ b/_docs/009-datasources.md
@@ -0,0 +1,27 @@
+---
+title: Data Sources and File Formats
+---
+Drill supports several key data sources, including:
+
+* HBase
+* Hive
+* MapR-DB
+* File system
+
+. . .
+
+Drill supports the following input formats for data:
+
+* CSV (comma-separated values)
+* TSV (tab-separated values)
+* PSV (pipe-separated values)
+* Parquet
+* JSON
+
+You set the input format for data coming from a data source to Drill in the workspace portion of the [storage plugin](/drill/docs/storage-plugin-registration) definition. The default input format in Drill is Parquet.
+
+You change the [sys.options table](/drill/docs/planning-and-execution-options) to set the output format of Drill data. The default storage format for Drill CREATE TABLE AS (CTAS) statements is Parquet.

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/010-dev-custom-func.md
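The output format described in the new datasources doc is a session option; as a hedged sketch of what changing it looks like in SQLLine (the `store.format` option name and a writable `dfs.tmp` workspace are assumptions drawn from Drill's documented defaults, not from this commit):

```sql
-- Assumed option name: switch CTAS output from the default Parquet to JSON
-- for the current session.
ALTER SESSION SET `store.format` = 'json';

-- Subsequent CTAS statements now write JSON files into the target workspace.
CREATE TABLE dfs.tmp.`sample_json` AS
SELECT * FROM cp.`employee.json` LIMIT 5;
```

Setting the option back to 'parquet' restores the default behavior for the rest of the session.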
[1/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master d959a2100 -> 2a34ac893

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/query/get-started/003-lesson3-plugin.md
--
diff --git a/_docs/query/get-started/003-lesson3-plugin.md b/_docs/query/get-started/003-lesson3-plugin.md
new file mode 100644
index 000..9aab881
--- /dev/null
+++ b/_docs/query/get-started/003-lesson3-plugin.md
@@ -0,0 +1,142 @@
+---
+title: Lesson 3: Create a Storage Plugin
+parent: Getting Started Tutorial
+---
+The Drill default storage plugins support common file formats. If you need
+support for some other file format, create a custom storage plugin. You can also create a storage plugin to simplify querying files that have long path names. A workspace name replaces the long path name.
+
+This lesson covers how to create and use a storage plugin to simplify queries. First,
+you create the storage plugin in the Drill Web UI. Next, you connect to the
+file through the plugin to query a file, and then a directory, and finally you
+query multiple files in a directory.
+
+## Create a Storage Plugin
+
+You can create a storage plugin using the Apache Drill Web UI.
+
+ 1. Create an `ngram` directory on your file system.
+ 2. Copy `googlebooks-eng-all-5gram-20120701-zo.tsv` to the `ngram` directory.
+ 3. Open the Drill Web UI by navigating to http://localhost:8047/storage.
+    To open the Drill Web UI, SQLLine must still be running.
+ 4. In New Storage Plugin, type `myplugin`.
+    ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)
+ 5. Click **Create**.
+    The Configuration screen appears.
+ 6.
Replace null with the following storage plugin definition, except on the location line, use the path to your `ngram` directory instead of the drilluser's path, and give your workspace an arbitrary name, for example, ngram:
+
+        {
+          "type": "file",
+          "enabled": true,
+          "connection": "file:///",
+          "workspaces": {
+            "ngram": {
+              "location": "/Users/drilluser/ngram",
+              "writable": false,
+              "defaultInputFormat": null
+            }
+          },
+          "formats": {
+            "tsv": {
+              "type": "text",
+              "extensions": [
+                "tsv"
+              ],
+              "delimiter": "\t"
+            }
+          }
+        }
+
+ 7. Click **Create**.
+    The success message appears briefly.
+ 8. Click **Back**.
+    The new plugin appears in Enabled Storage Plugins.
+    ![new plugin]({{ site.baseurl }}/docs/img/ngram_plugin.png)
+ 9. Go back to the SQLLine prompt in the CLI, and list the storage plugins. Press RETURN in the CLI to get a prompt if necessary.
+
+Your custom plugin appears in the list and has two workspaces: the `ngram`
+workspace that you defined and a default workspace.
+
+## Connect to and Query a File
+
+When querying the same data source repeatedly, avoiding long path names is
+important. This exercise demonstrates how to simplify the query. Instead of
+using the full path to the Ngram file, you use dot notation in the FROM
+clause:
+
+``<workspace name>.`<location>```
+
+This syntax assumes you connected to a storage plugin that defines the
+location of the data. To query the data source while you are _not_ connected to
+that storage plugin, include the plugin name:
+
+``<plugin name>.<workspace name>.`<location>```
+
+This exercise shows how to query Ngram data when you are, and when you are
+not, connected to `myplugin`.
+
+ 1. Connect to the ngram file through the custom storage plugin.
+    `USE myplugin;`
+ 2. Get data about "Zoological Journal of the Linnean" that appears more than 250 times a year in the books that Google scans. In the FROM clause, instead of using the full path to the file as you did in the last exercise, connect to the data using the storage plugin workspace name ngram.
+
+        SELECT COLUMNS[0],
+               COLUMNS[1],
+               COLUMNS[2]
+        FROM ngram.`/googlebooks-eng-all-5gram-20120701-zo.tsv`
+        WHERE ((columns[0] = 'Zoological Journal of the Linnean')
+          AND (columns[2] > 250))
+        LIMIT 10;
+
+    The output consists of 5 rows of data.
+ 3. Switch to the `dfs` storage plugin.
+
+        0: jdbc:drill:zk=local> USE dfs;
+
+        +------------+----------------------------------+
+        |     ok     |             summary              |
+        +------------+----------------------------------+
+        | true       | Default schema changed to 'dfs'  |
+        +------------+----------------------------------+
+        1 row selected (0.019 seconds)
+
+ 4. Query the TSV file again. Because you switched to `dfs`, Drill does not know the location of the file. To provide the information to Drill, preface the file name with the storage plugin and workspace names in the FROM clause.
+
+        SELECT COLUMNS[0],
+               COLUMNS[1],
+               COLUMNS[2]
+        FROM
[2/3] drill git commit: DRILL-2316: Add hive, parquet, json ref docs, basics tutorial, and minor edits
http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/manage/002-start-stop.md
--
diff --git a/_docs/manage/002-start-stop.md b/_docs/manage/002-start-stop.md
index 76a76f4..d37f840 100644
--- a/_docs/manage/002-start-stop.md
+++ b/_docs/manage/002-start-stop.md
@@ -28,7 +28,7 @@ can indicate the schema name when you invoke SQLLine.

 To start SQLLine, issue the appropriate command for your Drill installation type:

-<table ><tbody><tr><td valign="top"><strong>Drill Install Type</strong></td><td valign="top"><strong>Example</strong></td><td valign="top"><strong>Command</strong></td></tr><tr><td valign="top">Embedded</td><td valign="top">Drill installed locally (embedded mode); Hive with embedded metastore</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command: <code>$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin</code><span> </span>Once you are in the prompt, you can issue <code>USE &lt;schema&gt;</code> or you can use absolute notation: <code>schema.table.column</code>. To connect to a schema directly, issue the command with the schema name: <code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=local -n admin -p admin</code></td></tr><tr><td valign="top">Distributed</td><td valign="top">Drill installed in distributed mode; Hive with remote metastore; HBase</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command: <code>$ bin/sqlline -u jdbc:drill:zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code>Once you are in the prompt, you can issue <code>USE &lt;schema&gt;</code> or you can use absolute notation: <code>schema.table.column</code>. To connect to a schema directly, issue the command with the schema name: <code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code></td></tr></tbody></table>
+<table ><tbody><tr><td valign="top"><strong>Drill Install Type</strong></td><td valign="top"><strong>Example</strong></td><td valign="top"><strong>Command</strong></td></tr><tr><td valign="top">Embedded</td><td valign="top">Drill installed locally (embedded mode); Hive with embedded metastore</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command: <code>$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin</code><span> </span>Once you are in the prompt, you can issue <code>USE &lt;schema&gt;</code> or you can use absolute notation: <code>schema.table.column</code>. To connect to a schema directly, issue the command with the schema name: <code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=local -n admin -p admin</code></td></tr><tr><td valign="top">Distributed</td><td valign="top">Drill installed in distributed mode; Hive with remote metastore; HBase</td><td valign="top">To connect without specifying a schema, navigate to the Drill installation directory and issue the following command: <code>$ bin/sqlline -u jdbc:drill:zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code>Once you are in the prompt, you can issue <code>USE &lt;schema&gt;</code> or you can use absolute notation: <code>schema.table.column</code>. To connect to a schema directly, issue the command with the schema name: <code>$ bin/sqlline -u jdbc:drill:schema=&lt;database&gt;;zk=&lt;zk1host&gt;:&lt;port&gt;,&lt;zk2host&gt;:&lt;port&gt;,&lt;zk3host&gt;:&lt;port&gt; -n admin -p admin</code></td></tr></tbody></table></div>

 When SQLLine starts, the system displays the following prompt:

http://git-wip-us.apache.org/repos/asf/drill/blob/2a34ac89/_docs/manage/003-ports.md
--
diff --git a/_docs/manage/003-ports.md b/_docs/manage/003-ports.md
index df1d362..c72beff 100644
--- a/_docs/manage/003-ports.md
+++ b/_docs/manage/003-ports.md
@@ -5,5 +5,5 @@ parent: Manage Drill

 The following table provides a list of the ports that Drill uses, the port type, and a description of how Drill uses the port:

-<table ><tbody><tr><th>Port</th><th colspan="1">Type</th><th>Description</th></tr><tr><td valign="top">8047</td><td valign="top" colspan="1">TCP</td><td valign="top">Needed for <span style="color: rgb(34,34,34);">the Drill Web UI.</span><span style="color: rgb(34,34,34);"> </span></td></tr><tr><td valign="top">31010</td><td valign="top" colspan="1">TCP</td><td valign="top">User port address. Used between nodes in a Drill cluster.<br />Needed for an external client, such as Tableau, to connect into the<br />cluster nodes. Also needed for the Drill Web UI.</td></tr><tr><td valign="top">31011</td><td valign="top" colspan="1">TCP</td><td valign="top">Control port address. Used between nodes in a Drill cluster.<br />Needed for multi-node installation of Apache Drill.</td></tr><tr><td valign="top" colspan="1">31012</td><td valign="top" colspan="1">TCP</td><td valign="top" colspan="1">Data port address. Used between nodes in a Drill cluster.<br />Needed for multi-node installation of Apache
[2/2] drill git commit: DRILL-2336 plugin updates
DRILL-2336 plugin updates

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/0119fdde
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/0119fdde
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/0119fdde

Branch: refs/heads/gh-pages-master
Commit: 0119fdde5ebb1a4921822ed213039bdbbbec4e71
Parents: 2a34ac8
Author: Kristine Hahn kh...@maprtech.com
Authored: Mon Mar 2 17:25:46 2015 -0800
Committer: Bridget Bevens bbev...@maprtech.com
Committed: Mon Mar 2 17:53:58 2015 -0800
--
 _docs/005-connect.md                              |  25 +-
 _docs/connect/001-plugin-reg.md                   |  43 ++--
 _docs/connect/002-plugin-conf.md                  | 123 ++
 _docs/connect/002-workspaces.md                   |  74 --
 _docs/connect/003-reg-fs.md                       |  64 -
 _docs/connect/003-workspaces.md                   |  74 ++
 _docs/connect/004-reg-fs.md                       |  64 +
 _docs/connect/004-reg-hbase.md                    |  32 ---
 _docs/connect/005-reg-hbase.md                    |  34 +++
 _docs/connect/005-reg-hive.md                     |  86 ---
 _docs/connect/006-default-frmt.md                 |  60 -
 _docs/connect/006-reg-hive.md                     |  83 +++
 _docs/connect/007-default-frmt.md                 |  60 +
 _docs/connect/007-mongo-plugin.md                 | 167 -
 _docs/connect/008-mapr-db-plugin.md               |  31 ---
 _docs/connect/008-mongo-plugin.md                 | 167 +
 _docs/connect/009-mapr-db-plugin.md               |  30 +++
 _docs/img/StoragePluginConfig.png                 | Bin 20403 -> 0 bytes
 _docs/img/data-sources-schemachg.png              | Bin 0 -> 8071 bytes
 _docs/img/datasources-json-bracket.png            | Bin 0 -> 30129 bytes
 _docs/img/datasources-json.png                    | Bin 0 -> 16364 bytes
 _docs/img/get2kno_plugin.png                      | Bin 0 -> 55794 bytes
 _docs/img/json-workaround.png                     | Bin 20786 -> 27547 bytes
 _docs/img/plugin-default.png                      | Bin 0 -> 56412 bytes
 _docs/install/001-drill-in-10.md                  |   4 +-
 _docs/sql-ref/data-types/001-date.md              |   8 +-
 _docs/tutorial/002-get2kno-sb.md                  | 241 ++-
 _docs/tutorial/003-lesson1.md                     |  44 ++--
 _docs/tutorial/005-lesson3.md                     | 100
 .../install-sandbox/001-install-mapr-vm.md        |   2 +-
 .../install-sandbox/002-install-mapr-vb.md        |   2 +-
 31 files changed, 808 insertions(+), 810 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/drill/blob/0119fdde/_docs/005-connect.md
--
diff --git a/_docs/005-connect.md b/_docs/005-connect.md
index b48d200..3c60b2d 100644
--- a/_docs/005-connect.md
+++ b/_docs/005-connect.md
@@ -1,24 +1,24 @@
 ---
-title: Connect to Data Sources
+title: Connect to a Data Source
 ---
-Apache Drill serves as a query layer that connects to data sources through
-storage plugins. Drill uses the storage plugins to interact with data sources.
-You can think of a storage plugin as a connection between Drill and a data
-source.
+A storage plugin is an interface for connecting to a data source to read and write data. Apache Drill connects to a data source, such as a file on the file system or a Hive metastore, through a storage plugin. When you execute a query, Drill gets the plugin name you provide in the FROM clause of your query.
+In addition to the connection string, the storage plugin configures the workspace and file formats for reading and writing data, as described in subsequent sections.
+
+## Storage Plugin Internals

 The following image represents the storage plugin layer between Drill and a
 data source:

 ![drill query flow]({{ site.baseurl }}/docs/img/storageplugin.png)

-Storage plugins provide the following information to Drill:
+A storage plugin provides the following information to Drill:

 * Metadata available in the underlying data source
 * Location of data
 * Interfaces that Drill can use to read from and write to data sources
 * A set of storage plugin optimization rules that assist with efficient and faster execution of Drill queries, such as pushdowns, statistics, and partition awareness

-Storage plugins perform scanner and writer functions, and inform the metadata
+A storage plugin performs scanner and writer functions, and informs the metadata
 repository of any known metadata, such as:

 * Schema
@@ -27,15 +27,6 @@ repository of any known metadata, such as:
 * Secondary indices
 * Number of blocks

-Storage plugins inform the execution engine of any native capabilities, such
+A storage plugin informs the execution engine of any native capabilities, such
+as predicate
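The metadata that a storage plugin exposes surfaces in Drill as one or more schemas, which you can list from SQLLine; a hedged sketch (the exact schema names depend on which plugins and workspaces are enabled in your installation):

```sql
-- Each enabled storage plugin contributes one or more schemas:
-- the plugin name alone, or plugin.workspace for file-system workspaces
-- (for example, dfs.tmp or dfs.clicks).
SHOW DATABASES;
```

The tutorial lesson later in this commit uses the output of this command to pick a schema for the USE statement.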
[1/2] drill git commit: DRILL-2336 plugin updates
Repository: drill
Updated Branches:
  refs/heads/gh-pages-master 2a34ac893 -> 0119fdde5

http://git-wip-us.apache.org/repos/asf/drill/blob/0119fdde/_docs/tutorial/003-lesson1.md
--
diff --git a/_docs/tutorial/003-lesson1.md b/_docs/tutorial/003-lesson1.md
index 119d67f..577ede3 100644
--- a/_docs/tutorial/003-lesson1.md
+++ b/_docs/tutorial/003-lesson1.md
@@ -22,26 +22,17 @@ This lesson consists of select * queries on each data source.

 ## Before You Begin

-### Start sqlline
+### Start SQLLine

-If sqlline is not already started, use a Terminal or Command window to log
-into the demo VM as root, then enter `sqlline`:
+If SQLLine is not already started, use a Terminal or Command window to log
+into the demo VM as root, then enter `sqlline`, as described in [Getting to Know the Sandbox](/docs/getting-to-know-the-drill-sandbox):

-    $ ssh root@10.250.0.6
-    Password:
-    Last login: Mon Sep 15 13:46:08 2014 from 10.250.0.28
-    Welcome to your Mapr Demo virtual machine.
-    [root@maprdemo ~]# sqlline
-    sqlline version 1.1.6
-    0: jdbc:drill:
-
-You can run queries from this prompt to complete the tutorial. To exit from
-`sqlline`, type:
+You can run queries from the `sqlline` prompt to complete the tutorial. To exit from
+SQLLine, type:

     0: jdbc:drill:> !quit

-Note that though this tutorial demonstrates the queries using SQLLine, you can
-also execute queries using the Drill Web UI.
+Examples in this tutorial use SQLLine. You can also execute queries using the Drill Web UI.

 ### List the available workspaces and databases:

@@ -55,7 +46,6 @@ also execute queries using the Drill Web UI.
     | dfs.root     |
     | dfs.views    |
     | dfs.clicks   |
-    | dfs.data     |
     | dfs.tmp      |
     | sys          |
     | maprdb       |
@@ -64,9 +54,9 @@ also execute queries using the Drill Web UI.
     +--------------+
     12 rows selected

-Note that this command exposes all the metadata available from the storage
-plugins configured with Drill as a set of schemas. This includes the Hive and
-MapR-DB databases as well as the workspaces configured in the file system. As
+This command exposes all the metadata available from the storage
+plugins configured with Drill as a set of schemas, including the Hive and
+MapR-DB databases and the workspaces configured in the file system. As
 you run queries in the tutorial, you will switch among these schemas by
 submitting the USE command. This behavior resembles the ability to use
 different database schemas (namespaces) in a relational database system.

@@ -113,13 +103,13 @@ on the metadata available in the Hive metastore.

     0: jdbc:drill:> select * from orders limit 5;
     +------------+------------+------------+------------+------------+-------------+
-    | order_id | month | cust_id | state | prod_id | order_total |
+    | order_id   | month      | cust_id    | state      | prod_id    | order_total |
     +------------+------------+------------+------------+------------+-------------+
-    | 67212 | June | 10001 | ca | 909 | 13 |
-    | 70302 | June | 10004 | ga | 420 | 11 |
-    | 69090 | June | 10011 | fl | 44 | 76 |
-    | 68834 | June | 10012 | ar | 0 | 81 |
-    | 71220 | June | 10018 | az | 411 | 24 |
+    | 67212      | June       | 10001      | ca         | 909        | 13          |
+    | 70302      | June       | 10004      | ga         | 420        | 11          |
+    | 69090      | June       | 10011      | fl         | 44         | 76          |
+    | 68834      | June       | 10012      | ar         | 0          | 81          |
+    | 71220      | June       | 10018      | az         | 411        | 24          |
     +------------+------------+------------+------------+------------+-------------+

 Because orders is a Hive table, you can query the data in the same way that

@@ -256,7 +246,7 @@
 a relational database "table." Therefore, you can perform SQL operations
 directly on files and directories without the need for up-front schema
 definitions or schema management for any model changes. The schema is
 discovered on the fly based on the query. Drill supports queries on a variety
-of file formats including text, CSV, Parquet, and JSON in the 0.5 release.
+of file formats including text, CSV, Parquet, and JSON.

 In this example, the clickstream data coming from the mobile/web applications
 is in JSON format. The JSON files have the following structure:

@@ -285,7 +275,7 @@ setup beyond the definition of a workspace. In this case, setting the workspace
 is a mechanism for making queries easier to write.

 When you specify a file system workspace, you can shorten references
-to files in the FROM clause of your queries. Instead of having to provide the
+to files in your queries. Instead of having to provide the
 complete path to a file, you can provide the path relative to a directory
 location specified in the workspace. For
drill git commit: DRILL-2338: Fix Decimal38/Decimal28 vector's get() to copy the scale and precision into the holder
Repository: drill
Updated Branches:
  refs/heads/master 3442215fd -> a84f7b9e8

DRILL-2338: Fix Decimal38/Decimal28 vector's get() to copy the scale and precision into the holder

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a84f7b9e
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a84f7b9e
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a84f7b9e

Branch: refs/heads/master
Commit: a84f7b9e88b1827e6b4da8cdd25c6d4f12dcdadc
Parents: 3442215
Author: Mehant Baid meha...@gmail.com
Authored: Fri Feb 27 19:21:51 2015 -0800
Committer: Mehant Baid meha...@gmail.com
Committed: Mon Mar 2 11:13:27 2015 -0800
--
 .../codegen/templates/FixedValueVectors.java    | 12 ++
 .../physical/impl/writer/TestParquetWriter.java | 25
 2 files changed, 27 insertions(+), 10 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/drill/blob/a84f7b9e/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
--
diff --git a/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java b/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
index b5011e6..6cea8c8 100644
--- a/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
+++ b/exec/java-exec/src/main/codegen/templates/FixedValueVectors.java
@@ -394,17 +394,8 @@ public final class ${minor.class}Vector extends BaseDataValueVector implements F
 <#elseif (minor.class == "Decimal28Sparse") || (minor.class == "Decimal38Sparse") || (minor.class == "Decimal28Dense") || (minor.class == "Decimal38Dense")>

   public void get(int index, ${minor.class}Holder holder) {
     holder.start = index * ${type.width};
     holder.buffer = data;
-
-    /* The buffer within the value vector is little endian.
-     * For the dense representation though, we use big endian
-     * byte ordering (internally). This is because we shift bits to the right and
-     * big endian ordering makes sense for this purpose. So we have to deal with
-     * the sign bit for the two representation in a slightly different fashion
-     */
+    holder.scale = getField().getScale();
+    holder.precision = getField().getPrecision();
   }

@@ -412,8 +403,9 @@ public final class ${minor.class}Vector extends BaseDataValueVector implements F
   public void get(int index, Nullable${minor.class}Holder holder) {
     holder.isSet = 1;
     holder.start = index * ${type.width};
     holder.buffer = data;
+    holder.scale = getField().getScale();
+    holder.precision = getField().getPrecision();
   }

   @Override

http://git-wip-us.apache.org/repos/asf/drill/blob/a84f7b9e/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
--
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
index 7298f28..76328c6 100644
--- a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
@@ -21,6 +21,7 @@ import static org.junit.Assert.assertEquals;

 import java.io.UnsupportedEncodingException;
 import java.lang.reflect.Array;
+import java.math.BigDecimal;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.HashMap;
@@ -360,6 +361,30 @@ public class TestParquetWriter extends BaseTestQuery {
     compareParquetReadersColumnar("wr_returning_customer_sk", "dfs.`/tmp/web_returns`");
   }

+  @Test
+  public void testWriteDecimal() throws Exception {
+    String outputTable = "decimal_test";
+    Path path = new Path("/tmp/" + outputTable);
+    if (fs.exists(path)) {
+      fs.delete(path, true);
+    }
+    String ctas = String.format("use dfs.tmp; " +
+        "create table %s as select " +
+        "cast('1.2' as decimal(38, 2)) col1, cast('1.2' as decimal(28, 2)) col2 " +
+        "from cp.`employee.json` limit 1", outputTable);
+
+    test(ctas);
+
+    BigDecimal result = new BigDecimal("1.20");
+
+    testBuilder()
+        .unOrdered()
+        .sqlQuery(String.format("select col1, col2 from %s", outputTable))
+        .baselineColumns("col1", "col2")
+        .baselineValues(result, result)
+        .go();
+  }
+
   public void runTestAndValidate(String selection, String validationSelection, String inputTable, String outputFile) throws Exception {
     Path path = new Path("/tmp/" + outputFile);
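The SQL the new test issues can also be run interactively to observe the fix; a sketch assuming a writable `dfs.tmp` workspace and the classpath `employee.json` sample that ships with Drill:

```sql
USE dfs.tmp;

-- Write decimal columns to Parquet; with the DRILL-2338 fix, reading them
-- back carries the declared scale and precision into the value holders.
CREATE TABLE decimal_test AS
SELECT CAST('1.2' AS DECIMAL(38, 2)) col1,
       CAST('1.2' AS DECIMAL(28, 2)) col2
FROM cp.`employee.json` LIMIT 1;

SELECT col1, col2 FROM decimal_test;
```

Before the fix, a reader consuming the Parquet-backed vectors saw holders with unset scale and precision; the test's baseline value of `1.20` (scale 2) exercises exactly that path.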
drill git commit: DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison. Add a planner option to enable/disable this feature.
Repository: drill
Updated Branches:
  refs/heads/master 9c0738d94 -> 3442215fd

DRILL-2236: Optimize hash inner join by swapping inputs based on row count comparison. Add a planner option to enable/disable this feature.

Revise code based on review comments.

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/3442215f
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/3442215f
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/3442215f

Branch: refs/heads/master
Commit: 3442215fd91e700f659bc055cd7c05b623bc59b3
Parents: 9c0738d
Author: Jinfeng Ni j...@maprtech.com
Authored: Thu Jan 29 13:24:28 2015 -0800
Committer: Jinfeng Ni j...@maprtech.com
Committed: Mon Mar 2 10:03:31 2015 -0800
--
 .../exec/planner/physical/HashJoinPrel.java     | 54 +
 .../drill/exec/planner/physical/JoinPrel.java   |  4 +-
 .../exec/planner/physical/MergeJoinPrel.java    |  2 +-
 .../exec/planner/physical/PlannerSettings.java  | 11 +++
 .../physical/explain/NumberingRelWriter.java    |  7 ++
 .../physical/visitor/SwapHashJoinVisitor.java   | 79
 .../planner/sql/handlers/DefaultSqlHandler.java | 13 +++-
 .../server/options/SystemOptionManager.java     |  2 +
 8 files changed, 154 insertions(+), 18 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/drill/blob/3442215f/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
--
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
index a3c42de..f63057f 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
@@ -20,6 +20,7 @@ package org.apache.drill.exec.planner.physical;
 import java.io.IOException;
 import java.util.List;

+import net.hydromatic.optiq.runtime.FlatLists;
 import org.apache.drill.common.expression.FieldReference;
 import org.apache.drill.common.logical.data.JoinCondition;
 import org.apache.drill.exec.ExecConstants;
@@ -46,18 +47,24 @@ import com.google.common.collect.Lists;

 public class HashJoinPrel extends JoinPrel {

+  private boolean swapped = false;
+
   public HashJoinPrel(RelOptCluster cluster, RelTraitSet traits, RelNode left, RelNode right, RexNode condition,
-      JoinRelType joinType) throws InvalidRelException {
-    super(cluster, traits, left, right, condition, joinType);
+      JoinRelType joinType) throws InvalidRelException {
+    this(cluster, traits, left, right, condition, joinType, false);
+  }
+
+  public HashJoinPrel(RelOptCluster cluster, RelTraitSet traits, RelNode left, RelNode right, RexNode condition,
+      JoinRelType joinType, boolean swapped) throws InvalidRelException {
+    super(cluster, traits, left, right, condition, joinType);
+    this.swapped = swapped;
     RelOptUtil.splitJoinCondition(left, right, condition, leftKeys, rightKeys);
   }

   @Override
   public JoinRelBase copy(RelTraitSet traitSet, RexNode conditionExpr, RelNode left, RelNode right, JoinRelType joinType, boolean semiJoinDone) {
     try {
-      return new HashJoinPrel(this.getCluster(), traitSet, left, right, conditionExpr, joinType);
+      return new HashJoinPrel(this.getCluster(), traitSet, left, right, conditionExpr, joinType, this.swapped);
     } catch (InvalidRelException e) {
       throw new AssertionError(e);
     }
@@ -100,11 +107,32 @@ public class HashJoinPrel extends JoinPrel {

   @Override
   public PhysicalOperator getPhysicalOperator(PhysicalPlanCreator creator) throws IOException {
+    // Depending on whether the left/right is swapped for hash inner join, pass in different
+    // combinations of parameters.
+    if (!swapped) {
+      return getHashJoinPop(creator, left, right, leftKeys, rightKeys);
+    } else {
+      return getHashJoinPop(creator, right, left, rightKeys, leftKeys);
+    }
+  }
+
+  @Override
+  public SelectionVectorMode[] getSupportedEncodings() {
+    return SelectionVectorMode.DEFAULT;
+  }
+
+  @Override
+  public SelectionVectorMode getEncoding() {
+    return SelectionVectorMode.NONE;
+  }
+
+  private PhysicalOperator getHashJoinPop(PhysicalPlanCreator creator, RelNode left, RelNode right,
+      List<Integer> leftKeys, List<Integer> rightKeys) throws IOException {
     final List<String> fields = getRowType().getFieldNames();
     assert isUnique(fields);
-    final int leftCount = left.getRowType().getFieldCount();
-    final List<String> leftFields = fields.subList(0, leftCount);
-    final List<String> rightFields =
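The commit message mentions a planner option to enable or disable the swap, but the option's name is not visible in this excerpt (the PlannerSettings and SystemOptionManager hunks are not included). Assuming the name documented in later Drill releases, `planner.enable_hashjoin_swap`, toggling it per session would look like:

```sql
-- Assumed option name; disable hash-join input swapping for this session.
ALTER SESSION SET `planner.enable_hashjoin_swap` = false;

-- Re-enable the row-count-based swap optimization.
ALTER SESSION SET `planner.enable_hashjoin_swap` = true;
```

Swapping builds the hash table on the smaller input, which reduces memory use and build time for inner joins with skewed input sizes.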