Repository: drill Updated Branches: refs/heads/gh-pages 923bad370 -> 2265dd160
DRILL-2493 Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2265dd16 Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2265dd16 Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2265dd16 Branch: refs/heads/gh-pages Commit: 2265dd16087f080cfbdbd852e9d8a05a400d7968 Parents: 923bad3 Author: Kristine Hahn <kh...@maprtech.com> Authored: Thu Mar 19 16:26:52 2015 -0700 Committer: Bridget Bevens <bbev...@maprtech.com> Committed: Fri Mar 20 10:51:07 2015 -0700 ---------------------------------------------------------------------- _docs/009-datasources.md | 2 +- _docs/data-sources/004-json-ref.md | 2 + _docs/img/datasources-json-bracket.png | Bin 30129 -> 15160 bytes _docs/sql-ref/001-data-types.md | 4 +- _docs/sql-ref/002-lexical-structure.md | 8 ++ _docs/sql-ref/008-sql-extensions.md | 135 ++++++++++++++++++++++++++++ 6 files changed, 149 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/009-datasources.md ---------------------------------------------------------------------- diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md index 5400f22..4b25e9e 100644 --- a/_docs/009-datasources.md +++ b/_docs/009-datasources.md @@ -3,7 +3,7 @@ title: "Data Sources and File Formats" --- Included in the data sources that Drill supports are these key data sources: -* Hbase +* HBase * Hive * MapR-DB * File system http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/data-sources/004-json-ref.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md index bf4c5d7..bbb4386 100644 --- a/_docs/data-sources/004-json-ref.md +++ b/_docs/data-sources/004-json-ref.md @@ -128,6 +128,8 @@ Generally, you query JSON files using the following syntax, which includes a tab The first index position of an array is 0. +* Do not use a map, array or repeated scalar type in GROUP BY, ORDER BY or in a comparison operator. + Drill returns null when a document does not have the specified map or level. Using the following techniques, you can query complex, nested JSON: http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/img/datasources-json-bracket.png ---------------------------------------------------------------------- diff --git a/_docs/img/datasources-json-bracket.png b/_docs/img/datasources-json-bracket.png index e813568..4dfee92 100644 Binary files a/_docs/img/datasources-json-bracket.png and b/_docs/img/datasources-json-bracket.png differ http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/001-data-types.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/001-data-types.md b/_docs/sql-ref/001-data-types.md index 39b1468..7e56231 100644 --- a/_docs/sql-ref/001-data-types.md +++ b/_docs/sql-ref/001-data-types.md @@ -467,7 +467,9 @@ Although you can use CAST to handle binary data, CONVERT_TO and CONVERT_FROM are ### Using CONVERT_TO and CONVERT_FROM -To query HBase data in Drill, convert every column of an HBase table to/from byte arrays from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data. For examples of how to use these functions, see ["Convert and Cast Functions".](/docs/sql-functions#convert-and-cast-functions) +CONVERT_TO converts an SQL data type to complex types, including Hbase byte arrays, JSON and Parquet arrays and mapsTo query HBase data in Drill, convert every column of an HBase table to/from byte arrays from/to an [SQL data type](/docs/data-types/) that Drill supports when writing/reading data. For examples of how to use these functions, see ["Convert and Cast Functions".](/docs/sql-functions#convert-and-cast-functions) + +CONVERT_FROM converts from complex types, including Hbase byte arrays, JSON and Parquet arrays and maps to an SQL data type. ## Handling Textual Data In a textual file, such as CSV, Drill interprets every field as a VARCHAR, as previously mentioned. In addition to using the CAST function, you can also use [to_char](link), [to_date](line), [to_number](link), and [to_timestamp](link). If the SELECT statement includes a WHERE clause that compares a column of an unknown data type, cast both the value of the column and the comparison value in the WHERE clause. http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/002-lexical-structure.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/002-lexical-structure.md b/_docs/sql-ref/002-lexical-structure.md index 5115ab8..731b2b3 100644 --- a/_docs/sql-ref/002-lexical-structure.md +++ b/_docs/sql-ref/002-lexical-structure.md @@ -28,6 +28,14 @@ The upper/lowercase sensitivity of the parts differs. SQL function and command names are case-insensitive. Storage plugin and workspace names are case-sensitive. Column and table names are case-insensitive unless enclosed in double quotation marks. The double-quotation mark character can be used as an escape character for the double quotation mark. +Although column names are case-insensitive in Drill, the names might be otherwise in the storage format: + +* JSON: insensitive +* Hive: insensitive +* Parquet: insensitive +* MapR-DB: case-sensitive +* HBase: case-sensitive + Keywords are case-insensitive. For example, the keywords SELECT and select are equivalent. This document shows keywords in uppercase. The sys.options table name and values are case-sensitive. The following query works: http://git-wip-us.apache.org/repos/asf/drill/blob/2265dd16/_docs/sql-ref/008-sql-extensions.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/008-sql-extensions.md b/_docs/sql-ref/008-sql-extensions.md new file mode 100644 index 0000000..1eb5f65 --- /dev/null +++ b/_docs/sql-ref/008-sql-extensions.md @@ -0,0 +1,135 @@ +--- +title: "SQL Extensions" +parent: "SQL Reference" +--- +Drill extends SQL to work with Hadoop-scale data and to explore smaller-scale data in ways not possible with SQL. Using intuitive SQL extensions you work with self-describing data and complex data types. Extensions to SQL include capabilities for exploring self-describing data, such as files and HBase, directly in the native format. + +Drill provides language support for pointing to [storage plugin]() interfaces that Drill uses to interact with data sources. Use the name of a storage plugin to specify a file system *database* as a prefix in queries when you refer to objects across databases. Query files, including compressed .gz files and directories like an SQL table using a single query. + +Drill extends the SELECT statement for reading complex, multi-structured data. The extended CREATE TABLE AS SELECT, provides the capability to write data of complex/multi-structured data types. Drill extends the [lexical rules](http://drill.apache.org/docs/lexical-structure) for working with files and directories, such as using back ticks for including file names, directory names, and reserved words in queries. Drill syntax supports using the file system as a persistent store for query profiles and diagnostic information. + +## Extensions for Hive- and HBase-related Data Sources + +Drill supports Hive and HBase as a plug-and-play data source. You can query Hive tables with no modifications and creating model in the Hive metastore. Primitives, such as JOIN, support columnar operation. + +## Extensions for JSON-related Data Sources +For reading all JSON data as text, use the all text mode extension. Drill extends SQL to provide access to repeating values in arrays and arrays within arrays (array indexes). You can use these extensions to reach into deeply nested data. Drill extensions use standard JavaScript notation for referencing data elements in a hierarchy. + +## Extensions for Text Data Sources +Drill handles plain text files and directories like standard SQL tables and can infer knowledge about the schema of the data. You can query compressed .gz files. + +## SQL Commands Extensions + +The following table describes key Drill extensions to SQL commands. + +<table> + <tr> + <th>Command</th> + <th>SQL</th> + <th>Drill</th> + </tr> + <tr> + <td>ALTER (SESSION | SYSTEM)</td> + <td>None</td> + <td>Changes a system or session option.</td> + </tr> + <tr> + <td>CREATE TABLE AS SELECT</td> + <td>Creates a table from selected data in an existing database table.</td> + <td>Stores selected data from one or more data sources on the file system.</td> + </tr> + <tr> + <td>CREATE VIEW</td> + <td>Creates a virtual table. The fields in a view are fields from one or more real tables in the database.</td> + <td>Creates a virtual structure for and stores the result set. The fields in a view are fields from files in a file system, Hive, Hbase, MapR-DB tables</td> + </tr> + <tr> + <td>DESCRIBE</td> + <td>Obtains information about the <select list> columns</td> + <td>Obtains information about views created in a workspace and tables created in Hive, HBase, and MapR-DB.</td> + </tr> + <tr> + <td>EXPLAIN</td> + <td>None</td> + <td>Obtains a query execution plan.</td> + </tr> + <tr> + <td>INSERT</td> + <td>Loads data into the database for querying.</td> + <td>No INSERT function. Performs schema-on-read querying and execution; no need to load data into Drill for querying.</td> + </tr> + <tr> + <td>SELECT</td> + <td>Retrieves rows from a database table or view.</td> + <td>Retrieves data from Hbase, Hive, MapR-DB, file system or other storage plugin data source.</td> + </tr> + <tr> + <td>SHOW (DATABASES | SCHEMAS | FILES | TABLES)</td> + <td>None</td> + <td>Lists the storage plugin data sources available for querying or the Hive, Hbase, MapR-DB tables, or views for the data source in use. Supports a FROM clause for listing file data sources in directories.</td> + </tr> + <tr> + <td>USE</td> + <td>Targets a database in SQL schema for querying.</td> + <td>Targets Hbase, Hive, MapR-DB, file system or other storage plugin data source, which can be schema-less for querying.</td> + </tr> +</table> + +## SQL Function Extensions +The following table describes key Drill functions for analyzing nested data. + +<table> + <tr> + <th>Function</th> + <th>SQL</th> + <th>Drill</th> + </tr> + <tr> + <td>CAST</td> + <td>Casts database data from one type to another.</td> + <td>Casts database data from one type to another and also casts data having no metadata into a readable type. Allows liberal casting of schema-less data.</td> + </tr> + <tr> + <td>CONVERT_TO</td> + <td>Converts an expression from one type to another using the CONVERT command.</td> + <td>Converts an SQL data type to complex types, including Hbase byte arrays, JSON and Parquet arrays and maps.</td> + </tr> + <tr> + <td>CONVERT_FROM</td> + <td>Same as above</td> + <td>Converts from complex types, including Hbase byte arrays, JSON and Parquet arrays and maps to an SQL data type.</td> + </tr> + <tr> + <td>FLATTEN</td> + <td>None</td> + <td>Separates the elements in nested data from a repeated field into individual records.</td> + </tr> + <tr> + <td>KVGEN</td> + <td>None</td> + <td>Returns a repeated map, generating key-value pairs to simplify querying of complex data having unknown column names. You can then aggregate or filter on the key or value.</td> + </tr> + <tr> + <td>REPEATED_COUNT</td> + <td>None</td> + <td>Counts the values in a JSON array.</td> + </tr> +</table> + +## Other Extensions + +[`sys` database system tables]() provide port, version, and option information. Drill Connects to a random node, know where youâre connected: + +select host from sys.drillbits where `current` = true; ++------------+ +| host | ++------------+ +| 10.1.1.109 | ++------------+ + +select commit_id from sys.version; ++------------+ +| commit_id | ++------------+ +| e3ab2c1760ad34bda80141e2c3108f7eda7c9104 | +