http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/005-query-info-skema.md ---------------------------------------------------------------------- diff --git a/_docs/query/005-query-info-skema.md b/_docs/query/005-query-info-skema.md new file mode 100644 index 0000000..1ad0008 --- /dev/null +++ b/_docs/query/005-query-info-skema.md @@ -0,0 +1,109 @@ +--- +title: "Querying the INFORMATION SCHEMA" +parent: "Query Data" +--- +When you are using Drill to connect to multiple data sources, you need a +simple mechanism to discover what each data source contains. The information +schema is an ANSI standard set of metadata tables that you can query to return +information about all of your Drill data sources (or schemas). Data sources +may be databases or file systems; they are all known as "schemas" in this +context. You can query the following INFORMATION_SCHEMA tables: + + * SCHEMATA + * CATALOGS + * TABLES + * COLUMNS + * VIEWS + +## SCHEMATA + +The SCHEMATA table contains the CATALOG_NAME and SCHEMA_NAME columns. To allow +maximum flexibility inside BI tools, the only catalog that Drill supports is +`DRILL`. + + 0: jdbc:drill:zk=local> select CATALOG_NAME, SCHEMA_NAME as all_my_data_sources from INFORMATION_SCHEMA.SCHEMATA order by SCHEMA_NAME; + +--------------+---------------------+ + | CATALOG_NAME | all_my_data_sources | + +--------------+---------------------+ + | DRILL | INFORMATION_SCHEMA | + | DRILL | cp.default | + | DRILL | dfs.default | + | DRILL | dfs.root | + | DRILL | dfs.tmp | + | DRILL | HiveTest.SalesDB | + | DRILL | maprfs.logs | + | DRILL | sys | + +--------------+---------------------+ + +The INFORMATION_SCHEMA name and associated keywords are case-sensitive. You +can also return a list of schemas by running the SHOW DATABASES command: + + 0: jdbc:drill:zk=local> show databases; + +-------------+ + | SCHEMA_NAME | + +-------------+ + | dfs.default | + | dfs.root | + | dfs.tmp | + ... 
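+
+Because SCHEMATA is a regular table, you can also filter it with standard SQL
+predicates. For example, a query along the following lines returns only the
+file-system schemas. (The `dfs.%` pattern is just an illustration; the schema
+names you see depend on your storage plugin configuration.)
+
+    SELECT SCHEMA_NAME
+    FROM INFORMATION_SCHEMA.SCHEMATA
+    WHERE SCHEMA_NAME LIKE 'dfs.%';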
+
+## CATALOGS
+
+The CATALOGS table returns only one row, with the hardcoded DRILL catalog name
+and description.
+
+## TABLES
+
+The TABLES table returns the table name and type for each table or view in
+your databases. (Type means TABLE or VIEW.) Note that Drill does not return
+files available for querying in file-based data sources. Instead, use SHOW
+FILES to explore these data sources.
+
+## COLUMNS
+
+The COLUMNS table returns the column name and other metadata (such as the data
+type) for each column in each table or view.
+
+## VIEWS
+
+The VIEWS table returns the name and definition for each view in your
+databases. Note that file schemas are the canonical repository for views in
+Drill. Depending on how you create a view, the view may only be displayed in
+Drill after it has been used.
+
+## Useful Queries
+
+Run an ``INFORMATION_SCHEMA.`TABLES` `` query to view all of the tables and views
+within a database. TABLES is a reserved word in Drill and requires backticks
+(`).
+
+For example, the following query identifies all of the tables and views that
+Drill can access:
+
+    SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
+    FROM INFORMATION_SCHEMA.`TABLES`
+    ORDER BY TABLE_NAME DESC;
+    ----------------------------------------------------------------
+    TABLE_SCHEMA          TABLE_NAME             TABLE_TYPE
+    ----------------------------------------------------------------
+    HiveTest.CustomersDB  Customers              TABLE
+    HiveTest.SalesDB      Orders                 TABLE
+    HiveTest.SalesDB      OrderLines             TABLE
+    HiveTest.SalesDB      USOrders               VIEW
+    dfs.default           CustomerSocialProfile  VIEW
+    ----------------------------------------------------------------
+
+**Note:** Currently, Drill only supports querying Drill views; Hive views are not yet supported.
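+
+To narrow the list to views only, you can filter on the TABLE_TYPE column.
+This is a sketch; the rows returned depend on the views you have created:
+
+    SELECT TABLE_SCHEMA, TABLE_NAME
+    FROM INFORMATION_SCHEMA.`TABLES`
+    WHERE TABLE_TYPE = 'VIEW';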
+ +You can run a similar query to identify columns in tables and the data types +of those columns: + + SELECT COLUMN_NAME, DATA_TYPE + FROM INFORMATION_SCHEMA.COLUMNS + WHERE TABLE_NAME = 'Orders' AND TABLE_SCHEMA = 'HiveTest.SalesDB' AND COLUMN_NAME LIKE '%Total'; + +-------------+------------+ + | COLUMN_NAME | DATA_TYPE | + +-------------+------------+ + | OrderTotal | Decimal | + +-------------+------------+ +
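+
+Dropping the LIKE condition returns every column in the table. For example, a
+query along these lines (assuming the same `HiveTest.SalesDB` schema used
+above) lists each column of the Orders table with its data type:
+
+    SELECT COLUMN_NAME, DATA_TYPE
+    FROM INFORMATION_SCHEMA.COLUMNS
+    WHERE TABLE_SCHEMA = 'HiveTest.SalesDB' AND TABLE_NAME = 'Orders';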
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/006-query-sys-tbl.md ---------------------------------------------------------------------- diff --git a/_docs/query/006-query-sys-tbl.md b/_docs/query/006-query-sys-tbl.md new file mode 100644 index 0000000..9b853ec --- /dev/null +++ b/_docs/query/006-query-sys-tbl.md @@ -0,0 +1,159 @@ +--- +title: "Querying System Tables" +parent: "Query Data" +--- +Drill has a sys database that contains system tables. You can query the system +tables for information about Drill, including Drill ports, the Drill version +running on the system, and available Drill options. View the databases in +Drill to identify the sys database, and then use the sys database to view +system tables that you can query. + +## View Drill Databases + +Issue the `SHOW DATABASES` command to view Drill databases. + + 0: jdbc:drill:zk=10.10.100.113:5181> show databases; + +-------------+ + | SCHEMA_NAME | + +-------------+ + | M7 | + | hive.default| + | dfs.default | + | dfs.root | + | dfs.views | + | dfs.tmp | + | dfs.tpcds | + | sys | + | cp.default | + | hbase | + | INFORMATION_SCHEMA | + +-------------+ + 11 rows selected (0.162 seconds) + +Drill returns `sys` in the database results. + +## Use the Sys Database + +Issue the `USE` command to select the sys database for subsequent SQL +requests. + + 0: jdbc:drill:zk=10.10.100.113:5181> use sys; + +------------+--------------------------------+ + | ok | summary | + +------------+--------------------------------+ + | true | Default schema changed to 'sys' | + +------------+--------------------------------+ + 1 row selected (0.101 seconds) + +## View Tables + +Issue the `SHOW TABLES` command to view the tables in the sys database. 
+
+    0: jdbc:drill:zk=10.10.100.113:5181> show tables;
+    +--------------+------------+
+    | TABLE_SCHEMA | TABLE_NAME |
+    +--------------+------------+
+    | sys          | drillbits  |
+    | sys          | version    |
+    | sys          | options    |
+    +--------------+------------+
+    3 rows selected (0.934 seconds)
+    0: jdbc:drill:zk=10.10.100.113:5181>
+
+## Query System Tables
+
+Query the drillbits, version, and options tables in the sys database.
+
+### Query the drillbits table.
+
+    0: jdbc:drill:zk=10.10.100.113:5181> select * from drillbits;
+    +-------------------+-----------+--------------+-----------+---------+
+    | host              | user_port | control_port | data_port | current |
+    +-------------------+-----------+--------------+-----------+---------+
+    | qa-node115.qa.lab | 31010     | 31011        | 31012     | true    |
+    | qa-node114.qa.lab | 31010     | 31011        | 31012     | false   |
+    | qa-node116.qa.lab | 31010     | 31011        | 31012     | false   |
+    +-------------------+-----------+--------------+-----------+---------+
+    3 rows selected (0.146 seconds)
+
+ * host
+The name of the node running the Drillbit service.
+ * user_port
+The user port address, used by external clients and the Drill Web UI to
+connect to a Drillbit.
+ * control_port
+The control port address, used between nodes in a multi-node installation of
+Apache Drill.
+ * data_port
+The data port address, used between nodes in a multi-node installation of
+Apache Drill.
+ * current
+True means the Drillbit is connected to the session or client running the
+query. This Drillbit is the Foreman for the current session.
+
+### Query the version table.
+
+    0: jdbc:drill:zk=10.10.100.113:5181> select * from version;
+    +------------+----------------+-------------+-------------+------------+
+    | commit_id  | commit_message | commit_time | build_email | build_time |
+    +------------+----------------+-------------+-------------+------------+
+    | 108d29fce3d8465d619d45db5f6f433ca3d97619 | DRILL-1635: Additional fix for validation exceptions.
| 14.11.2014 @ 02:32:47 UTC | Unknown | 14.11.2014 @ 03:56:07 UTC |
+    +------------+----------------+-------------+-------------+------------+
+    1 row selected (0.144 seconds)
+ * commit_id
+The GitHub commit ID of the release you are running. For example,
+<https://github.com/apache/drill/commit/e3ab2c1760ad34bda80141e2c3108f7eda7c9104>
+ * commit_message
+The message explaining the change.
+ * commit_time
+The date and time of the change.
+ * build_email
+The email address of the person who made the change, which is unknown in this
+example.
+ * build_time
+The time that the release was built.
+
+### Query the options table.
+
+Drill provides system, session, and boot options that you can query.
+
+The following example shows a query on the system options:
+
+    0: jdbc:drill:zk=10.10.100.113:5181> select * from options where type='SYSTEM' limit 10;
+    +------------+------------+------------+------------+------------+------------+------------+
+    | name | kind | type | num_val | string_val | bool_val | float_val |
+    +------------+------------+------------+------------+------------+------------+------------+
+    | exec.max_hash_table_size | LONG | SYSTEM | 1073741824 | null | null | null |
+    | planner.memory.max_query_memory_per_node | LONG | SYSTEM | 2048 | null | null | null |
+    | planner.join.row_count_estimate_factor | DOUBLE | SYSTEM | null | null | null | 1.0 |
+    | planner.affinity_factor | DOUBLE | SYSTEM | null | null | null | 1.2 |
+    | exec.errors.verbose | BOOLEAN | SYSTEM | null | null | false | null |
+    | planner.disable_exchanges | BOOLEAN | SYSTEM | null | null | false | null |
+    | exec.java_compiler_debug | BOOLEAN | SYSTEM | null | null | true | null |
+    | exec.min_hash_table_size | LONG | SYSTEM | 65536 | null | null | null |
+    | exec.java_compiler_janino_maxsize | LONG | SYSTEM | 262144 | null | null | null |
+    | planner.enable_mergejoin | BOOLEAN | SYSTEM | null | null | true | null |
+
+------------+------------+------------+------------+------------+------------+------------+
+    10 rows selected (0.334 seconds)
+ * name
+The name of the option.
+ * kind
+The data type of the option value.
+ * type
+The type of options in the output: system, session, or boot.
+ * num_val
+The default value, which is of the long or int data type; otherwise, null.
+ * string_val
+The default value, which is a string; otherwise, null.
+ * bool_val
+The default value, which is true or false; otherwise, null.
+ * float_val
+The default value, which is of the double, float, or long double data type;
+otherwise, null.
+
+For information about how to configure Drill system and session options, see
+[Planning and Execution Options](/drill/docs/planning-and-execution-options).
+
+For information about how to configure Drill start-up options, see [Start-Up
+Options](/drill/docs/start-up-options).
 http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-complex/001-sample-donuts.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-complex/001-sample-donuts.md b/_docs/query/query-complex/001-sample-donuts.md new file mode 100644 index 0000000..9bace24 --- /dev/null +++ b/_docs/query/query-complex/001-sample-donuts.md @@ -0,0 +1,40 @@
+---
+title: "Sample Data: Donuts"
+parent: "Querying Complex Data"
+---
+The complex data queries use sample `donuts.json` and `moredonuts.json` files.
+Here is the single complete "record" (`0001`) from the `donuts.json` file. In
+terms of Drill query processing, this record is equivalent to a single record
+in a table.
+ + { + "id": "0001", + "type": "donut", + "name": "Cake", + "ppu": 0.55, + "batters": + { + "batter": + [ + { "id": "1001", "type": "Regular" }, + { "id": "1002", "type": "Chocolate" }, + { "id": "1003", "type": "Blueberry" }, + { "id": "1004", "type": "Devil's Food" } + ] + }, + "topping": + [ + { "id": "5001", "type": "None" }, + { "id": "5002", "type": "Glazed" }, + { "id": "5005", "type": "Sugar" }, + { "id": "5007", "type": "Powdered Sugar" }, + { "id": "5006", "type": "Chocolate with Sprinkles" }, + { "id": "5003", "type": "Chocolate" }, + { "id": "5004", "type": "Maple" } + ] + } + +The data is made up of maps, arrays, and nested arrays. Name-value pairs and +embedded name-value pairs define the contents of each record. For example, +`type: donut` is a map. Under `topping`, the pairs of `id` and `type` values +belong to an array (inside the square brackets). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-complex/002-query1-select.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-complex/002-query1-select.md b/_docs/query/query-complex/002-query1-select.md new file mode 100644 index 0000000..654a158 --- /dev/null +++ b/_docs/query/query-complex/002-query1-select.md @@ -0,0 +1,19 @@ +--- +title: "Query 1: Selecting Flat Data" +parent: "Querying Complex Data" +--- +A very simple query against the `donuts.json` file returns the values for the +four "flat" columns (the columns that contain data at the top level only: no +nested data): + + 0: jdbc:drill:zk=local> select id, type, name, ppu + from dfs.`/Users/brumsby/drill/donuts.json`; + +------------+------------+------------+------------+ + | id | type | name | ppu | + +------------+------------+------------+------------+ + | 0001 | donut | Cake | 0.55 | + +------------+------------+------------+------------+ + 1 row selected (0.248 seconds) + +Note that `dfs` is the schema name, the path to the file 
is enclosed by +backticks, and the query must end with a semicolon. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-complex/003-query2-use-sql.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-complex/003-query2-use-sql.md b/_docs/query/query-complex/003-query2-use-sql.md new file mode 100644 index 0000000..8981b65 --- /dev/null +++ b/_docs/query/query-complex/003-query2-use-sql.md @@ -0,0 +1,58 @@ +--- +title: "Query 2: Using Standard SQL Functions, Clauses, and Joins" +parent: "Querying Complex Data" +--- +You can use standard SQL clauses, such as WHERE and ORDER BY, to elaborate on +this kind of simple query: + + 0: jdbc:drill:zk=local> select id, type from dfs.`/Users/brumsby/drill/donuts.json` + where id>0 + order by id limit 1; + + +------------+------------+ + | id | type | + +------------+------------+ + | 0001 | donut | + +------------+------------+ + + 1 row selected (0.318 seconds) + +You can also join files (or tables, or files and tables) by using standard +syntax: + + 0: jdbc:drill:zk=local> select tbl1.id, tbl1.type from dfs.`/Users/brumsby/drill/donuts.json` as tbl1 + join + dfs.`/Users/brumsby/drill/moredonuts.json` as tbl2 + on tbl1.id=tbl2.id; + + +------------+------------+ + | id | type | + +------------+------------+ + | 0001 | donut | + +------------+------------+ + + 1 row selected (0.395 seconds) + +Equivalent USING syntax and joins in the WHERE clause are also supported. + +Standard aggregate functions work against JSON data. 
For example:
+
+    0: jdbc:drill:zk=local> select type, avg(ppu) as ppu_sum from dfs.`/Users/brumsby/drill/donuts.json` group by type;
+
+    +------------+------------+
+    | type       | ppu_sum    |
+    +------------+------------+
+    | donut      | 0.55       |
+    +------------+------------+
+
+    1 row selected (0.216 seconds)
+
+    0: jdbc:drill:zk=local> select type, sum(sales) as sum_by_type from dfs.`/Users/brumsby/drill/moredonuts.json` group by type;
+
+    +------------+-------------+
+    | type       | sum_by_type |
+    +------------+-------------+
+    | donut      | 1194        |
+    +------------+-------------+
+
+    1 row selected (0.389 seconds) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-complex/004-query3-sel-nest.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-complex/004-query3-sel-nest.md b/_docs/query/query-complex/004-query3-sel-nest.md new file mode 100644 index 0000000..6bd2e94 --- /dev/null +++ b/_docs/query/query-complex/004-query3-sel-nest.md @@ -0,0 +1,45 @@
+---
+title: "Query 3: Selecting Nested Data for a Column"
+parent: "Querying Complex Data"
+---
+The following queries show how to access the nested data inside the parts of
+the record that are not flat (such as `topping`). To isolate and return nested
+data, use the `[n]` notation, where `n` is a number that points to a specific
+position in an array. Arrays use a 0-based index, so `topping[3]` points to
+the _fourth_ element in the array under `topping`, not the third.
+
+    0: jdbc:drill:zk=local> select topping[3] as top from dfs.`/Users/brumsby/drill/donuts.json`;
+
+    +------------+
+    | top        |
+    +------------+
+    | {"id":"5007","type":"Powdered Sugar"} |
+    +------------+
+    1 row selected (0.137 seconds)
+
+Note that this query produces _one column for all of the data_ that is nested
+inside the `topping` segment of the file. The query as written does not unpack
+the `id` and `type` name/value pairs.
Also note the use of an alias for the +column name. (Without the alias, the default column name would be `EXPR$0`.) + +Some JSON files store arrays within arrays. If your data has this +characteristic, you can probe into the inner array by using the following +notation: `[n][n]` + +For example, assume that a segment of the JSON file looks like this: + + ... + group: + [ + [1,2,3], + + [4,5,6], + + [7,8,9] + ] + ... + +The following query would return `6` (the _third_ value of the _second_ inner +array). + +`select group[1][2]` \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-complex/005-query4-sel-multiple.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-complex/005-query4-sel-multiple.md b/_docs/query/query-complex/005-query4-sel-multiple.md new file mode 100644 index 0000000..dc5c08f --- /dev/null +++ b/_docs/query/query-complex/005-query4-sel-multiple.md @@ -0,0 +1,24 @@ +--- +title: "Query 4: Selecting Multiple Columns Within Nested Data" +parent: "Querying Complex Data" +--- +The following query goes one step further to extract the JSON data, selecting +specific `id` and `type` data values _as individual columns_ from inside the +`topping` array. This query is similar to the previous query, but it returns +the `id` and `type` values as separate columns. + + 0: jdbc:drill:zk=local> select tbl.topping[3].id as record, tbl.topping[3].type as first_topping + from dfs.`/Users/brumsby/drill/donuts.json` as tbl; + +------------+---------------+ + | record | first_topping | + +------------+---------------+ + | 5007 | Powdered Sugar | + +------------+---------------+ + 1 row selected (0.133 seconds) + +This query also introduces a typical requirement for queries against nested +data: the use of a table alias (named tbl in this example). 
Without the table
+alias, the query would return an error because the parser would assume that
+`id` is a column inside a table named `topping`. As in all standard SQL
+queries, `select tbl.col` means that `tbl` is the name of an existing table
+(at least for the duration of the query) and `col` is a column that exists in
+that table. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-fs/001-query-json.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-fs/001-query-json.md b/_docs/query/query-fs/001-query-json.md new file mode 100644 index 0000000..219f2b1 --- /dev/null +++ b/_docs/query/query-fs/001-query-json.md @@ -0,0 +1,41 @@
+---
+title: "Querying JSON Files"
+parent: "Querying a File System"
+---
+Your Drill installation includes a sample JSON file located in Drill's
+classpath. The sample JSON file, `employee.json`, contains fictitious employee
+data. Use SQL syntax to query the sample JSON file.
+ +To view the data in the `employee.json` file, submit the following SQL query +to Drill: + + 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`; + +The query returns the following results: + +**Example of partial output** + + +-------------+------------+------------+------------+-------------+-----------+ + | employee_id | full_name | first_name | last_name | position_id | position_ | + +-------------+------------+------------+------------+-------------+-----------+ + | 1101 | Steve Eurich | Steve | Eurich | 16 | Store T | + | 1102 | Mary Pierson | Mary | Pierson | 16 | Store T | + | 1103 | Leo Jones | Leo | Jones | 16 | Store Tem | + | 1104 | Nancy Beatty | Nancy | Beatty | 16 | Store T | + | 1105 | Clara McNight | Clara | McNight | 16 | Store | + | 1106 | Marcella Isaacs | Marcella | Isaacs | 17 | Stor | + | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | Stor | + | 1108 | Benjamin Foster | Benjamin | Foster | 17 | Stor | + | 1109 | John Reed | John | Reed | 17 | Store Per | + | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | St | + | 1111 | Donald Vann | Donald | Vann | 17 | Store Pe | + | 1112 | William Smith | William | Smith | 17 | Store | + | 1113 | Amy Hensley | Amy | Hensley | 17 | Store Pe | + | 1114 | Judy Owens | Judy | Owens | 17 | Store Per | + | 1115 | Frederick Castillo | Frederick | Castillo | 17 | S | + | 1116 | Phil Munoz | Phil | Munoz | 17 | Store Per | + | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | Store | + ... 
+ +-------------+------------+------------+------------+-------------+-----------+ + 1,155 rows selected (0.762 seconds) + 0: jdbc:drill:zk=local> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-fs/002-query-parquet.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-fs/002-query-parquet.md b/_docs/query/query-fs/002-query-parquet.md new file mode 100644 index 0000000..cf19fcf --- /dev/null +++ b/_docs/query/query-fs/002-query-parquet.md @@ -0,0 +1,99 @@ +--- +title: "Querying Parquet Files" +parent: "Querying a File System" +--- +Your Drill installation includes a `sample-data` directory with Parquet files +that you can query. Use SQL syntax to query the `region.parquet` and +`nation.parquet` files in the `sample-data` directory. + +**Note:** Your Drill installation location may differ from the examples used here. The examples assume that Drill was installed in embedded mode on your machine following the [Apache Drill in 10 Minutes ](/drill/docs/apache-drill-in-10-minutes/)tutorial. If you installed Drill in distributed mode, or your `sample-data` directory differs from the location used in the examples, make sure to change the `sample-data` directory to the correct location before you run the queries. + +## Region File + +If you followed the Apache Drill in 10 Minutes instructions to install Drill +in embedded mode, the path to the parquet file varies between operating +systems. 
+ +To view the data in the `region.parquet` file, issue the query appropriate for +your operating system: + + * Linux + + SELECT * FROM dfs.`/opt/drill/apache-drill-0.4.0-incubating/sample-data/region.parquet`; + + * Mac OS X + + SELECT * FROM dfs.`/Users/max/drill/apache-drill-0.4.0-incubating/sample-data/region.parquet`; + + * Windows + + SELECT * FROM dfs.`C:\drill\apache-drill-0.4.0-incubating\sample-data\region.parquet`; + +The query returns the following results: + + +------------+------------+ + | EXPR$0 | EXPR$1 | + +------------+------------+ + | AFRICA | lar deposits. blithely final packages cajole. regular waters ar | + | AMERICA | hs use ironic, even requests. s | + | ASIA | ges. thinly even pinto beans ca | + | EUROPE | ly final courts cajole furiously final excuse | + | MIDDLE EAST | uickly special accounts cajole carefully blithely close reques | + +------------+------------+ + 5 rows selected (0.165 seconds) + 0: jdbc:drill:zk=local> + +## Nation File + +If you followed the Apache Drill in 10 Minutes instructions to install Drill +in embedded mode, the path to the parquet file varies between operating +systems. + +To view the data in the `nation.parquet` file, issue the query appropriate for +your operating system: + + * Linux + + SELECT * FROM dfs.`/opt/drill/apache-drill-0.4.0-incubating/sample-data/nation.parquet`; + + * Mac OS X + + SELECT * FROM dfs.`/Users/max/drill/apache-drill-0.4.0-incubating/sample-data/nation.parquet`; + + * Windows + + SELECT * FROM dfs.`C:\drill\apache-drill-0.4.0-incubating\sample-data\nation.parquet`; + +The query returns the following results: + + +------------+------------+------------+------------+ + | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | + +------------+------------+------------+------------+ + | 0 | 0 | ALGERIA | haggle. carefully final deposits det | + | 1 | 1 | ARGENTINA | al foxes promise slyly according to t | + | 2 | 1 | BRAZIL | y alongside of the pending deposits. 
| + | 3 | 1 | CANADA | eas hang ironic, silent packages. sly | + | 4 | 4 | EGYPT | y above the carefully unusual theodol | + | 5 | 0 | ETHIOPIA | ven packages wake quickly. regu | + | 6 | 3 | FRANCE | refully final requests. regular, iron | + | 7 | 3 | GERMANY | l platelets. regular accounts x-ray: | + | 8 | 2 | INDIA | ss excuses cajole slyly across the pa | + | 9 | 2 | INDONESIA | slyly express asymptotes. regular de | + | 10 | 4 | IRAN | efully alongside of the slyly final d | + | 11 | 4 | IRAQ | nic deposits boost atop the quickly f | + | 12 | 2 | JAPAN | ously. final, express gifts cajole a | + | 13 | 4 | JORDAN | ic deposits are blithely about the ca | + | 14 | 0 | KENYA | pending excuses haggle furiously dep | + | 15 | 0 | MOROCCO | rns. blithely bold courts among the c | + | 16 | 0 | MOZAMBIQUE | s. ironic, unusual asymptotes wake bl | + | 17 | 1 | PERU | platelets. blithely pending dependenc | + | 18 | 2 | CHINA | c dependencies. furiously express not | + | 19 | 3 | ROMANIA | ular asymptotes are about the furious | + | 20 | 4 | SAUDI ARABIA | ts. silent requests haggle. closely | + | 21 | 2 | VIETNAM | hely enticingly express accounts. eve | + | 22 | 3 | RUSSIA | requests against the platelets use n | + | 23 | 3 | UNITED KINGDOM | eans boost carefully special requ | + | 24 | 1 | UNITED STATES | y final packages. 
slow foxes cajol |
+    +------------+------------+------------+------------+
+    25 rows selected (2.401 seconds)
+    0: jdbc:drill:zk=local> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-fs/003-query-text.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-fs/003-query-text.md b/_docs/query/query-fs/003-query-text.md new file mode 100644 index 0000000..1fd9d84 --- /dev/null +++ b/_docs/query/query-fs/003-query-text.md @@ -0,0 +1,119 @@
+---
+title: "Querying Plain Text Files"
+parent: "Querying a File System"
+---
+You can use Drill to access both structured file types and plain text files
+(flat files). This section shows a few simple examples that work on flat
+files:
+
+ * CSV files (comma-separated values)
+ * TSV files (tab-separated values)
+ * PSV files (pipe-separated values)
+
+The examples here show CSV files, but queries against TSV and PSV files return
+equivalent results. However, make sure that your registered storage plugins
+recognize the appropriate file types and extensions. For example, the
+following configuration expects PSV files (files with a pipe delimiter) to
+have a `tbl` extension, not a `psv` extension. Drill returns a "file not
+found" error if references to files in queries do not match these conditions.
+
+    "formats": {
+      "psv": {
+        "type": "text",
+        "extensions": [
+          "tbl"
+        ],
+        "delimiter": "|"
+      }
+    }
+
+## SELECT * FROM a CSV File
+
+The first query selects rows from a `.csv` text file. The file contains seven
+records:
+
+    $ more plays.csv
+
+    1599,As You Like It
+    1601,Twelfth Night
+    1594,Comedy of Errors
+    1595,Romeo and Juliet
+    1596,The Merchant of Venice
+    1610,The Tempest
+    1599,Hamlet
+
+Drill recognizes each row as an array of values and returns one column for
+each row.
+
+    0: jdbc:drill:zk=local> select * from dfs.`/Users/brumsby/drill/plays.csv`;
+
+    +------------+
+    | columns    |
+    +------------+
+    | ["1599","As You Like It"] |
+    | ["1601","Twelfth Night"] |
+    | ["1594","Comedy of Errors"] |
+    | ["1595","Romeo and Juliet"] |
+    | ["1596","The Merchant of Venice"] |
+    | ["1610","The Tempest"] |
+    | ["1599","Hamlet"] |
+    +------------+
+    7 rows selected (0.089 seconds)
+
+## Columns[n] Syntax
+
+You can use the `COLUMNS[n]` syntax in the SELECT list to return these CSV
+rows in a more readable, column-by-column, format. (This syntax uses a
+zero-based index, so the first column is column `0`.)
+
+    0: jdbc:drill:zk=local> select columns[0], columns[1] from dfs.`/Users/brumsby/drill/plays.csv`;
+
+    +------------+------------+
+    | EXPR$0     | EXPR$1     |
+    +------------+------------+
+    | 1599 | As You Like It |
+    | 1601 | Twelfth Night |
+    | 1594 | Comedy of Errors |
+    | 1595 | Romeo and Juliet |
+    | 1596 | The Merchant of Venice |
+    | 1610 | The Tempest |
+    | 1599 | Hamlet |
+    +------------+------------+
+    7 rows selected (0.137 seconds)
+
+You can use aliases to return meaningful column names. Note that `YEAR` is a
+reserved word, so the `Year` alias must be enclosed by backticks.
+
+    0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play
+    from dfs.`/Users/brumsby/drill/plays.csv`;
+
+    +------------+------------+
+    | Year       | Play       |
+    +------------+------------+
+    | 1599 | As You Like It |
+    | 1601 | Twelfth Night |
+    | 1594 | Comedy of Errors |
+    | 1595 | Romeo and Juliet |
+    | 1596 | The Merchant of Venice |
+    | 1610 | The Tempest |
+    | 1599 | Hamlet |
+    +------------+------------+
+    7 rows selected (0.113 seconds)
+
+You cannot refer to the aliases in subsequent clauses of the query.
Use the +original `columns[n]` syntax, as shown in the WHERE clause for the following +example: + + 0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play + from dfs.`/Users/brumsby/drill/plays.csv` where columns[0]>1599; + + +------------+------------+ + | Year | Play | + +------------+------------+ + | 1601 | Twelfth Night | + | 1610 | The Tempest | + +------------+------------+ + 2 rows selected (0.201 seconds) + +Note that the restriction with the use of aliases applies to queries against +all data sources. + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/query/query-fs/004-query-dir.md ---------------------------------------------------------------------- diff --git a/_docs/query/query-fs/004-query-dir.md b/_docs/query/query-fs/004-query-dir.md new file mode 100644 index 0000000..47abee9 --- /dev/null +++ b/_docs/query/query-fs/004-query-dir.md @@ -0,0 +1,90 @@ +--- +title: "Querying Directories" +parent: "Querying a File System" +--- +You can store multiple files in a directory and query them as if they were a +single entity. You do not have to explicitly join the files. The files must be +compatible, in the sense that they must have comparable data types and columns +in the same order. This type of query is not limited to text files; you can +also query directories of JSON files, for example. + +For example, assume that a `testdata` directory contains two files with the +same structure: `plays.csv` and `moreplays.csv`. The first file contains 7 +records and the second file contains 3 records. 
The following query returns +the "union" of the two files, ordered by the first column: + + 0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play + from dfs.`/Users/brumsby/drill/testdata` order by 1; + + +------------+------------+ + | Year | Play | + +------------+------------+ + | 1594 | Comedy of Errors | + | 1595 | Romeo and Juliet | + | 1596 | The Merchant of Venice | + | 1599 | As You Like It | + | 1599 | Hamlet | + | 1601 | Twelfth Night | + | 1606 | Macbeth | + | 1606 | King Lear | + | 1609 | The Winter's Tale | + | 1610 | The Tempest | + +------------+------------+ + 10 rows selected (0.296 seconds) + +You can drill down further and automatically query subdirectories as well. For +example, assume that you have a logs directory that contains a subdirectory +for each year and subdirectories for each month (1 through 12). The month +directories contain JSON files. + + [root@ip-172-16-1-200 logs]# pwd + /mapr/drilldemo/labs/clicks/logs + [root@ip-172-16-1-200 logs]# ls + 2012 2013 2014 + [root@ip-172-16-1-200 logs]# cd 2013 + [root@ip-172-16-1-200 2013]# ls + 1 10 11 12 2 3 4 5 6 7 8 9 + +You can query all of these files, or a subset, by referencing the file system +once in a Drill query. For example, the following query counts the number of +records in all of the files inside the `2013` directory: + + 0: jdbc:drill:> select count(*) from MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ; + +------------+ + | EXPR$0 | + +------------+ + | 24000 | + +------------+ + 1 row selected (2.607 seconds) + +You can also use variables `dir0`, `dir1`, and so on, to refer to +subdirectories in your workspace path. For example, assume that `bob.logdata` +is a workspace that points to the `logs` directory, which contains multiple +subdirectories: `2012`, `2013`, and `2014`. The following query constrains +files inside the subdirectory named `2013`. The variable `dir0` refers to the +first level down from logs, `dir1` to the next level, and so on. 
+ + 0: jdbc:drill:> use bob.logdata; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'bob.logdata' | + +------------+------------+ + 1 row selected (0.305 seconds) + + 0: jdbc:drill:> select * from logs where dir0='2013' limit 10; + +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+ + | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | + +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+ + | 2013 | 2 | 12115 | 02/23/2013 | 19:48:24 | 3 | IOS5 | az | 5 | who's | + | 2013 | 2 | 12127 | 02/26/2013 | 19:42:03 | 11459 | IOS5 | wa | 10 | for | + | 2013 | 2 | 12138 | 02/09/2013 | 05:49:01 | 1 | IOS6 | ca | 7 | minutes | + | 2013 | 2 | 12139 | 02/23/2013 | 06:58:20 | 1 | AOS4.4 | ms | 7 | i | + | 2013 | 2 | 12145 | 02/10/2013 | 10:14:56 | 10 | IOS5 | mi | 6 | wrong | + | 2013 | 2 | 12157 | 02/15/2013 | 02:49:22 | 102 | IOS5 | ny | 5 | want | + | 2013 | 2 | 12176 | 02/19/2013 | 08:39:02 | 28 | IOS5 | or | 0 | and | + | 2013 | 2 | 12194 | 02/24/2013 | 08:26:17 | 125445 | IOS5 | ar | 0 | say | + | 2013 | 2 | 12236 | 02/05/2013 | 01:40:05 | 10 | IOS5 | nj | 2 | sir | + | 2013 | 2 | 12249 | 02/03/2013 | 04:45:47 | 21725 | IOS5 | nj | 5 | no | + +------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+ + 10 rows selected (0.583 seconds) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/rn/001-0.5.0rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/001-0.5.0rn.md b/_docs/rn/001-0.5.0rn.md new file mode 100644 index 0000000..2dff978 --- /dev/null +++ b/_docs/rn/001-0.5.0rn.md @@ -0,0 +1,29 @@ +--- +title: "Apache Drill 0.5.0 Release Notes" +parent: 
"Release Notes" +--- + +Apache Drill 0.5.0, the first beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. + +The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](h +ttps://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&versi +on=12324880) closed, but there are some notable features. For information +about the features, see the [Apache Drill Blog for the 0.5.0 +release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see). + +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc +ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu +bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +Apache Drill 0.5.0 Key Notes and Limitations + + * The current release supports in memory and beyond memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Others operations, such as streaming aggregate, may have partial support that leads to unexpected results. + * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information. 
+ http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/rn/002-0.4.0rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/002-0.4.0rn.md b/_docs/rn/002-0.4.0rn.md new file mode 100644 index 0000000..d0dac30 --- /dev/null +++ b/_docs/rn/002-0.4.0rn.md @@ -0,0 +1,42 @@ +--- +title: "Apache Drill 0.4.0 Release Notes" +parent: "Release Notes" +--- +The 0.4.0 release is a developer preview release, designed to help enthusiasts +start working and experimenting with Drill. It is the first Drill release +that provides distributed query execution. + +This release is built upon [more than 800 +JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/). +It is a pre-beta release on the way towards a Drill beta. As a developer snapshot, +the release contains a large number of outstanding bugs that will make some +use cases challenging. Feel free to consult outstanding issues [targeted for +the 0.5.0 +release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/) +to see whether your use case is affected. + +To read more about this release and new features introduced, please view the +[0.4.0 announcement blog +entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4). + +The release is available as both [binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz) +and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating-src.tar.gz) tarballs. In both cases, +these are compiled against Apache Hadoop. Drill has also been tested against +MapR, Cloudera, and Hortonworks Hadoop distributions, and there are associated +build profiles or JIRAs that can help you run against your preferred +distribution. + +Some Key Notes & Limitations + + * The current release supports in-memory and beyond-memory execution.
However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * In many cases, merge join operations return incorrect results. + * Use of a local filter in a join ON clause when using left, right or full outer joins may result in incorrect results. + * Because of known memory leaks and memory overrun issues you may need more memory and you may need to restart the system in some cases. + * Some types of complex expressions, especially those involving empty arrays, may fail or return incorrect results. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). Other operations (such as streaming aggregate) may have partial support that leads to unexpected results. + * Protobuf, UDF, query plan interfaces and all interfaces are subject to change in incompatible ways. + * Multiplication of some types of DECIMAL(28+,*) will return incorrect results. + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/rn/003-alpha-rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/003-alpha-rn.md b/_docs/rn/003-alpha-rn.md new file mode 100644 index 0000000..256300a --- /dev/null +++ b/_docs/rn/003-alpha-rn.md @@ -0,0 +1,39 @@ +--- +title: "Apache Drill M1 Release Notes (Apache Drill Alpha)" +parent: "Release Notes" +--- +### Milestone 1 Goals + +The first release of Apache Drill is designed as a technology preview for +people to better understand the architecture and vision. It is a functional +release trying to piece together the key components of a next-generation MPP +query engine. It is designed to allow milestone 2 (M2) to focus on +architectural analysis and performance optimization.
+ + * Provide a new optimistic DAG execution engine for data analysis + * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity + * Provide a model for runtime-generated functions and relational operators that minimizes complexity and maximizes performance + * Support queries against columnar on-disk format (Parquet) and JSON + * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN + * Support schema-on-read querying and execution + * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, and Selection Vector removal + * Support unlimited levels of subqueries and correlated subqueries + * Provide an extensible, query-language-agnostic JSON-based logical data flow syntax + * Support complex data type manipulation via logical plan operations + +### Known Issues + +SQL Parsing +Because Apache Drill is built to support late-bound changing schemas while SQL +is statically typed, there are a couple of special requirements for writing SQL +queries. These are limited to the current release and +will be corrected in a future milestone release. + + * All tables are exposed as a single map field that contains + * Drill Alpha doesn't support implicit or explicit casts outside those required above. + * Drill Alpha does not include, there are currently a couple of differences for how to write a query in order to query against UDFs + * Drill currently supports simple and aggregate functions using scalar, repeated and + * Nested data support is incomplete. Drill Alpha supports nested data structures as well as repeated fields.
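The read-only SQL phrases listed under the Milestone 1 goals above can be combined in a single statement. A minimal sketch, using hypothetical `orders` and `regions` tables:

```sql
-- Exercises SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY,
-- DISTINCT, and INNER JOIN from the M1 phrase set.
SELECT o.region, COUNT(DISTINCT o.cust_id) AS customers
FROM orders o
INNER JOIN regions r ON o.region = r.region
WHERE o.amount > 0
GROUP BY o.region
HAVING COUNT(*) > 10
ORDER BY customers DESC;
```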
+ + + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/rn/004-0.6.0-rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/004-0.6.0-rn.md b/_docs/rn/004-0.6.0-rn.md new file mode 100644 index 0000000..f121ebe --- /dev/null +++ b/_docs/rn/004-0.6.0-rn.md @@ -0,0 +1,32 @@ +--- +title: "Apache Drill 0.6.0 Release Notes (Apache Drill Alpha)" +parent: "Release Notes" +--- +Apache Drill 0.6.0, the second beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. + +This release is available as [binary](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled +against Apache Hadoop. Drill has been tested against MapR, Cloudera, and +Hortonworks Hadoop distributions. There are associated build profiles and +JIRAs that can help you run Drill against your preferred distribution. + +Apache Drill 0.6.0 Key Features + +This release is primarily a bug fix release, with [more than 30 JIRAs closed](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12327472), but there are some notable features: + + * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/drill/docs/mongodb-plugin-for-apache-drill) + * Filesystem query performance improvements with partition pruning + * Ability to use the file system as a persistent store for query profiles and diagnostic information + * Window function support (alpha) + +Apache Drill 0.6.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution.
However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/rn/005-0.7.0-rn.md ---------------------------------------------------------------------- diff --git a/_docs/rn/005-0.7.0-rn.md b/_docs/rn/005-0.7.0-rn.md new file mode 100644 index 0000000..e25a58a --- /dev/null +++ b/_docs/rn/005-0.7.0-rn.md @@ -0,0 +1,56 @@ +--- +title: "Apache Drill 0.7.0 Release Notes (Apache Drill Alpha)" +parent: "Release Notes" +--- +Apache Drill 0.7.0, the third beta release for Drill, is designed to help +enthusiasts start working and experimenting with Drill. It also continues the +Drill monthly release cycle as we drive towards general availability. + +This release is available as +[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) and +[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0-src.tar.gz) tarballs that are compiled against Apache Hadoop. +Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop +distributions.
There are associated build profiles and JIRAs that can help you +run Drill against your preferred distribution. + +Apache Drill 0.7.0 Key Features + + * No more dependency on UDP/Multicast - making it possible for Drill to work well in the following scenarios: + + * UDP multicast not enabled (as in EC2) + + * Cluster spans multiple subnets + + * Cluster has a multihomed configuration + + * New functions to natively work with nested data - KVGen and Flatten + + * Support for Hive 0.13 (Hive 0.12 with Drill is no longer supported) + + * Improved performance when querying Hive tables and the file system through partition pruning + + * Improved performance for HBase with LIKE operator pushdown + + * Improved memory management + + * Drill web UI monitoring and query profile improvements + + * Ability to parse files without explicit extensions using the default storage format specification + + * Fixes for dealing with complex/nested data objects in Parquet/JSON + + * Fast schema return - improved experience working with BI/query tools by returning metadata quickly + + * Several hang-related fixes + + * Parquet writer fixes for handling large datasets + + * Stability improvements in ODBC and JDBC drivers + +Apache Drill 0.7.0 Key Notes and Limitations + + * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. + * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
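The KVGen and Flatten functions called out in the feature list above work together to pivot a map of arbitrary keys into rows. A minimal sketch, assuming a hypothetical JSON file whose `ticketSales` column is a map of month-to-count pairs:

```sql
-- KVGEN turns {"jan": 100, "feb": 200} into a repeated list of
-- {"key": ..., "value": ...} maps; FLATTEN then emits one row per entry.
SELECT t.kv.key AS `month`, t.kv.`value` AS sold
FROM (
  SELECT FLATTEN(KVGEN(ticketSales)) AS kv
  FROM dfs.`/tmp/sales.json`
) t;
```

The file path and column name here are illustrative only.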
+ http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/001-data-types.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/001-data-types.md b/_docs/sql-ref/001-data-types.md new file mode 100644 index 0000000..e425033 --- /dev/null +++ b/_docs/sql-ref/001-data-types.md @@ -0,0 +1,77 @@ +--- +title: "Data Types" +parent: "SQL Reference" +--- +You can use the following SQL data types in your Drill queries: + + +#### Character + + * VARCHAR/CHAR + +#### Date/Time + + * DATE + * INTERVAL + * Interval Year (stores year and month) + * Interval Day (stores day, hour, minute, seconds, and milliseconds) + * TIME + * TIMESTAMP + +Refer to [Supported Date/Time Data Type formats](/drill/docs/supported-date-time-data-type-formats/). + +#### Integer + + * BIGINT + * INT + * SMALLINT + +#### Numeric + + * DECIMAL + * FLOAT + * DOUBLE PRECISION (FLOAT 8) + * REAL (FLOAT 4) + +#### Boolean + +Values are FALSE or TRUE. + +## Complex Data Types + +Drill provides map and array data types to work with complex and nested data +structures. For analysis of complex data, a more modern JSON-style approach to +writing queries is more effective than using standard SQL functions. + +The following table provides descriptions and examples of the complex data +types: + +<table><tbody> + <tr><th>Data Type</th> + <th>Description</th> + <th>Example</th></tr> + <tr> + <td valign="top">Map</td> + <td valign="top">A map is a set of name/value pairs. 
</br> + A value in a map can be a scalar type, </br> + such as string or int, or a map can be a </br> + complex type, such as an array or another map.</td> + <td valign="top">Map with scalar type values:</br><code> "phoneNumber": { "areaCode": "622", "number": "1567845"}</code></br>Map with complex type value:<code></br> { "citiesLived" : [ { "place" : "Los Angeles",</br> + "yearsLived" : [ "1989",</br> + "1993",</br> + "1998",</br> + "2002"</br> + ]</br> + + } ] }</code></td> + </tr> + <tr> + <td valign="top">Array</td> + <td valign="top">An array is a repeated list of values. </br> + A value in an array can be a scalar type, </br> + such as string or int, or an array can be a</br> + complex type, such as a map or another array.</td> + <td valign="top">Array with scalar values:</br><code> "yearsLived": ["1990", "1993", "1998", "2008"]</code></br>Array with complex type values:</br><code> "children":</br> [ { "age" : "10", </br> "gender" : "Male",</br> "name" : "Earl"</br> }, </br> { "age" : "6",</br> "gender" : "Male",</br> "name" : "Sam"</br> },</br> { "age" : "8",</br> "gender" : "Male", </br> "name" : "Kit" </br> }</br> ]</code></td> + </tr> + </tbody></table> + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/002-operators.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/002-operators.md b/_docs/sql-ref/002-operators.md new file mode 100644 index 0000000..79afc7d --- /dev/null +++ b/_docs/sql-ref/002-operators.md @@ -0,0 +1,70 @@ +--- +title: "Operators" +parent: "SQL Reference" +--- +You can use various types of operators in your Drill queries to perform +operations on your data.
+ +## Logical Operators + +You can use the following logical operators in your Drill queries: + + * AND + * BETWEEN + * IN + * LIKE + * NOT + * OR + +## Comparison Operators + +You can use the following comparison operators in your Drill queries: + + * < + * \> + * <= + * \>= + * = + * <> + * IS NULL + * IS NOT NULL + * IS FALSE + * IS NOT FALSE + * IS TRUE + * IS NOT TRUE + +## Pattern Matching Operators + +You can use the following pattern matching operators in your Drill queries: + + * LIKE + * NOT LIKE + * SIMILAR TO + * NOT SIMILAR TO + +## Math Operators + +You can use the following math operators in your Drill queries: + +**Operator**| **Description** +---|--- ++| Addition +-| Subtraction +*| Multiplication +/| Division + +## Subquery Operators + +You can use the following subquery operators in your Drill queries: + + * EXISTS + * IN + +See [SELECT Statements](/drill/docs/select-statements). + +## String Operators + +You can use the following string operators in your Drill queries: + + * string || string + * string || non-string or non-string || string \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/003-functions.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/003-functions.md b/_docs/sql-ref/003-functions.md new file mode 100644 index 0000000..4769257 --- /dev/null +++ b/_docs/sql-ref/003-functions.md @@ -0,0 +1,185 @@ +--- +title: "SQL Functions" +parent: "SQL Reference" +--- +You can use the following types of functions in your Drill queries: + + * Scalar Functions + * Aggregate Functions + * Aggregate Statistics Functions + * Convert Functions + * Nested Data Functions + +## Scalar Functions + +### Math + +You can use the following scalar math functions in your Drill queries: + + * ABS + * CEIL + * CEILING + * DIV + * FLOOR + * MOD + * POWER + * RANDOM + * ROUND + * SIGN + * SQRT + * TRUNC + +### String Functions + +The following table provides the 
string functions that you can use in your +Drill queries: + +Function| Return Type +--------|--- +char_length(string) or character_length(string)| int +concat(str "any" [, str "any" [, ...] ])| text +convert_from(string bytea, src_encoding name)| text +convert_to(string text, dest_encoding name)| bytea +initcap(string)| text +left(str text, n int)| text +length(string)| int +length(string bytes, encoding name )| int +lower(string)| text +lpad(string text, length int [, fill text])| text +ltrim(string text [, characters text])| text +position(substring in string)| int +regexp_replace(string text, pattern text, replacement text [, flags text])|text +replace(string text, from text, to text)| text +right(str text, n int)| text +rpad(string text, length int [, fill text])| text +rtrim(string text [, characters text])| text +strpos(string, substring)| int +substr(string, from [, count])| text +substring(string [from int] [for int])| text +trim([leading | trailing | both] [characters] from string)| text +upper(string)| text + + +### Date/Time Functions + +The following table provides the date/time functions that you can use in your +Drill queries: + +**Function**| **Return Type** +---|--- +current_date| date +current_time| time with time zone +current_timestamp| timestamp with time zone +date_add(date,interval expr type)| date/datetime +date_part(text, timestamp)| double precision +date_part(text, interval)| double precision +date_sub(date,INTERVAL expr type)| date/datetime +extract(field from interval)| double precision +extract(field from timestamp)| double precision +localtime| time +localtimestamp| timestamp +now()| timestamp with time zone +timeofday()| text + +### Data Type Formatting Functions + +The following table provides the data type formatting functions that you can +use in your Drill queries: + +**Function**| **Return Type** +---|--- +to_char(timestamp, text)| text +to_char(int, text)| text +to_char(double precision, text)| text +to_char(numeric, text)| text 
+to_date(text, text)| date +to_number(text, text)| numeric +to_timestamp(text, text)| timestamp with time zone +to_timestamp(double precision)| timestamp with time zone + +## Aggregate Functions + +The following table provides the aggregate functions that you can use in your +Drill queries: + +**Function** | **Argument Type** | **Return Type** + -------- | ------------- | ----------- +avg(expression)| smallint, int, bigint, real, double precision, numeric, or interval| numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type +count(*)| _-_| bigint +count([DISTINCT] expression)| any| bigint +max(expression)| any array, numeric, string, or date/time type| same as argument type +min(expression)| any array, numeric, string, or date/time type| same as argument type +sum(expression)| smallint, int, bigint, real, double precision, numeric, or interval| bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type + + +## Aggregate Statistics Functions + +The following table provides the aggregate statistics functions that you can use in your Drill queries: + +**Function**| **Argument Type**| **Return Type** + -------- | ------------- | ----------- +stddev(expression)| smallint, int, bigint, real, double precision, or numeric| double precision for floating-point arguments, otherwise numeric +stddev_pop(expression)| smallint, int, bigint, real, double precision, or numeric| double precision for floating-point arguments, otherwise numeric +stddev_samp(expression)| smallint, int, bigint, real, double precision, or numeric| double precision for floating-point arguments, otherwise numeric +variance(expression)| smallint, int, bigint, real, double precision, or numeric| double precision for floating-point arguments, otherwise numeric +var_pop(expression)| smallint, int, bigint, real, double precision, or 
numeric| double precision for floating-point arguments, otherwise numeric +var_samp(expression)| smallint, int, bigint, real, double precision, or numeric| double precision for floating-point arguments, otherwise numeric + + +## Convert Functions + +You can use the CONVERT_TO and CONVERT_FROM functions to encode and decode +data when you query your data sources with Drill. For example, HBase stores +data as encoded byte arrays (VARBINARY data). When you issue a query with the +CONVERT_FROM function on HBase, Drill decodes the data and converts it to the +specified data type. In instances where Drill sends data back to HBase during +a query, you can use the CONVERT_TO function to change the data type to bytes. + +Although you can achieve the same results by using the CAST function for some +data types (such as VARBINARY to VARCHAR conversions), in general it is more +efficient to use CONVERT functions when your data sources return binary data. +When your data sources return more conventional data types, you can use the +CAST function. + +The following table provides the data types that you use with the CONVERT_TO +and CONVERT_FROM functions: + +**Type**| **Input Type**| **Output Type** +---|---|--- +BOOLEAN_BYTE| bytes(1)| boolean +TINYINT_BE| bytes(1)| tinyint +TINYINT| bytes(1)| tinyint +SMALLINT_BE| bytes(2)| smallint +SMALLINT| bytes(2)| smallint +INT_BE| bytes(4)| int +INT| bytes(4)| int +BIGINT_BE| bytes(8)| bigint +BIGINT| bytes(8)| bigint +FLOAT| bytes(4)| float (float4) +DOUBLE| bytes(8)| double (float8) +INT_HADOOPV| bytes(1-9)| int +BIGINT_HADOOPV| bytes(1-9)| bigint +DATE_EPOCH_BE| bytes(8)| date +DATE_EPOCH| bytes(8)| date +TIME_EPOCH_BE| bytes(8)| time +TIME_EPOCH| bytes(8)| time +UTF8| bytes| varchar +UTF16| bytes| var16char +UINT8| bytes(8)| uint8 + +A common use case for CONVERT_FROM is when a data source embeds complex data +inside a column. 
For example, you may have an HBase or MapR-DB table with +embedded JSON data: + + select CONVERT_FROM(col1, 'JSON') + FROM hbase.table1 + ... + +## Nested Data Functions + +This section contains descriptions of SQL functions that you can use to +analyze nested data: + + * [FLATTEN Function](/drill/docs/flatten-function) + * [KVGEN Function](/drill/docs/kvgen-function) + * [REPEATED_COUNT Function](/drill/docs/repeated-count-function) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/004-nest-functions.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/004-nest-functions.md b/_docs/sql-ref/004-nest-functions.md new file mode 100644 index 0000000..09fe91e --- /dev/null +++ b/_docs/sql-ref/004-nest-functions.md @@ -0,0 +1,10 @@ +--- +title: "Nested Data Functions" +parent: "SQL Reference" +--- +This section contains descriptions of SQL functions that you can use to +analyze nested data: + + * [FLATTEN Function](/drill/docs/flatten-function) + * [KVGEN Function](/drill/docs/kvgen-function) + * [REPEATED_COUNT Function](/drill/docs/repeated-count-function) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/005-cmd-summary.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/005-cmd-summary.md b/_docs/sql-ref/005-cmd-summary.md new file mode 100644 index 0000000..13d9515 --- /dev/null +++ b/_docs/sql-ref/005-cmd-summary.md @@ -0,0 +1,9 @@ +--- +title: "SQL Commands Summary" +parent: "SQL Reference" +--- +The following table provides a list of the SQL commands that Drill supports, +with their descriptions and example syntax: + +<table ><tbody><tr><th >Command</th><th >Description</th><th >Syntax</th></tr><tr><td valign="top" >ALTER SESSION</td><td valign="top" >Changes a system setting for the duration of a session. A session ends when you quit the Drill shell. 
For a list of Drill options and their descriptions, refer to <a href="/drill/docs/planning-and-execution-options" rel="nofollow">Planning and Execution Options</a>.</td><td valign="top" ><code>ALTER SESSION SET `option_name`='string';<br />ALTER SESSION SET `option_name`=TRUE | FALSE;</code></td></tr><tr><td valign="top" >ALTER SYSTEM</td><td valign="top" >Permanently changes a system setting. The new settings persist across all sessions. For a list of Drill options and their descriptions, refer to <a href="/drill/docs/planning-and-execution-options/" rel="nofollow">Planning and Execution Options</a>.</td><td valign="top" ><code>ALTER SYSTEM SET `option_name`='string';<br />ALTER SYSTEM SET `option_name`=TRUE | FALSE;</code></td></tr><tr><td valign="top" ><a href="/drill/docs/create-table-as-ctas-command">CREATE TABLE AS<br />(CTAS)</a></td><td valign="top" >Creates a new table and populates the new table with rows returned from a SELECT query. Use the CREATE TABLE AS (CTAS) statement in place of INSERT INTO. When you issue the CTAS command, you create a directory that contains parquet or CSV files.
Each workspace in a file system has a default file type.<p>You can specify which writer you want Drill to use when creating a table: parquet, CSV, or JSON (as specified with the <code>store.format</code> option).</p></td><td valign="top" ><code>CREATE TABLE new_table_name AS <query>;</code></td></tr><tr><td valign="top" colspan="1" >CREATE VIEW</td><td valign="top" colspan="1" >Creates a new view based on the results of a SELECT query.</td><td valign="top" colspan="1" ><code>CREATE VIEW view_name [(column_list)] AS &lt;query>;</code></td></tr><tr><td valign="top" colspan="1" >DROP VIEW</td><td valign="top" colspan="1" >Removes one or more views.</td><td valign="top" colspan="1" ><code>DROP VIEW view_name [, <em class="replaceable">view_name</em>] ...; </code></td></tr><tr><td valign="top" colspan="1" ><a href="/drill/docs/explain-commands" rel="nofollow">EXPLAIN PLAN FOR</a></td><td valign="top" colspan="1" >Returns the physical plan for a particular query.</td><td valign="top" colspan="1" ><code>EXPLAIN PLAN FOR <query>;</code></td></tr><tr><td valign="top" colspan="1" ><a href="/drill/docs/explain-commands/" rel="nofollow">EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR</a></td><td valign="top" colspan="1" >Returns the logical plan for a particular query.</td><td valign="top" colspan="1" ><code>EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR <query>;</code></td></tr><tr><td valign="top" colspan="1" ><a href="/drill/docs/select-statements" rel="nofollow">SELECT</a></td><td valign="top" colspan="1" >Retrieves data from tables and files.</td><td valign="top" colspan="1" ><code>[WITH subquery]<br />SELECT column_list FROM table_name <br />[WHERE clause]<br />[GROUP BY clause]<br />[HAVING clause]<br />[ORDER BY clause];</code></td></tr><tr><td valign="top" colspan="1" >SHOW DATABASES</td><td valign="top" colspan="1" >Returns a list of available schemas.
Equivalent to SHOW SCHEMAS.</td><td valign="top" colspan="1" ><code>SHOW DATABASES;</code></td></tr><tr><td valign="top" colspan="1" ><a href="/drill/docs/show-files-command/" rel="nofollow">SHOW FILES</a></td><td valign="top" colspan="1" >Returns a list of files in a file system schema.</td><td valign="top" colspan="1" ><code>SHOW FILES IN filesystem.`schema_name`;<br />SHOW FILES FROM filesystem.`schema_name`;</code></td></tr><tr><td valign="top" colspan="1" >SHOW SCHEMAS</td><td valign="top" colspan="1" >Returns a list of available schemas. Equivalent to SHOW DATABASES.</td><td valign="top" colspan="1" ><code>SHOW SCHEMAS;</code></td></tr><tr><td valign="top" colspan="1" >SHOW TABLES</td><td valign="top" colspan="1" >Returns a list of tables for all schemas. Optionally, you can first issue the <code>USE </code>command to identify the schema for which you want to view tables.<br />For example, the following <code>USE</code> statement tells Drill that you only want information from the <code>hive.default</code> schema:<br /><code>USE hive.`default`;</code></td><td valign="top" colspan="1" ><code>SHOW TABLES;</code></td></tr><tr><td valign="top" colspan="1" >USE</td><td valign="top" colspan="1" >Changes to a particular schema. When you opt to use a particular schema, Drill issues queries on that schema only.</td><td valign="top" colspan="1" ><code>USE schema_name;</code></td></tr></tbody></table> + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/006-reserved-wds.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/006-reserved-wds.md b/_docs/sql-ref/006-reserved-wds.md new file mode 100644 index 0000000..a19f73c --- /dev/null +++ b/_docs/sql-ref/006-reserved-wds.md @@ -0,0 +1,16 @@ +--- +title: "Reserved Keywords" +parent: "SQL Reference" +--- +When you use a reserved keyword in a Drill query, enclose the word in +backticks.
For example, if you issue the following query to Drill, +you must include backticks around the word TABLES because TABLES is a reserved +keyword: + +``SELECT * FROM INFORMATION_SCHEMA.`TABLES`;`` + +The following table provides the Drill reserved keywords that require back +ticks: + +<table ><tbody><tr><td valign="top" ><h1 id="ReservedKeywords-A">A</h1><p>ABS<br />ALL<br />ALLOCATE<br />ALLOW<br />ALTER<br />AND<br />ANY<br />ARE<br />ARRAY<br />AS<br />ASENSITIVE<br />ASYMMETRIC<br />AT<br />ATOMIC<br />AUTHORIZATION<br />AVG</p><h1 id="ReservedKeywords-B">B</h1><p>BEGIN<br />BETWEEN<br />BIGINT<br />BINARY<br />BIT<br />BLOB<br />BOOLEAN<br />BOTH<br />BY</p><h1 id="ReservedKeywords-C">C</h1><p>CALL<br />CALLED<br />CARDINALITY<br />CASCADED<br />CASE<br />CAST<br />CEIL<br />CEILING<br />CHAR<br />CHARACTER<br />CHARACTER_LENGTH<br />CHAR_LENGTH<br />CHECK<br />CLOB<br />CLOSE<br />COALESCE<br />COLLATE<br />COLLECT<br />COLUMN<br />COMMIT<br />CONDITION<br />CONNECT<br />CONSTRAINT<br />CONVERT<br />CORR<br />CORRESPONDING<br />COUNT<br />COVAR_POP<br />COVAR_SAMP<br />CREATE<br />CROSS<br />CUBE<br />CUME_DIST<br />CURRENT<br />CURRENT_CATALOG<br />CURRENT_DATE<br />CURRENT_DEFAULT_TRANSFORM_GROUP<br />CURRENT_PATH<br />CURRENT_ROLE<br />CURRENT_SCHEMA<br />CURRENT_TIME<br />CURRENT_TIMESTAMP<br />CURRENT_TRANSFORM_GROUP_FOR_TYPE<br />CURRENT_USER<br />CURSOR<br />CYCLE</p></td><td valign="top" ><h1 id="ReservedKeywords-D">D</h1><p>DATABASES<br />DATE<br />DAY<br />DEALLOCATE<br />DEC<br />DECIMAL<br />DECLARE<br />DEFAULT<br />DEFAULT_KW<br />DELETE<br />DENSE_RANK<br />DEREF<br />DESCRIBE<br />DETERMINISTIC<br />DISALLOW<br />DISCONNECT<br />DISTINCT<br />DOUBLE<br />DROP<br />DYNAMIC</p><h1 id="ReservedKeywords-E">E</h1><p>EACH<br />ELEMENT<br />ELSE<br />END<br />END_EXEC<br />ESCAPE<br />EVERY<br />EXCEPT<br />EXEC<br />EXECUTE<br />EXISTS<br />EXP<br />EXPLAIN<br />EXTERNAL<br />EXTRACT</p><h1 id="ReservedKeywords-F">F</h1><p>FALSE<br />FETCH<br 
/>FILES<br />FILTER<br />FIRST_VALUE<br />FLOAT<br />FLOOR<br />FOR<br />FOREIGN<br />FREE<br />FROM<br />FULL<br />FUNCTION<br />FUSION</p><h1 id="ReservedKeywords-G">G</h1><p>GET<br />GLOBAL<br />GRANT<br />GROUP<br />GROUPING</p><h1 id="ReservedKeywords-H">H</h1><p>HAVING<br />HOLD<br />HOUR</p></td><td valign="top" ><h1 id="ReservedKeywords-I">I</h1><p>IDENTITY<br />IMPORT<br />IN<br />INDICATOR<br />INNER<br />INOUT<br />INSENSITIVE<br />INSERT<br />INT<br />INTEGER<br />INTERSECT<br />INTERSECTION<br />INTERVAL<br />INTO<br />IS</p><h1 id="ReservedKeywords-J">J</h1><p>JOIN</p><h1 id="ReservedKeywords-L">L</h1><p>LANGUAGE<br />LARGE<br />LAST_VALUE<br />LATERAL<br />LEADING<br />LEFT<br />LIKE<br />LIMIT<br />LN<br />LOCAL<br />LOCALTIME<br />LOCALTIMESTAMP<br />LOWER</p><h1 id="ReservedKeywords-M">M</h1><p>MATCH<br />MAX<br />MEMBER<br />MERGE<br />METHOD<br />MIN<br />MINUTE<br />MOD<br />MODIFIES<br />MODULE<br />MONTH<br />MULTISET</p><h1 id="ReservedKeywords-N">N</h1><p>NATIONAL<br />NATURAL<br />NCHAR<br />NCLOB<br />NEW<br />NO<br />NONE<br />NORMALIZE<br />NOT<br />NULL<br />NULLIF<br />NUMERIC</p><h1 id="ReservedKeywords-O">O</h1><p>OCTET_LENGTH<br />OF<br />OFFSET<br />OLD<br />ON<br />ONLY<br />OPEN<br />OR<br />ORDER<br />OUT<br />OUTER<br />OVER<br />OVERLAPS<br />OVERLAY</p></td><td valign="top" colspan="1" ><h1 id="ReservedKeywords-P">P</h1><p>PARAMETER<br />PARTITION<br />PERCENTILE_CONT<br />PERCENTILE_DISC<br />PERCENT_RANK<br />POSITION<br />POWER<br />PRECISION<br />PREPARE<br />PRIMARY<br />PROCEDURE</p><h1 id="ReservedKeywords-R">R</h1><p>RANGE<br />RANK<br />READS<br />REAL<br />RECURSIVE<br />REF<br />REFERENCES<br />REFERENCING<br />REGR_AVGX<br />REGR_AVGY<br />REGR_COUNT<br />REGR_INTERCEPT<br />REGR_R2<br />REGR_SLOPE<br />REGR_SXX<br />REGR_SXY<br />RELEASE<br />REPLACE<br />RESULT<br />RETURN<br />RETURNS<br />REVOKE<br />RIGHT<br />ROLLBACK<br />ROLLUP<br />ROW<br />ROWS<br />ROW_NUMBER</p><h1 
id="ReservedKeywords-S">S</h1><p>SAVEPOINT<br />SCHEMAS<br />SCOPE<br />SCROLL<br />SEARCH<br />SECOND<br />SELECT<br />SENSITIVE<br />SESSION_USER<br />SET<br />SHOW<br />SIMILAR<br />SMALLINT<br />SOME<br />SPECIFIC<br />SPECIFICTYPE<br />SQL<br />SQLEXCEPTION<br />SQLSTATE<br />SQLWARNING<br />SQRT<br />START<br />STATIC<br />STDDEV_POP<br />STDDEV_SAMP<br />SUBMULTISET<br />SUBSTRING<br />SUM<br />SYMMETRIC<br />SYSTEM<br />SYSTEM_USER</p></td><td valign="top" colspan="1" ><h1 id="ReservedKeywords-T">T</h1><p>TABLE<br />TABLES<br />TABLESAMPLE<br />THEN<br />TIME<br />TIMESTAMP<br />TIMEZONE_HOUR<br />TIMEZONE_MINUTE<br />TINYINT<br />TO<br />TRAILING<br />TRANSLATE<br />TRANSLATION<br />TREAT<br />TRIGGER<br />TRIM<br />TRUE</p><h1 id="ReservedKeywords-U">U</h1><p>UESCAPE<br />UNION<br />UNIQUE<br />UNKNOWN<br />UNNEST<br />UPDATE<br />UPPER<br />USE<br />USER<br />USING</p><h1 id="ReservedKeywords-V">V</h1><p>VALUE<br />VALUES<br />VARBINARY<br />VARCHAR<br />VARYING<br />VAR_POP<br />VAR_SAMP</p><h1 id="ReservedKeywords-W">W</h1><p>WHEN<br />WHENEVER<br />WHERE<br />WIDTH_BUCKET<br />WINDOW<br />WITH<br />WITHIN<br />WITHOUT</p><h1 id="ReservedKeywords-Y">Y</h1><p>YEAR</p></td></tr></tbody></table> + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/001-create-table-as.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/cmd-summary/001-create-table-as.md b/_docs/sql-ref/cmd-summary/001-create-table-as.md new file mode 100644 index 0000000..573c295 --- /dev/null +++ b/_docs/sql-ref/cmd-summary/001-create-table-as.md @@ -0,0 +1,134 @@ +--- +title: "CREATE TABLE AS (CTAS) command" +parent: "SQL Commands Summary" +--- +You can create tables in Drill by using the CTAS command: + + CREATE TABLE new_table_name AS <query>; + +where query is any valid Drill query. Each table you create must have a unique +name. You can include an optional column list for the new table. 
For example: + + create table logtable(transid, prodid) as select transaction_id, product_id from ... + +You can store table data in one of three formats: + + * csv + * parquet + * json + +The parquet and json formats can be used to store complex data. + +To set the output format for a Drill table, set the `store.format` option with +the ALTER SYSTEM or ALTER SESSION command. For example: + + alter session set `store.format`='json'; + +Table data is stored in the location specified by the workspace that is in use +when you run the CTAS statement. By default, a directory is created, using the +exact table name specified in the CTAS statement. A .json, .csv, or .parquet +file inside that directory contains the data. + +You can only create new tables in workspaces. You cannot create tables in +other storage plugins such as Hive and HBase. + +You must use a writable (mutable) workspace when creating Drill tables. For +example: + + "tmp": { + "location": "/tmp", + "writable": true + } + +## Example + +The following query returns one row from a JSON file: + + 0: jdbc:drill:zk=local> select id, type, name, ppu + from dfs.`/Users/brumsby/drill/donuts.json`; + +------------+------------+------------+------------+ + | id | type | name | ppu | + +------------+------------+------------+------------+ + | 0001 | donut | Cake | 0.55 | + +------------+------------+------------+------------+ + 1 row selected (0.248 seconds) + +To create and verify the contents of a table that contains this row: + + 1. Set the workspace to a writable workspace. + 2. Set the `store.format` option appropriately. + 3. Run a CTAS statement that contains the query. + 4. Go to the directory where the table is stored and check the contents of the file. + 5. Run a query against the new table. + +The following sqlline output captures this sequence of steps. 
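A hedged aside before the walkthrough: a workspace definition lives inside the storage-plugin configuration, which is JSON, and strict JSON parsers reject trailing commas. A minimal Python sketch for sanity-checking a workspace snippet before pasting it in (the helper name `is_valid_workspace` is mine, not a Drill API):

```python
import json

# Hypothetical helper (not part of Drill): validate a storage-plugin
# workspace snippet as strict JSON. Strict JSON rejects trailing commas.
def is_valid_workspace(snippet):
    try:
        ws = json.loads(snippet)
    except ValueError:
        return False
    # A CTAS target workspace needs a location and "writable": true.
    return isinstance(ws.get("location"), str) and ws.get("writable") is True

print(is_valid_workspace('{"location": "/tmp", "writable": true}'))   # True
print(is_valid_workspace('{"location": "/tmp", "writable": true,}'))  # False (trailing comma)
```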
+ +### Workspace Definition + + "tmp": { + "location": "/tmp", + "writable": true + } + +### ALTER SESSION Command + + alter session set `store.format`='json'; + +### USE Command + + 0: jdbc:drill:zk=local> use dfs.tmp; + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | Default schema changed to 'dfs.tmp' | + +------------+------------+ + 1 row selected (0.03 seconds) + +### CTAS Command + + 0: jdbc:drill:zk=local> create table donuts_json as + select id, type, name, ppu from dfs.`/Users/brumsby/drill/donuts.json`; + +------------+---------------------------+ + | Fragment | Number of records written | + +------------+---------------------------+ + | 0_0 | 1 | + +------------+---------------------------+ + 1 row selected (0.107 seconds) + +### File Contents + + administorsmbp7:tmp brumsby$ pwd + /tmp + administorsmbp7:tmp brumsby$ cd donuts_json + administorsmbp7:donuts_json brumsby$ more 0_0_0.json + { + "id" : "0001", + "type" : "donut", + "name" : "Cake", + "ppu" : 0.55 + } + +### Query Against New Table + + 0: jdbc:drill:zk=local> select * from donuts_json; + +------------+------------+------------+------------+ + | id | type | name | ppu | + +------------+------------+------------+------------+ + | 0001 | donut | Cake | 0.55 | + +------------+------------+------------+------------+ + 1 row selected (0.053 seconds) + +### Use a Different Output Format + +You can run the same sequence again with a different storage format set for +the system or session (csv or parquet). 
For example, if the format is set to +csv, and you name the table donuts_csv, the resulting file would look like +this: + + administorsmbp7:tmp brumsby$ cd donuts_csv + administorsmbp7:donuts_csv brumsby$ ls + 0_0_0.csv + administorsmbp7:donuts_csv brumsby$ more 0_0_0.csv + id,type,name,ppu + 0001,donut,Cake,0.55 + http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/sql-ref/cmd-summary/002-explain.md ---------------------------------------------------------------------- diff --git a/_docs/sql-ref/cmd-summary/002-explain.md b/_docs/sql-ref/cmd-summary/002-explain.md new file mode 100644 index 0000000..12a4d9a --- /dev/null +++ b/_docs/sql-ref/cmd-summary/002-explain.md @@ -0,0 +1,166 @@ +--- +title: "EXPLAIN commands" +parent: "SQL Commands Summary" +--- +EXPLAIN is a useful tool for examining the steps that a query goes through +when it is executed. You can use the EXPLAIN output to gain a deeper +understanding of the parallel processing that Drill queries exploit. You can +also look at costing information, troubleshoot performance issues, and +diagnose routine errors that may occur when you run queries. + +Drill provides two variations on the EXPLAIN command, one that returns the +physical plan and one that returns the logical plan. A logical plan takes the +SQL query (as written by the user and accepted by the parser) and translates +it into a logical series of operations that correspond to SQL language +constructs (without defining the specific algorithms that will be implemented +to run the query). A physical plan translates the logical plan into a specific +series of steps that will be used when the query runs. For example, a logical +plan may indicate a join step in general and classify it as inner or outer, +but the corresponding physical plan will indicate the specific type of join +operator that will run, such as a merge join or a hash join. 
The physical plan +is operational and reveals the specific _access methods_ that will be used for +the query. + +An EXPLAIN command for a query that is run repeatedly under the exact same +conditions against the same data will return the same plan. However, if you +change a configuration option, for example, or update the tables or files that +you are selecting from, you are likely to see plan changes. + +## EXPLAIN Syntax + +The EXPLAIN command supports the following syntax: + + explain plan [ including all attributes ] [ with implementation | without implementation ] for <query> ; + +where `query` is any valid SELECT statement supported by Drill. + +#### INCLUDING ALL ATTRIBUTES + +This option returns costing information. You can use this option for both +physical and logical plans. + +#### WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION + +These options return the physical and logical plan information, respectively. +The default is physical (WITH IMPLEMENTATION). + +## EXPLAIN for Physical Plans + +The EXPLAIN PLAN FOR <query> command returns the chosen physical execution +plan for a query statement without running the query. You can use this command +to see what kind of execution operators Drill implements. For example, you can +find out what kind of join algorithm is chosen when tables or files are +joined. You can also use this command to analyze errors and troubleshoot +queries that do not run. For example, if you run into a casting error, the +query plan text may help you isolate the problem. + +Use the following syntax: + + explain plan for <query> ; + explain plan with implementation for <query> ; + +The following set command increases the default text display (number of +characters). By default, most of the plan output is not displayed. + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + +Do not use a semicolon to terminate set commands. 
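As a quick recap of the syntax shown under EXPLAIN Syntax, the variants can be assembled mechanically. A minimal Python sketch (the helper name `explain_statement` is mine, not a Drill API) that builds each form of the statement:

```python
# Sketch only: assemble the EXPLAIN variants from the documented syntax
# "explain plan [including all attributes]
#  [with implementation | without implementation] for <query>;".
def explain_statement(query, logical=False, include_attributes=False):
    parts = ["EXPLAIN PLAN"]
    if include_attributes:
        parts.append("INCLUDING ALL ATTRIBUTES")   # adds costing information
    # WITH IMPLEMENTATION (the physical plan) is the default.
    parts.append("WITHOUT IMPLEMENTATION" if logical else "WITH IMPLEMENTATION")
    parts.append("FOR " + query)
    return " ".join(parts) + ";"

print(explain_statement("SELECT * FROM dfs.tmp.donuts_json"))
# EXPLAIN PLAN WITH IMPLEMENTATION FOR SELECT * FROM dfs.tmp.donuts_json;
```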
+ +For example, here is the top portion of the explain output for a +COUNT(DISTINCT) query on a JSON file: + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + + 0: jdbc:drill:zk=local> explain plan for + select type t, count(distinct id) + from dfs.`/Users/brumsby/drill/donuts.json` + where type='donut' group by type; + + +------------+------------+ + | text | json | + +------------+------------+ + | 00-00 Screen + 00-01 Project(t=[$0], EXPR$1=[$1]) + 00-02 Project(t=[$0], EXPR$1=[$1]) + 00-03 HashAgg(group=[{0}], EXPR$1=[COUNT($1)]) + 00-04 HashAgg(group=[{0, 1}]) + 00-05 SelectionVectorRemover + 00-06 Filter(condition=[=(CAST($0):CHAR(5) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'donut')]) + 00-07 Project(type=[$1], id=[$2]) + 00-08 ProducerConsumer + 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/brumsby/drill/donuts.json, columns = null]]) + ... + +Read the text output from bottom to top to understand the sequence of +operators that will execute the query. Note that the physical plan starts with +a scan of the JSON file that is being queried. The selected columns are +projected and filtered, then the aggregate function is applied. + +The EXPLAIN text output is followed by detailed JSON output, which is reusable +for submitting the query via Drill APIs. + + | { + "head" : { + "version" : 1, + "generator" : { + "type" : "ExplainHandler", + "info" : "" + }, + "type" : "APACHE_DRILL_PHYSICAL", + "options" : [ ], + "queue" : 0, + "resultMode" : "EXEC" + }, + .... + +## Costing Information + +Add the INCLUDING ALL ATTRIBUTES option to the EXPLAIN command to see cost +estimates for the query plan. 
For example: + + 0: jdbc:drill:zk=local> !set maxwidth 10000 + 0: jdbc:drill:zk=local> explain plan including all attributes for + select * from dfs.`/Users/brumsby/drill/donuts.json` where type='donut'; + + +------------+------------+ + | text | json | + +------------+------------+ + + | 00-00 Screen: rowcount = 1.0, cumulative cost = {4.1 rows, 14.1 cpu, 0.0 io, 0.0 network}, id = 3110 + 00-01 Project(*=[$0], type=[$1]): rowcount = 1.0, cumulative cost = {4.0 rows, 14.0 cpu, 0.0 io, 0.0 network}, id = 3109 + 00-02 SelectionVectorRemover: rowcount = 1.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 io, 0.0 network}, id = 3108 + 00-03 Filter(condition=[=(CAST($1):CHAR(5) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'donut')]): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network}, id = 3107 + 00-04 ProducerConsumer: rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network}, id = 3106 + 00-05 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/brumsby/drill/donuts.json, columns = null]]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 3101 + +## EXPLAIN for Logical Plans + +To return the logical plan for a query (again, without actually running the +query), use the EXPLAIN PLAN WITHOUT IMPLEMENTATION syntax: + + explain plan without implementation for <query> ; + +For example: + + 0: jdbc:drill:zk=local> explain plan without implementation for + select a.id + from dfs.`/Users/brumsby/drill/donuts.json` a, dfs.`/Users/brumsby/drill/moredonuts.json` b + where a.id=b.id; + + +------------+------------+ + | text | json | + +------------+------------+ + | DrillScreenRel + DrillProjectRel(id=[$1]) + DrillJoinRel(condition=[=($1, $3)], joinType=[inner]) + DrillScanRel(table=[[dfs, /Users/brumsby/drill/donuts.json]], groupscan=[EasyGroupScan [selectionRoot=/Users/brumsby/drill/donuts.json, columns = null]]) + DrillScanRel(table=[[dfs, /Users/brumsby/drill/moredonuts.json]], 
groupscan=[EasyGroupScan [selectionRoot=/Users/brumsby/drill/moredonuts.json, columns = null]]) + | { + "head" : { + "version" : 1, + "generator" : { + "type" : "org.apache.drill.exec.planner.logical.DrillImplementor", + "info" : "" + }, + ... +
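Since the JSON half of the EXPLAIN output is reusable for submitting the query through Drill APIs, it can be handled with any ordinary JSON library. A hedged Python sketch that reads the head section, using the structure of the physical-plan sample shown earlier (any field not present in that sample would be an assumption):

```python
import json

# Sketch: parse the "head" section of an EXPLAIN result's JSON column.
# The sample below mirrors the physical-plan output shown earlier.
sample = '''
{
  "head" : {
    "version" : 1,
    "generator" : { "type" : "ExplainHandler", "info" : "" },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  }
}
'''

head = json.loads(sample)["head"]
print(head["type"], head["resultMode"])  # APACHE_DRILL_PHYSICAL EXEC
```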