[jira] [Created] (DRILL-2492) JDBC : There seems to be no way to execute a CTAS statement with JDBC successfully
Rahul Challapalli created DRILL-2492:

Summary: JDBC : There seems to be no way to execute a CTAS statement with JDBC successfully
Key: DRILL-2492
URL: https://issues.apache.org/jira/browse/DRILL-2492
Project: Apache Drill
Issue Type: Bug
Components: Client - JDBC
Reporter: Rahul Challapalli
Assignee: Daniel Barclay (Drill)

git.commit.id.abbrev=7b4c887

Query:
{code}
create table temp1 as select * from dfs.jdbctesting.`fewtypes.parquet`
{code}

I tried to execute the above query using Statement.executeQuery. The call returned a ResultSet containing a single value, false, and when I checked HDFS no table had been created.

I then tried Statement.executeUpdate and got the following error:
{code}
Exception in thread "main" java.sql.SQLException: expected one result column
    at net.hydromatic.avatica.AvaticaStatement.executeUpdate(AvaticaStatement.java:88)
    at Dummy.testCTASQuery(Dummy.java:57)
    at Dummy.main(Dummy.java:30)
{code}

Let me know if I am not using JDBC correctly.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2470) Implement SMALLINT [umbrella/tracking bug]
[ https://issues.apache.org/jira/browse/DRILL-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-2470: -- Summary: Implement SMALLINT [umbrella/tracking bug] (was: Implement SMALLINT (umbrella/tracking bug).) Implement SMALLINT [umbrella/tracking bug] -- Key: DRILL-2470 URL: https://issues.apache.org/jira/browse/DRILL-2470 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2465) Fix multiple DatabaseMetaData.getColumns() bugs (some)
[ https://issues.apache.org/jira/browse/DRILL-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-2465: -- Description: Fixed most {{getColumn()}} bugs reported in DRILL-2420: - Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal position). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for numeric types other than DECIMAL). - Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL). - Fixed {{REMARKS}} (from '' to NULL). - Fixed {{COLUMN_DEF}} (from '' to NULL). - Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{ORDINAL_POSITION}} (was returning 1 for every column). - Fixed {{SCOPE_CATALOG}} (from '' to NULL). - Fixed {{SCOPE_SCHEMA}} (from '' to NULL). - Fixed {{SCOPE_TABLE}} (from '' to NULL). - Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.) [With workaround because SMALLINT not implemented yet.] was: Fixed most {{getColumn()}} bugs reported in DRILL-2420: - Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal position). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for numeric types other than DECIMAL). - Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL). - Fixed {{REMARKS}} (from '' to NULL). - Fixed {{COLUMN_DEF}} (from '' to NULL). - Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{ORDINAL_POSITION}} (was returning 1 for every column). - Fixed {{SCOPE_CATALOG}} (from '' to NULL). - Fixed {{SCOPE_SCHEMA}} (from '' to NULL). - Fixed {{SCOPE_TABLE}} (from '' to NULL). - Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.) [With workaround because SMALLINT not implemented yet.] 
[Bug report in progress] Fix multiple DatabaseMetaData.getColumns() bugs (some) --- Key: DRILL-2465 URL: https://issues.apache.org/jira/browse/DRILL-2465 Project: Apache Drill Issue Type: Bug Components: Client - JDBC, Metadata Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Fixed most {{getColumn()}} bugs reported in DRILL-2420: - Added {{COLUMN_SIZE}} (in part to move later columns to right ordinal position). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{DECIMAL_DIGITS}} (was DECIMAL_PRECISION; didn't report values for numeric types other than DECIMAL). - Fixed {{NUM_PREC_RADIX}} (was -1 for cases that should be NULL). - Fixed {{REMARKS}} (from '' to NULL). - Fixed {{COLUMN_DEF}} (from '' to NULL). - Fixed {{CHARACTER_OCTET_LENGTH}} (was 4). [With workarounds for (possible) INFORMATION_SCHEMA.COLUMN bugs.] - Fixed {{ORDINAL_POSITION}} (was returning 1 for every column). - Fixed {{SCOPE_CATALOG}} (from '' to NULL). - Fixed {{SCOPE_SCHEMA}} (from '' to NULL). - Fixed {{SCOPE_TABLE}} (from '' to NULL). - Fixed {{SOURCE_DATA_TYPE}} (from VARCHAR to INTEGER.) [With workaround because SMALLINT not implemented yet.] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
[ https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-2463: -- Attachment: (was: DRILL-2463.2.patch.txt) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods - Key: DRILL-2463 URL: https://issues.apache.org/jira/browse/DRILL-2463 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy primitive values (e.g., returning 0 from ResultSet.getInt(...)). Fix SqlAccessors template to implement mapping of SQL NULL to null references (e.g., returning null from ResultSet.getString(...)).
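The mapping requested here follows the general JDBC convention: a primitive getter returns a dummy value for SQL NULL and wasNull() reports it, while an object getter returns a Java null. The following is only a minimal sketch of that convention, not Drill's accessor code; the class NullMappingSketch and the way a column value is passed in as a parameter are assumptions made for illustration.

```java
// Hypothetical stand-in for a result-set accessor; not Drill or Avatica code.
class NullMappingSketch {
    // Remembers whether the last value read was SQL NULL, like ResultSet.wasNull().
    private boolean lastWasNull;

    // Primitive getter: SQL NULL (modeled here as a null Integer) maps to 0.
    int getInt(Integer columnValue) {
        lastWasNull = (columnValue == null);
        return lastWasNull ? 0 : columnValue;
    }

    // Object getter: SQL NULL maps to a Java null reference.
    String getString(Object columnValue) {
        lastWasNull = (columnValue == null);
        return lastWasNull ? null : columnValue.toString();
    }

    boolean wasNull() {
        return lastWasNull;
    }

    public static void main(String[] args) {
        NullMappingSketch row = new NullMappingSketch();
        System.out.println(row.getInt(null) + " wasNull=" + row.wasNull());
        System.out.println(row.getString("abc") + " wasNull=" + row.wasNull());
    }
}
```

A caller that needs to tell a real 0 from SQL NULL has to check wasNull() immediately after the getter call, which is why both getters update the flag.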
[jira] [Updated] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
[ https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-2463: -- Attachment: (was: DRILL-2463.1.patch.txt) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods - Key: DRILL-2463 URL: https://issues.apache.org/jira/browse/DRILL-2463 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy primitive values (e.g., returning 0 from ResultSet.getInt(...)). Fix SqlAccessors template to implement mapping of SQL NULL to null references (e.g., returning null from ResultSet.getString(...)).
[jira] [Closed] (DRILL-2158) Failure while attempting to start Drillbit in embedded mode.
[ https://issues.apache.org/jira/browse/DRILL-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kun22kun closed DRILL-2158. --- Resolution: Fixed The failure was caused by running on OpenJDK; the Oracle JDK should be used instead. Failure while attempting to start Drillbit in embedded mode. -- Key: DRILL-2158 URL: https://issues.apache.org/jira/browse/DRILL-2158 Project: Apache Drill Issue Type: Bug Components: Tools, Build & Test Affects Versions: 0.7.0 Environment: Linux Master.hadoop 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux CentOS release 6.5 (Final) Reporter: kun22kun Assignee: Chun Chang Priority: Minor Labels: github-import, maven Fix For: 1.0.0 First, I installed Drill according to "https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes". When I start Drill via bin/sqlline -u jdbc:drill:zk=local -n admin -p admin, it shows: Error: Failure while attempting to start Drillbit in embedded mode. (state=,code=0) sqlline version 1.1.6 0: jdbc:drill:zk=local> Then I built Drill with Maven according to INSTALL.md in the source from GitHub, but got the same result. Finally, there is nothing under the path tmp/drill/; do I need to create it myself? Is it necessary to set up a distributed system such as Hadoop? Much appreciated!
[jira] [Created] (DRILL-2480) Identify, fix INFORMATION_SCHEMA and JDBC metadata bugs [umbrella/tracking bug]
Daniel Barclay (Drill) created DRILL-2480: - Summary: Identify, fix INFORMATION_SCHEMA and JDBC metadata bugs [umbrella/tracking bug] Key: DRILL-2480 URL: https://issues.apache.org/jira/browse/DRILL-2480 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2420) Identify, fix DatabaseMetaData.getColumns() bugs [umbrella/tracking bug]
[ https://issues.apache.org/jira/browse/DRILL-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) updated DRILL-2420:
Summary: Identify, fix DatabaseMetaData.getColumns() bugs [umbrella/tracking bug] (was: Identify and fix DatabaseMetaData.getColumns() bugs [umbrella/tracking bug])

Key: DRILL-2420
URL: https://issues.apache.org/jira/browse/DRILL-2420
Project: Apache Drill
Issue Type: Bug
Components: Client - JDBC, Metadata
Reporter: Daniel Barclay (Drill)
Assignee: Daniel Barclay (Drill)

Drill's implementation of {{DatabaseMetaData.getColumns(...)}} (currently at {{org.apache.drill.jdbc.MetaImpl.getColumns()}}) doesn't match the JDBC specification (the Javadoc documentation for {{DatabaseMetaData.getColumns(...)}}, as of Java 7).

In the returned {{ResultSet}}:
1. Column {{DATA_TYPE}} is of type {{VARCHAR}} (containing the type name) rather than being of type {{INTEGER}} (containing values per {{java.sql.Types.*}}).
2. Column {{TYPE_NAME}} is missing.
3. Column {{COLUMN_SIZE}} is missing.
4. (Columns after {{DATA_TYPE}} are at incorrect indexes.)
5. Column {{DECIMAL_DIGITS}} is misnamed {{DECIMAL_PRECISION}}.
6. Column {{REMARKS}} is an empty string, but probably should be {{NULL}}.
7. Column {{COLUMN_DEF}} is an empty string, but probably should be {{NULL}}.
8. Column {{CHAR_OCTET_LENGTH}} is always {{4}}, but should be the maximum number of bytes in the _column_ for character types.
8.5 Column {{IS_NULLABLE}} seems to always return 'NO'.
9. Column {{ORDINAL_POSITION}} is always {{1}}, but should be the index of the specific column.
10. Column {{IS_NULLABLE}} is {{'YES'}}, which doesn't seem to correspond to the value for {{NULLABLE}} ({{DatabaseMetaData.columnNullableUnknown}}).
11. Column {{SCOPE_CATALOG}} is an empty string, but should be {{NULL}}.
12. Column {{SCOPE_SCHEMA}} is an empty string, but should be {{NULL}}.
13. Column {{SCOPE_TABLE}} is an empty string, but should be {{NULL}}.
14. Column {{SOURCE_DATA_TYPE}} is an empty string, but should be {{NULL}}.

Additional bugs or suspect behavior:
- {{DECIMAL_DIGITS}}/{{DECIMAL_PRECISION}} is {{-1}} when it should be {{NULL}} (when not applicable).
- {{NUM_PREC_RADIX}} is {{-1}} when it probably should be {{NULL}} (when not applicable).

Other columns to check:
- Re {{BUFFER_LENGTH}}, {{SQL_DATA_TYPE}}, and {{SQL_DATETIME_SUB}}: When JDBC says a column is not used, are there any requirements on the values (e.g., being {{NULL}})?
- Re {{IS_AUTOINCREMENT}}: Do we know that a column is not auto-incremented? If so, the value could be {{'NO'}} rather than an empty string.
- Re {{IS_GENERATEDCOLUMN}}: Do we know that a column is not generated? If so, the value could be {{'NO'}} rather than an empty string.
- Re {{NULLABLE}}: Do we know whether a column is nullable or not? If so, we could return the specific answer rather than just saying that it's unknown.
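Item 1 above says DATA_TYPE should carry an integer code from java.sql.Types instead of the type name. As a rough illustration of that translation only (this is not the code in MetaImpl; the class TypeCodeSketch, the handful of type names covered, and the fallback to Types.OTHER are all assumptions):

```java
import java.sql.Types;
import java.util.Locale;

// Hypothetical helper; not part of Drill's MetaImpl.
class TypeCodeSketch {
    // Translates a SQL type name into the java.sql.Types code that the
    // DATA_TYPE column is expected to contain. Only a few names are handled;
    // anything else falls back to Types.OTHER (an assumption of this sketch).
    static int toJdbcTypeCode(String sqlTypeName) {
        switch (sqlTypeName.toUpperCase(Locale.ROOT)) {
            case "INTEGER": return Types.INTEGER;
            case "BIGINT":  return Types.BIGINT;
            case "VARCHAR": return Types.VARCHAR;
            case "CHAR":    return Types.CHAR;
            case "DECIMAL": return Types.DECIMAL;
            default:        return Types.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println("VARCHAR -> " + toJdbcTypeCode("VARCHAR"));
        System.out.println("CHAR -> " + toJdbcTypeCode("CHAR"));
    }
}
```

With a mapping like this in place, the type name itself would move to the TYPE_NAME column, which item 2 reports as missing.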
[jira] [Updated] (DRILL-2459) INFO._SCHEMA's CHARACTER_MAXIMUM_LENGTH is -1 for type CHAR
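[ https://issues.apache.org/jira/browse/DRILL-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) updated DRILL-2459:

Description:
INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH does not report the length for type CHAR. For example, for type descriptor CHAR(4), it doesn't return 4. Instead, it returns -1:

0: jdbc:drill:zk=local> USE dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW TempView AS SELECT CAST( NULL AS VARCHAR(3) ), CAST( NULL AS CHAR(4) ) FROM INFORMATION_SCHEMA.CATALOGS LIMIT 1 ;
+------+-----------------------------------------------------------+
| ok   | summary                                                   |
+------+-----------------------------------------------------------+
| true | View 'TempView' replaced successfully in 'dfs.tmp' schema |
+------+-----------------------------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | ORDINAL_POSITION | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION_RADIX | NUMERIC_SCALE | NUMERIC_PRECISION |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| DRILL         | dfs.tmp      | TempView   | EXPR$0      | 0                | NO          | VARCHAR   | 3                        | -1                      | -1            | -1                |
| DRILL         | dfs.tmp      | TempView   | EXPR$1      | 1                | NO          | CHAR      | -1                       | -1                      | -1            | 4                 |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
2 rows selected (0.072 seconds)
0: jdbc:drill:zk=local>

Hmm. Note the 4 in the NUMERIC_PRECISION column:

0: jdbc:drill:zk=local> SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+-----------+--------------------------+-------------------+
| DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION |
+-----------+--------------------------+-------------------+
| VARCHAR   | 3                        | -1                |
| CHAR      | -1                       | 4                 |
+-----------+--------------------------+-------------------+
2 rows selected (0.065 seconds)
0: jdbc:drill:zk=local>

was:
INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH does not report the length for type CHAR. For example, for type descriptor CHAR(4), it doesn't return 4. Instead, it returns 1:

0: jdbc:drill:zk=local> USE dfs.tmp;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to 'dfs.tmp' |
+------+-------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> CREATE OR REPLACE VIEW TempView AS SELECT CAST( NULL AS VARCHAR(3) ), CAST( NULL AS CHAR(4) ) FROM INFORMATION_SCHEMA.CATALOGS LIMIT 1 ;
+------+-----------------------------------------------------------+
| ok   | summary                                                   |
+------+-----------------------------------------------------------+
| true | View 'TempView' replaced successfully in 'dfs.tmp' schema |
+------+-----------------------------------------------------------+
1 row selected (0.05 seconds)
0: jdbc:drill:zk=local> SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'TempView';
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | ORDINAL_POSITION | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | NUMERIC_PRECISION_RADIX | NUMERIC_SCALE | NUMERIC_PRECISION |
+---------------+--------------+------------+-------------+------------------+-------------+-----------+--------------------------+-------------------------+---------------+-------------------+
| DRILL         | dfs.tmp      | TempView   | EXPR$0      | 0                | NO          | VARCHAR   | 3                        | -1                      | -1            | -1                |
| DRILL         | dfs.tmp      | TempView   | EXPR$1      | 1                | NO          | CHAR      | -1                       | -1                      | -1            | 4                 |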
[jira] [Updated] (DRILL-2180) Star is not expanded when being used with flatten
[ https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Hsuan-Yi Chu updated DRILL-2180:
Attachment: DRILL-2180.1.patch
Patch Available

Star is not expanded when being used with flatten
Key: DRILL-2180
URL: https://issues.apache.org/jira/browse/DRILL-2180
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Sean Hsuan-Yi Chu
Priority: Critical
Fix For: 0.9.0
Attachments: DRILL-2180.1.patch

For example, the query

select *, flatten(j.topping) tt from dfs_test.`%s` j

(using the same data set as in DRILL-2012) returns:

| *    | tt                                       |
| null | {id:5001,type:None}                      |
| null | {id:5002,type:Glazed}                    |
| null | {id:5005,type:Sugar}                     |
| null | {id:5007,type:Powdered Sugar}            |
| null | {id:5006,type:Chocolate with Sprinkles}  |
| null | {id:5003,type:Chocolate}                 |
| null | {id:5004,type:Maple}                     |

Note that the first column is messed up.
[jira] [Commented] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.
[ https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365986#comment-14365986 ]

Sean Hsuan-Yi Chu commented on DRILL-1911:
Resolved in Commit#: ae2053d2a078a40033a140f2dfaeef802a5e8254

Querying same field multiple times with different case would hit memory leak and return incorrect result.
Key: DRILL-1911
URL: https://issues.apache.org/jira/browse/DRILL-1911
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Sean Hsuan-Yi Chu
Fix For: 0.8.0

git.commit.id.abbrev=309e1be

If the same field is queried twice with different case, Drill throws a memory assertion error.

select employee_id, Employee_id from cp.`employee.json` limit 2;
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |
Query failed: Query failed: Failure while running fragment., Attempted to close accountor with 2 buffer(s) still allocated for QueryId: 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0.
Total 1 allocation(s) of byte size(s): 4096, at stack location: org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212) org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137) org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167) org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97) org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114) org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) Also, notice that the query result only contains one field; the second field is missing. The plan looks fine. 
Drill Physical : 00-00Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 103 00-01 Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 102 00-02SelectionVectorRemover: rowcount = 463.0, cumulative cost = {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101 00-03 Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100 00-04Project(employee_id=[$0], Employee_id=[$0]): rowcount = 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 99 00-05 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1842) SELECT COUNT DISTINCT with HAVING fails to plan the query
[ https://issues.apache.org/jira/browse/DRILL-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Hsuan-Yi Chu updated DRILL-1842:
Assignee: Aman Sinha (was: Sean Hsuan-Yi Chu)

SELECT COUNT DISTINCT with HAVING fails to plan the query
Key: DRILL-1842
URL: https://issues.apache.org/jira/browse/DRILL-1842
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.6.0
Reporter: Chris Matta
Assignee: Aman Sinha
Fix For: 0.9.0
Attachments: ip-172-16-1-175_drillbit.log

Tableau is using the following query to get the distinct count of a measure:
{code:SQL}
SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok`
FROM `mfs.views`.`nestedclickview` `nestedclickview`
INNER JOIN `mfs.views`.`custview` `custview`
ON (`nestedclickview`.`cust_id` = `custview`.`cust_id`)
HAVING (COUNT(1) > 0);
{code}
It fails on 0.06r2 with a planning error. Interestingly, if I remove the HAVING (COUNT(1) > 0) clause at the end, it works:
{code}
: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok` FROM `mfs.views`.`nestedclickview` `nestedclickview` INNER JOIN `mfs.views`.`custview` `custview` ON (`nestedclickview`.`cust_id` = `custview`.`cust_id`);
+------------+
| ctd_age_ok |
+------------+
| 5          |
+------------+
1 row selected (4.776 seconds)
{code}
[jira] [Commented] (DRILL-1842) SELECT COUNT DISTINCT with HAVING fails to plan the query
[ https://issues.apache.org/jira/browse/DRILL-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366155#comment-14366155 ]

Sean Hsuan-Yi Chu commented on DRILL-1842:
[~amansinha100], this might be related to your current work, so I am assigning it to you.

SELECT COUNT DISTINCT with HAVING fails to plan the query
Key: DRILL-1842
URL: https://issues.apache.org/jira/browse/DRILL-1842
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.6.0
Reporter: Chris Matta
Assignee: Sean Hsuan-Yi Chu
Fix For: 0.9.0
Attachments: ip-172-16-1-175_drillbit.log

Tableau is using the following query to get the distinct count of a measure:
{code:SQL}
SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok`
FROM `mfs.views`.`nestedclickview` `nestedclickview`
INNER JOIN `mfs.views`.`custview` `custview`
ON (`nestedclickview`.`cust_id` = `custview`.`cust_id`)
HAVING (COUNT(1) > 0);
{code}
It fails on 0.06r2 with a planning error. Interestingly, if I remove the HAVING (COUNT(1) > 0) clause at the end, it works:
{code}
: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> SELECT COUNT(DISTINCT `custview`.`age`) AS `ctd_age_ok` FROM `mfs.views`.`nestedclickview` `nestedclickview` INNER JOIN `mfs.views`.`custview` `custview` ON (`nestedclickview`.`cust_id` = `custview`.`cust_id`);
+------------+
| ctd_age_ok |
+------------+
| 5          |
+------------+
1 row selected (4.776 seconds)
{code}
[jira] [Resolved] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception
[ https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-2414. -- Resolution: Fixed Union-All on SELECT * FROM schema-less data source will throw exception --- Key: DRILL-2414 URL: https://issues.apache.org/jira/browse/DRILL-2414 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Aman Sinha Fix For: 0.8.0 Attachments: DRILL-2414.1.patch Union-All on SELECT * (wildcard symbol) is supported only for the cases where schema (i.e., hive, view) is available. For detailed design documentation, please refer to DRILL-2207. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365788#comment-14365788 ] Aman Sinha commented on DRILL-2311: --- +1 on the patch. Committed to master branch: ae2053d2a078a40033a140f2dfaeef802a5e8254 Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Fix For: 0.9.0 Attachments: DRILL-2311.1.patch Doing a create table with same column in different case results in a runtime exception. This query should fail at planning or parsing. CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM dfs.`/user/root/alltypes.json`; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
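The report asks for such a query to be rejected at planning or parsing time instead of failing at run time. One way a check of that kind could look is sketched below; it is not the committed patch, and DuplicateColumnSketch and firstDuplicate are hypothetical names. The sketch assumes column names are compared case-insensitively, as the bug title implies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical validation helper; not taken from DRILL-2311.1.patch.
class DuplicateColumnSketch {
    // Returns the first output column name that repeats an earlier one
    // when case is ignored, or an empty Optional if all names are distinct.
    static Optional<String> firstDuplicate(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        for (String name : columnNames) {
            // Set.add returns false when the lower-cased name is already present.
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("BIGINT_col", "bigint_col");
        System.out.println(firstDuplicate(columns));
    }
}
```

For the CTAS statement in this report, the output columns BIGINT_col and bigint_col would be flagged before any data is written.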
[jira] [Created] (DRILL-2486) Return format differences between drill odbc from interval date queries
Krystal created DRILL-2486:

Summary: Return format differences between drill odbc from interval date queries
Key: DRILL-2486
URL: https://issues.apache.org/jira/browse/DRILL-2486
Project: Apache Drill
Issue Type: Bug
Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Krystal
Assignee: Daniel Barclay (Drill)
Priority: Minor

git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254

The format of results from interval date queries is different between drill and odbc. Below are some examples.

From drill:

SELECT interval '10' day from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P10D   |
+--------+

SELECT interval '12-11' year to month from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P12Y11M |
+---------+

SELECT interval '1' year from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P1Y    |
+--------+

SELECT interval '9' month from basic limit 1;
+--------+
| EXPR$0 |
+--------+
| P9M    |
+--------+

From ODBC:

SQL> SELECT interval '10' day from basic limit 1
+----------------+
| EXPR$0         |
+----------------+
| 10 00:00:00.00 |
+----------------+

SQL> SELECT interval '12-11' year to month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 12-11  |
+--------+

SQL> SELECT interval '1' year from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 1-00   |
+--------+

SQL> SELECT interval '9' month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 0-09   |
+--------+

We should have consistent output from the 2 sources. The result from ODBC seems easier to read.
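The year-month examples above differ only in rendering: Drill prints the ISO-8601 period form (P12Y11M) and ODBC prints years-months (12-11). As a small illustration of how one form maps to the other, here is a sketch using java.time.Period. It is not Drill or ODBC driver code; IntervalFormatSketch is a hypothetical name, java.time requires Java 8 or later, and the day-time case (P10D) is left out on purpose.

```java
import java.time.Period;

// Hypothetical formatter; illustrates the two renderings shown in the report.
class IntervalFormatSketch {
    // Converts an ISO-8601 year-month period such as "P12Y11M" into the
    // "years-months" form with two-digit months, e.g. "12-11".
    // Any day component of the period is ignored by this sketch.
    static String toYearMonth(String isoPeriod) {
        Period period = Period.parse(isoPeriod).normalized();
        return String.format("%d-%02d", period.getYears(), period.getMonths());
    }

    public static void main(String[] args) {
        for (String iso : new String[] {"P12Y11M", "P1Y", "P9M"}) {
            System.out.println(iso + " -> " + toYearMonth(iso));
        }
    }
}
```

The call to normalized() folds month counts of 12 or more into years, so the months field always fits the two-digit slot.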
[jira] [Commented] (DRILL-2478) Validating values assigned to SYSTEM/SESSION configuration parameters
[ https://issues.apache.org/jira/browse/DRILL-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366065#comment-14366065 ] Khurram Faraaz commented on DRILL-2478: --- system config option store.parquet.block-size accepts different values, inputs must be validated. {code} 0: jdbc:drill: alter system set `store.parquet.block-size`=0; +++ | ok | summary | +++ | true | store.parquet.block-size updated. | +++ 1 row selected (0.076 seconds) 0: jdbc:drill: alter system set `store.parquet.block-size`=-1; +++ | ok | summary | +++ | true | store.parquet.block-size updated. | +++ 1 row selected (0.05 seconds) 0: jdbc:drill: alter system set `store.parquet.block-size`=536870912; +++ | ok | summary | +++ | true | store.parquet.block-size updated. | +++ 1 row selected (0.057 seconds) 0: jdbc:drill: alter system set `store.parquet.block-size`=100; +++ | ok | summary | +++ | true | store.parquet.block-size updated. | +++ 1 row selected (0.078 seconds) {code} Validating values assigned to SYSTEM/SESSION configuration parameters - Key: DRILL-2478 URL: https://issues.apache.org/jira/browse/DRILL-2478 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 0.8.0 Environment: {code} 0: jdbc:drill: select * from sys.version; +++-+-++ | commit_id | commit_message | commit_time | build_email | build_time | +++-+-++ | f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown | 09.03.2015 @ 04:50:05 EDT | +++-+-++ 1 row selected (0.046 seconds) {code} Reporter: Khurram Faraaz Assignee: Daniel Barclay (Drill) Values that are assigned to configuration parameters of type SYSTEM and SESSION must be validated. Currently any value can be assigned to some of the SYSTEM/SESSION type parameters. Here are two examples where assignment of invalid values to store.format does not result in any error. 
{code} 0: jdbc:drill: alter session set `store.format`='1'; +++ | ok | summary | +++ | true | store.format updated. | +++ 1 row selected (0.02 seconds) {code} {code} 0: jdbc:drill: alter session set `store.format`='foo'; +++ | ok | summary | +++ | true | store.format updated. | +++ 1 row selected (0.039 seconds) {code} In some cases values to some of the configuration parameters are validated, like in this example, where trying to assign an invalid value to parameter store.parquet.compression results in an error, which is correct. However, this kind of validation is not performed for every configuration parameter of SYSTEM/SESSION type. These values that are assigned to parameters must be validated, and report errors if incorrect values are assigned by users. {code} 0: jdbc:drill: alter session set `store.parquet.compression`='anything'; Query failed: ExpressionParsingException: Option store.parquet.compression must be one of: [snappy, gzip, none] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
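The kind of validation requested in this issue can be pictured as a per-option check that runs before the value is stored. The sketch below is an illustration only and does not use Drill's option-validator classes; OptionValidationSketch and its methods are hypothetical. The allowed compression values come from the error message quoted above, while the rule that store.parquet.block-size must be strictly positive is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical validators; not Drill's actual option-validation code.
class OptionValidationSketch {
    // Values listed in the error message for store.parquet.compression.
    static final Set<String> PARQUET_COMPRESSION =
        new HashSet<>(Arrays.asList("snappy", "gzip", "none"));

    // Enumerated options: accept only values from the allowed set.
    static boolean isAllowed(String value, Set<String> allowed) {
        return allowed.contains(value);
    }

    // Numeric options: this sketch assumes sizes must be strictly positive.
    static boolean isPositive(long value) {
        return value > 0;
    }

    public static void main(String[] args) {
        System.out.println("gzip accepted: " + isAllowed("gzip", PARQUET_COMPRESSION));
        System.out.println("anything accepted: " + isAllowed("anything", PARQUET_COMPRESSION));
        System.out.println("block-size -1 accepted: " + isPositive(-1));
    }
}
```

Under these assumptions the block-size values 0 and -1 from the comment would be rejected, while 536870912 and 100 would pass; whether 100 should also be rejected depends on a minimum that the report does not state.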
[jira] [Commented] (DRILL-2180) Star is not expanded when being used with flatten
[ https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366089#comment-14366089 ]

Mehant Baid commented on DRILL-2180:
+1

Star is not expanded when being used with flatten
Key: DRILL-2180
URL: https://issues.apache.org/jira/browse/DRILL-2180
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Sean Hsuan-Yi Chu
Assignee: Mehant Baid
Priority: Critical
Fix For: 0.9.0
Attachments: DRILL-2180.1.patch

For example, the query

select *, flatten(j.topping) tt from dfs_test.`%s` j

(using the same data set as in DRILL-2012) returns:

| *    | tt                                       |
| null | {id:5001,type:None}                      |
| null | {id:5002,type:Glazed}                    |
| null | {id:5005,type:Sugar}                     |
| null | {id:5007,type:Powdered Sugar}            |
| null | {id:5006,type:Chocolate with Sprinkles}  |
| null | {id:5003,type:Chocolate}                 |
| null | {id:5004,type:Maple}                     |

Note that the first column is messed up.
[jira] [Updated] (DRILL-2481) Querying individual column from view results in AssertionError
[ https://issues.apache.org/jira/browse/DRILL-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-2481: -- Description: Querying an individual column from a view results in an AssertionError Data used was from a csv file, its content was a single row (pls see below) 1,John Doe,HR,5000,Software Engineer {code } 0: jdbc:drill: use dfs.tmp; +++ | ok | summary | +++ | true | Default schema changed to 'dfs.tmp' | +++ 1 row selected (0.188 seconds) 0: jdbc:drill: create view v1 as select * from `employee.csv` union all select * from `employee.csv`; +++ | ok | summary | +++ | true | View 'v1' created successfully in 'dfs.tmp' schema | +++ 1 row selected (0.073 seconds) 0: jdbc:drill: create view v2 as select * from `employee.csv` union all select * from `employee.csv`; +++ | ok | summary | +++ | true | View 'v2' created successfully in 'dfs.tmp' schema | +++ 1 row selected (0.046 seconds) 0: jdbc:drill: select * from v1; ++ | columns | ++ | [1,John Doe,HR,5000,Software Engineer] | | [1,John Doe,HR,5000,Software Engineer] | ++ 2 rows selected (0.087 seconds) 0: jdbc:drill: select * from v2; ++ | columns | ++ | [1,John Doe,HR,5000,Software Engineer] | | [1,John Doe,HR,5000,Software Engineer] | ++ 2 rows selected (0.075 seconds) 0: jdbc:drill: describe v1; +-++-+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +-++-+ | * | ANY| NO | +-++-+ 1 row selected (0.084 seconds) 0: jdbc:drill: describe v2; +-++-+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +-++-+ | * | ANY| NO | +-++-+ 1 row selected (0.083 seconds) 0: jdbc:drill: select columns[0] from v1; Query failed: AssertionError: ANY Error: exception while executing query: Failure while executing query. (state=,code=0) {code} {code} Stack trace from drillbit.log 2015-03-17 16:40:43,176 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.060851ms avg, 1ms max. 
2015-03-17 16:40:43,178 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING -- FAILED org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: ANY at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75] Caused by: java.lang.AssertionError: ANY at org.eigenbase.reltype.RelDataTypeImpl.getFieldCount(RelDataTypeImpl.java:114) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.relopt.RelOptUtil$2.size(RelOptUtil.java:143) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:111) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexInputRef.accept(RexInputRef.java:103) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexCall.accept(RexCall.java:106) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexCall.accept(RexCall.java:106) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rel.ProjectRelBase.isValid(ProjectRelBase.java:156) 
~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rel.ProjectRelBase.init(ProjectRelBase.java:82) ~[optiq-core-0.9-drill-r20.jar:na] at
[jira] [Resolved] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-2311. -- Resolution: Fixed Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 Attachments: DRILL-2311.1.patch Doing a create table with same column in different case results in a runtime exception. This query should fail at planning or parsing. CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM dfs.`/user/root/alltypes.json`; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
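The report above asks for the statement to be rejected at planning or parsing time. In essence that means comparing the output column names without regard to case before anything is written. A minimal Python sketch of such a check follows. It is not Drill's code: the function name is hypothetical, and it assumes column names are meant to be case-insensitive, as the report implies.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(column_names):
    """Return groups of column names that collide when case is ignored."""
    groups = defaultdict(list)
    for name in column_names:
        # casefold() is the str method intended for caseless comparison.
        groups[name.casefold()].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Output aliases taken from the CTAS statement in the report.
aliases = ["BIGINT_col", "bigint_col"]
duplicates = find_case_insensitive_duplicates(aliases)
if duplicates:
    print("duplicate column names (ignoring case):", duplicates)
```

Running a check of this kind while validating the select list would let the query fail with a clear message instead of a runtime IllegalStateException.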
[jira] [Commented] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception
[ https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365778#comment-14365778 ] Aman Sinha commented on DRILL-2414: --- +1. Committed to master branch, commit #: 63bd48eb0a8081e3c24a7e49095bbcfc0f36bf7c Union-All on SELECT * FROM schema-less data source will throw exception --- Key: DRILL-2414 URL: https://issues.apache.org/jira/browse/DRILL-2414 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Aman Sinha Fix For: 0.9.0 Attachments: DRILL-2414.1.patch Union-All on SELECT * (wildcard symbol) is supported only for the cases where schema (i.e., hive, view) is available. For detailed design documentation, please refer to DRILL-2207. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365816#comment-14365816 ] Sean Hsuan-Yi Chu commented on DRILL-2311: -- Review Board: https://reviews.apache.org/r/32089/ Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 Attachments: DRILL-2311.1.patch Doing a create table with same column in different case results in a runtime exception. This query should fail at planning or parsing. CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM dfs.`/user/root/alltypes.json`; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365761#comment-14365761 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- The failure was due to Union-All. After the new union-all was checked in, this query ran and gave the same result as Postgres. TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.8.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs 
UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00 5735.31 2275.60 6921.32 2590.46
[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-2311: - Attachment: DRILL-2311.1.patch Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Fix For: 0.9.0 Attachments: DRILL-2311.1.patch Doing a create table with same column in different case results in a runtime exception. This query should fail at planning or parsing. CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM dfs.`/user/root/alltypes.json`; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2441) Throw unsupported error message in case of inequality join
[ https://issues.apache.org/jira/browse/DRILL-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365784#comment-14365784 ] Aman Sinha commented on DRILL-2441: --- +1. Committed to master branch commit #: ae2053d2a Throw unsupported error message in case of inequality join -- Key: DRILL-2441 URL: https://issues.apache.org/jira/browse/DRILL-2441 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Victoria Markman Assignee: Aman Sinha Fix For: 0.9.0 Attachments: DRILL-2441.1.patch Since we don't support inequality joins, this whole class of queries throws a huge, page-long "Can't plan" exception. This is a request to throw, in these cases as well, the nice error message that we throw in the case of a cartesian join. {code} select * from t1 left outer join t2 on (t1.a1 = t2.a2 and t1.b2 t2.b2); select * from t1 right outer join t2 on (t1.a1 = t2.a2 and t1.b2 t2.b2); {code} Example of an exception: {code} 0: jdbc:drill:schema=dfs select * from t1 inner join t2 on(t1.b1 t2.b2); Query failed: UnsupportedRelOperatorException: This query cannot be planned possibly due to either a cartesian join or an inequality join Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2176) IndexOutOfBoundsException for count(*) on a subquery which does order-by
[ https://issues.apache.org/jira/browse/DRILL-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365783#comment-14365783 ] Sean Hsuan-Yi Chu commented on DRILL-2176: -- I cannot reproduce this bug. Was it resolved? IndexOutOfBoundsException for count(*) on a subquery which does order-by Key: DRILL-2176 URL: https://issues.apache.org/jira/browse/DRILL-2176 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Aman Sinha Fix For: 0.9.0 The IOBE occurs in creating the collation trait in Calcite. {code} 0: jdbc:drill:zk=local select count(*) from (select n_nationkey, n_regionkey from cp.`tpch/nation.parquet` order by 1, 2); Query failed: IndexOutOfBoundsException: index (1) must be less than size (1) {code} Full stack trace: {code} Caused by: java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:na] at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:na] at com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45) ~[guava-14.0.1.jar:na] at org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.rel.SortRel.init(SortRel.java:94) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.rel.SortRel.init(SortRel.java:59) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:78) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:1) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1011) ~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102) 
~[optiq-core-0.9-drill-r18.jar:na] at org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:108) ~[optiq-core-0.9-drill-r18.jar:na] {code} This might be related to CALCITE-569 (and possibly DRILL-1978) but the stack traces are different, so I am treating this as a separate issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2342) Nullability property of the view created from parquet file is not correct
[ https://issues.apache.org/jira/browse/DRILL-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-2342: Attachment: t1.parquet Table 't1' parquet file that was used in the query Nullability property of the view created from parquet file is not correct - Key: DRILL-2342 URL: https://issues.apache.org/jira/browse/DRILL-2342 Project: Apache Drill Issue Type: Bug Components: Metadata Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Venki Korukanti Priority: Critical Fix For: 0.9.0 Attachments: t1.parquet Here is my t1 table definition: {code} message root { optional int32 a1; optional binary b1 (UTF8); optional int32 c1 (DATE); } {code} I created a view on top of it: {code} 0: jdbc:drill:schema=dfs create view v1 as select cast(a1 as int), cast(b1 as varchar(10)), cast(c1 as date) from t1; +++ | ok | summary | +++ | true | View 'v1' created successfully in 'dfs.aggregation' schema | +++ 1 row selected (0.096 seconds) {code} IS_NULLABLE says 'NO', which is incorrect. {code} 0: jdbc:drill:schema=dfs describe v1; +-++-+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +-++-+ | EXPR$0 | INTEGER| NO | | EXPR$1 | VARCHAR| NO | | EXPR$2 | DATE | NO | +-++-+ 3 rows selected (0.067 seconds) {code} This is potentially dangerous: if Calcite decided tomorrow to take advantage of this property and added an optimization in which an IS NULL predicate on a non-nullable column can be dropped, the query select * from v1 where x is null would return an incorrect result. 
{code} 0: jdbc:drill:schema=dfs explain plan for select * from v1 where z is null; +++ |text|json| +++ | 00-00Screen 00-01 Project(x=[$0], y=[$1], z=[$2]) 00-02SelectionVectorRemover 00-03 Filter(condition=[IS NULL($2)]) 00-04Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT NULL], z=[CAST($0):ANY NOT NULL]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]]) {code} It seems to me that columns in views should always be reported as nullable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
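The risk described in the report above can be illustrated with a small Python sketch. It models a hypothetical optimizer rule (not something Calcite or Drill is known to do here) that folds an IS NULL predicate to FALSE whenever the column is declared NOT NULL. The rows and function names are invented for illustration.

```python
def scan_is_null(rows, column):
    """Evaluate `column IS NULL` against the actual data."""
    return [row for row in rows if row[column] is None]


def optimized_is_null(rows, column, declared_nullable):
    """Hypothetical optimizer: fold IS NULL to FALSE for NOT NULL columns."""
    if not declared_nullable:
        return []  # predicate assumed always false, so no row qualifies
    return scan_is_null(rows, column)


# The underlying data is optional (nullable), as in the parquet schema above.
rows = [{"x": 1}, {"x": None}, {"x": 3}]

correct = scan_is_null(rows, "x")
# The view metadata wrongly reports the column as NOT NULL.
folded = optimized_is_null(rows, "x", declared_nullable=False)

print(len(correct), len(folded))  # prints: 1 0
```

With the wrong metadata the folded plan silently loses the row that really contains a null, which is why reporting view columns as nullable is the safer default.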
[jira] [Commented] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365644#comment-14365644 ] Venki Korukanti commented on DRILL-1833: We currently store a map from view name to view location in ZK. When listing views (as part of SHOW TABLES), we take the view list from the ZK store, and for each entry we check whether the view definition exists in the given location. As the view list is empty in ZK, we don't list any views in SHOW TABLES. When creating a view, we create it by default in the workspace schema location. Also, when querying, we refer directly to the FileSystem for the view definition. The info in ZK is redundant, as we always trust the information in the FileSystem. We can stop storing persistent view info in ZK. One thing I must point out: if in the future we support creating a view with a custom view location, then it won't be visible in SHOW TABLES, as SHOW TABLES only searches for .view.drill files in the workspace root directory. This should be OK, as a view created with a custom location is considered external to the schema. Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
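The comment above describes listing views by looking only for .view.drill files in the workspace root directory, with the file system as the single source of truth. A minimal Python sketch of that idea follows. It is an illustration only, not Drill's implementation: the function name and the temporary directory layout are assumptions.

```python
from pathlib import Path
import tempfile

VIEW_SUFFIX = ".view.drill"


def list_views(workspace_root):
    """List view names found directly in the workspace root (no recursion)."""
    root = Path(workspace_root)
    return sorted(
        p.name[: -len(VIEW_SUFFIX)]
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(VIEW_SUFFIX)
    )


with tempfile.TemporaryDirectory() as tmp:
    # A view definition in the workspace root, as in the report.
    (Path(tmp) / "varchar_view.view.drill").write_text("{}")
    # An ordinary data file, which must not be listed as a view.
    (Path(tmp) / "employee.csv").write_text("1,John Doe\n")
    # A view in a custom (nested) location, which stays invisible.
    nested = Path(tmp) / "custom"
    nested.mkdir()
    (nested / "other.view.drill").write_text("{}")

    views = list_views(tmp)
    print(views)  # prints: ['varchar_view']
```

Because nothing is cached outside the directory itself, wiping any external registry cannot make an existing view disappear from the listing. Views stored elsewhere are simply not found, matching the caveat in the comment.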
[jira] [Updated] (DRILL-2414) Union-All on SELECT * FROM schema-less data source will throw exception
[ https://issues.apache.org/jira/browse/DRILL-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-2414: - Fix Version/s: (was: 0.9.0) 0.8.0 Union-All on SELECT * FROM schema-less data source will throw exception --- Key: DRILL-2414 URL: https://issues.apache.org/jira/browse/DRILL-2414 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Aman Sinha Fix For: 0.8.0 Attachments: DRILL-2414.1.patch Union-All on SELECT * (wildcard symbol) is supported only for the cases where schema (i.e., hive, view) is available. For detailed design documentation, please refer to DRILL-2207. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-2311: - Fix Version/s: (was: 0.9.0) 0.8.0 Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 Attachments: DRILL-2311.1.patch Doing a create table with same column in different case results in a runtime exception. This query should fail at planning or parsing. CREATE TABLE drill_parquet_mulCaseColumns3 as select cast( BIGINT_col as bigint) BIGINT_col,cast( DECIMAL9_col as decimal) bigint_col FROM dfs.`/user/root/alltypes.json`; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2441) Throw unsupported error message in case of inequality join
[ https://issues.apache.org/jira/browse/DRILL-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-2441. -- Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 Throw unsupported error message in case of inequality join -- Key: DRILL-2441 URL: https://issues.apache.org/jira/browse/DRILL-2441 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Victoria Markman Assignee: Aman Sinha Fix For: 0.8.0 Attachments: DRILL-2441.1.patch Since we don't support inequality joins, this whole class of queries throws a huge, page-long "Can't plan" exception. This is a request to throw, in these cases as well, the nice error message that we throw in the case of a cartesian join. {code} select * from t1 left outer join t2 on (t1.a1 = t2.a2 and t1.b2 t2.b2); select * from t1 right outer join t2 on (t1.a1 = t2.a2 and t1.b2 t2.b2); {code} Example of an exception: {code} 0: jdbc:drill:schema=dfs select * from t1 inner join t2 on(t1.b1 t2.b2); Query failed: UnsupportedRelOperatorException: This query cannot be planned possibly due to either a cartesian join or an inequality join Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2002) Confusing star behavior in UNION ALL operator
[ https://issues.apache.org/jira/browse/DRILL-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-2002: - Fix Version/s: (was: 0.9.0) 0.8.0 Confusing star behavior in UNION ALL operator --- Key: DRILL-2002 URL: https://issues.apache.org/jira/browse/DRILL-2002 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 t1.json {code} { a1: 1 ,b1 : 1} { a1: 2 ,b1 : 1} { a1: 2 ,b1 : 2} { a1: 3 ,b1 : 2} { a1: null , b1 : 3} {code} Star in both legs of UNION ALL works: {code} 0: jdbc:drill:schema=dfs select * from `t1.json` union all select * from `t1.json`; +++ | a1 | b1 | +++ | 1 | 1 | | 2 | 1 | | 2 | 2 | | 3 | 2 | | null | 3 | | 1 | 1 | | 2 | 1 | | 2 | 2 | | 3 | 2 | | null | 3 | +++ 10 rows selected (0.126 seconds) {code} I expected this to work in structured, but it seems that since planner has no idea about meta data, error message seems reasonable: {code} 0: jdbc:drill:schema=dfs select a1, b1 from `t1.json` union all select * from `t1.json`; Query failed: Query failed: Failure validating SQL. org.eigenbase.util.EigenbaseContextException: At line 1, column 47: Column count mismatch in UNION ALL Error: exception while executing query: Failure while executing query. (state=,code=0) {code} Query below returns very confusing result. I expected it to error out like the query above: {code} 0: jdbc:drill:schema=dfs select a1 from `t1.json` union all select * from `t1.json`; ++ | a1 | ++ | 1 | | 2 | | 2 | | 3 | | null | | 1 | | 2 | | 2 | | 3 | | null | ++ 10 rows selected (0.111 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
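The behaviour the reporter expected for the last query in the report above is a column count check on both legs of the UNION ALL once the star has been expanded. A minimal Python sketch of that check follows. It is illustrative only: the function name is hypothetical, and it assumes the expanded column lists are available, which for a schema-less source may only be the case after the data has been read.

```python
def check_union_all_arity(left_columns, right_columns):
    """Raise if the two legs of a UNION ALL project different column counts."""
    if len(left_columns) != len(right_columns):
        raise ValueError(
            "Column count mismatch in UNION ALL: "
            f"{len(left_columns)} vs {len(right_columns)}"
        )


# `select a1 from t1.json` projects one column.
left = ["a1"]
# `select * from t1.json` expands to both columns of the sample file.
right = ["a1", "b1"]

try:
    check_union_all_arity(left, right)
except ValueError as err:
    print(err)  # prints: Column count mismatch in UNION ALL: 1 vs 2
```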
[jira] [Updated] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated DRILL-1833: --- Attachment: DRILL-1833-1.patch Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-1833-1.patch After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2275) need implementations of sys tables for drill memory and threads profiles
[ https://issues.apache.org/jira/browse/DRILL-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam updated DRILL-2275: --- Assignee: Jacques Nadeau (was: Sudheesh Katkam) need implementations of sys tables for drill memory and threads profiles Key: DRILL-2275 URL: https://issues.apache.org/jira/browse/DRILL-2275 Project: Apache Drill Issue Type: Task Components: Metadata Reporter: Zhiyong Liu Assignee: Jacques Nadeau Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2275.1.patch.txt, DRILL-2275.2.patch.txt, DRILL-2275.3.patch.txt, DRILL-2275.4.patch.txt In order to check drill state information, the following tables are to be implemented: 1. Memory: a query such as select * from sys.drillmemory; should return a result set like the following: +++--+++ |drillbit| total_sys_memory |heap_size | direct_alloc_memory | +++--+++ | node1:port1 | 24596676k | 15200420k | 1012372k | +++--+++ | node2:port2 | 24596676k | 15200420k | 2012372k | +++--+++ 2. Threads: For each node in a cluster, we need counts of threads of the drillbits. A query like this: select * from sys.drillbitthreads; should return a result set like the following: +++--+++ |drillbit| pool_name | total_threads | busy_threads | +++--+++ | node1:port1 | pool1 | 8 | 2 | +++--+++ | node2:port2 | pool2 | 10 | 5 | +++--+++ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2413) FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas
[ https://issues.apache.org/jira/browse/DRILL-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated DRILL-2413: --- Attachment: DRILL-2413-1.patch FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas -- Key: DRILL-2413 URL: https://issues.apache.org/jira/browse/DRILL-2413 Project: Apache Drill Issue Type: Sub-task Components: Metadata, Storage - Information Schema Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-2413-1.patch Currently we create one DrillFileSystem (an extension of Hadoop FileSystem) instance and share it across all Workspaces created for all queries, FormatPlugins and FormatMatchers. Remove the shared DrillFileSystem; instead, share the DrillFileSystem configuration and create a DrillFileSystem in each Schema (WorkspaceSchema) using the current user's credentials in the Schema. The same DrillFileSystem instance is to be passed to FormatPlugins and FormatMatchers whenever Schemas need to access the file system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365707#comment-14365707 ] Sean Hsuan-Yi Chu commented on DRILL-2380: -- Drill gave the same result as Postgres: +---+-+ | i_manufact_id | total_sales | +---+-+ | 930 | 1.18| | 818 | 41.86 | | 913 | 141.9 | | 784 | 184.9 | | 488 | 275.08 | | 993 | 301.6 | | 700 | 340.520004 | | 895 | 802.3 | | 766 | 839.76 | | 858 | 859.18 | +---+-+ 10 rows selected (21.237 seconds) TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.9.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY 
i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: +---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp
[jira] [Updated] (DRILL-2309) 'null' is counted with subquery
[ https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-2309: --- Attachment: DRILL-2309.patch [~amansinha100] can you please review. 'null' is counted with subquery --- Key: DRILL-2309 URL: https://issues.apache.org/jira/browse/DRILL-2309 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 0.8.0 Reporter: Chun Chang Assignee: Mehant Baid Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2309.patch #Thu Feb 19 18:40:10 EST 2015 git.commit.id.abbrev=1ceddff The following query returns correct count involving columns that contains null value. {code} 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi; +++ |gbyi| EXPR$1 | +++ | 0 | 33580 | | 1 | 33317 | | 2 | 33438 | | 3 | 33535 | | 4 | 33369 | | 5 | 32990 | | 6 | 33661 | | 7 | 33130 | | 8 | 33362 | | 9 | 33364 | | 10 | 33229 | | 11 | 33567 | | 12 | 33379 | | 13 | 33045 | | 14 | 33305 | +++ {code} But if you add more aggregation to the query, the returned count is wrong (pay attention to the last column). 
{code} 0: jdbc:drill:schema=dfs.drillTestDirComplexJ select tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi; +++++ |gbyi| EXPR$1 | EXPR$2 | EXPR$3 | +++++ | 0 | 33445554017 | 499613.0956877819 | 66943 | | 1 | 33209358334 | 500760.0252919893 | 66318 | | 2 | 33369118041 | 498091.82200273 | 66994 | | 3 | 33254533860 | 498696.5063226428 | 66683 | | 4 | 33393965595 | 501125.64656145993 | 66638 | | 5 | 33216885506 | 499961.32710397616 | 66439 | | 6 | 33380205950 | 498875.3923256599 | 66911 | | 7 | 33405849390 | 501093.43067788356 | 6 | | 8 | 33136951190 | 498458.1044031481 | 66479 | | 9 | 33319291474 | 499967.5392457864 | 66643 | | 10 | 937 | 499190.47462408233 | 66787 | | 11 | 33571590550 | 502095.86682194035 | 66863 | | 12 | 33437342090 | 501708.8141502653 | 66647 | | 13 | 33071800925 | 498896.453904129 | 66290 | | 14 | 33448664191 | 501487.4206955959 | 66699 | +++++ {code} Plan for the query that returned the wrong result: {code} 0: jdbc:drill:schema=dfs.drillTestDirComplexJ explain plan for select tt.gbyi, sum(tt.id), avg(tt.fl), count(tt.nul) from (select t.id, t.gbyi, t.fl, t.nul from `complex.json` t) tt group by tt.gbyi order by tt.gbyi; +++ |text|json| +++ | 00-00Screen 00-01 Project(gbyi=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) 00-02SingleMergeExchange(sort0=[0 ASC]) 01-01 SelectionVectorRemover 01-02Sort(sort0=[$0], dir0=[ASC]) 01-03 Project(gbyi=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], EXPR$2=[CAST(/(CastHigh(CASE(=($4, 0), null, $3)), $4)):ANY], EXPR$3=[$5]) 01-04HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)], EXPR$3=[$SUM0($5)]) 01-05 HashToRandomExchange(dist0=[[$0]]) 02-01HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)], agg#2=[$SUM0($2)], agg#3=[COUNT($2)], EXPR$3=[COUNT()]) 02-02 Project(gbyi=[$3], id=[$2], fl=[$1], nul=[$0]) 02-03Scan(groupscan=[EasyGroupScan 
[selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`gbyi`, `id`, `fl`, `nul`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
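
For readers comparing the two result sets: in standard SQL, COUNT(col) skips NULLs while COUNT(*) counts every row. The plan in the report shows an argument-less COUNT() in fragment 02-01, and the wrong counts are roughly double the correct ones, which would be consistent with nulls being counted; that reading is an inference from the output above, not a confirmed diagnosis. The sketch below only illustrates the expected semantics, using Python's built-in sqlite3 and a made-up table `t` (it does not touch Drill or `complex.json`).

```python
import sqlite3

# Hypothetical data: two groups, three rows each, some NULLs in column "nul".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (gbyi INTEGER, nul TEXT)")
rows = [(0, "x"), (0, None), (0, "y"), (1, None), (1, None), (1, "z")]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# COUNT(nul) ignores NULL values; COUNT(*) counts all rows in the group.
counts = conn.execute(
    "SELECT gbyi, COUNT(nul), COUNT(*) FROM t GROUP BY gbyi ORDER BY gbyi"
).fetchall()

for gbyi, non_null, all_rows in counts:
    print(gbyi, non_null, all_rows)
```

A query such as the one in the report should return the COUNT(nul) column regardless of which other aggregates appear in the select list.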
[jira] [Updated] (DRILL-2309) 'null' is counted with subquery
[ https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-2309:
-------------------------------
    Attachment: (was: DRILL-2309.patch)

'null' is counted with subquery
-------------------------------

    Key: DRILL-2309
    URL: https://issues.apache.org/jira/browse/DRILL-2309
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Affects Versions: 0.8.0
    Reporter: Chun Chang
    Assignee: Mehant Baid
    Priority: Critical
    Fix For: 0.9.0
[jira] [Updated] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()
[ https://issues.apache.org/jira/browse/DRILL-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-2491:
-------------------------------
    Attachment: DRILL-2491.patch

Fix use of injectable QueryDateTimeInfo in localtimestamp()
-----------------------------------------------------------

    Key: DRILL-2491
    URL: https://issues.apache.org/jira/browse/DRILL-2491
    Project: Apache Drill
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Mehant Baid
    Assignee: Mehant Baid
    Fix For: 0.8.0
    Attachments: DRILL-2491.patch

After the recent changes that removed RecordBatch from the setup() method of UDFs, we introduced a new injectable, QueryDateTimeInfo, to store the query's start timestamp and timezone information. However, it seems that one of the UDFs (localtimestamp) did not use this injectable correctly.
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-2488:
------------------------------
    Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch

It turns out to be an issue with the supported encoding for MergeJoin. The Merge Join execution operator currently does not process incoming batches with SV2 or SV4, so if there is a Limit or Sort below it, we need to insert a SelectionVectorRemover below the MJ. Uploaded a patch with a simple fix. [~vkorukanti] could you please review? I haven't run all the tests yet; they are still in progress.

Wrong result on join between two subqueries with aggregation
------------------------------------------------------------

    Key: DRILL-2488
    URL: https://issues.apache.org/jira/browse/DRILL-2488
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Relational Operators
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Aman Sinha
    Priority: Critical
    Fix For: 0.9.0
    Attachments: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet

{code}
0: jdbc:drill:schema=dfs> select * from t1;
+------+------+------------+
| a1   | b1   | c1         |
+------+------+------------+
| 1    | a    | 2015-01-01 |
| 2    | b    | 2015-01-02 |
| 3    | c    | 2015-01-03 |
| 4    | null | 2015-01-04 |
| 5    | e    | 2015-01-05 |
| 6    | f    | 2015-01-06 |
| 7    | g    | 2015-01-07 |
| null | h    | 2015-01-08 |
| 9    | i    | null       |
| 10   | j    | 2015-01-10 |
+------+------+------------+
10 rows selected (0.15 seconds)
{code}

This result is incorrect: one row is missing.

{code}
0: jdbc:drill:schema=dfs> select * from
(
    select
        b1,
        count(distinct a1)
    from
        t1
    group by
        b1
    order by
        b1 limit 5 offset 1
) as sq1(x1, y1)

inner join

(
    select
        b1,
        count(distinct a1)
    from
        t1
    group by
        b1
    order by
        b1 limit 5 offset 1
) as sq2(x1, y1)
on
    sq1.x1 = sq2.x1 and
    sq2.y1 = sq2.y1
;
+----+----+-----+-----+
| x1 | y1 | x10 | y10 |
+----+----+-----+-----+
| b  | 1  | b   | 1   |
| c  | 1  | c   | 1   |
| e  | 1  | e   | 1   |
| f  | 1  | f   | 1   |
+----+----+-----+-----+
4 rows selected (0.28 seconds)
{code}

Explain plan for the wrong result:

{code}
00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05 Limit(offset=[1], fetch=[5])
00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09 Sort(sort0=[$0], dir0=[ASC])
00-11 StreamAgg(group=[{0, 1}])
00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
00-04 Project(b10=[$0], EXPR$10=[$1])
00-06 SelectionVectorRemover
00-08 Sort(sort0=[$0], dir0=[ASC])
00-10 Filter(condition=[=($1, $1)])
00-12 Limit(offset=[1], fetch=[5])
00-14
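
To make the SV2 remark in the comment on this issue concrete, here is a conceptual sketch of a selection vector: a list of row indices that marks which rows of a batch are live after an operator such as Limit. Everything in it (`Batch`, `limit`, `remove_selection_vector`, `naive_read`) is a hypothetical model written for illustration, not Drill's actual classes or the contents of the attached patch. It shows why a consumer that ignores the selection vector sees the wrong rows, and why materializing the selection first (the role the comment assigns to SelectionVectorRemover) avoids that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Batch:
    """Hypothetical record batch: raw values plus an optional selection vector."""
    values: List[str]
    selection: Optional[List[int]] = None  # indices of live rows, SV2-like


def limit(batch: Batch, offset: int, fetch: int) -> Batch:
    """Apply LIMIT/OFFSET by narrowing the selection, leaving values in place."""
    live = batch.selection if batch.selection is not None else list(range(len(batch.values)))
    return Batch(batch.values, live[offset:offset + fetch])


def remove_selection_vector(batch: Batch) -> Batch:
    """Copy only the selected rows into a new batch with no selection vector."""
    if batch.selection is None:
        return batch
    return Batch([batch.values[i] for i in batch.selection])


def naive_read(batch: Batch) -> List[str]:
    """A consumer that does not understand selection vectors reads every raw row."""
    return list(batch.values)


groups = Batch(["a", "b", "c", "e", "f", "g", "h"])
limited = limit(groups, offset=1, fetch=5)

print(naive_read(limited))                       # raw rows, selection ignored
print(remove_selection_vector(limited).values)   # only the five selected rows
```

In this model the downstream operator is only correct if it either honors `selection` or receives a batch that has already been compacted.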
[jira] [Commented] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()
[ https://issues.apache.org/jira/browse/DRILL-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366432#comment-14366432 ]

Jason Altekruse commented on DRILL-2491:
----------------------------------------

+1

Fix use of injectable QueryDateTimeInfo in localtimestamp()
-----------------------------------------------------------

    Key: DRILL-2491
    URL: https://issues.apache.org/jira/browse/DRILL-2491
    Project: Apache Drill
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Mehant Baid
    Assignee: Mehant Baid
    Fix For: 0.8.0
    Attachments: DRILL-2491.patch
[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366531#comment-14366531 ]

Venki Korukanti commented on DRILL-2488:
----------------------------------------

Looks good, +1.

Wrong result on join between two subqueries with aggregation
------------------------------------------------------------

    Key: DRILL-2488
    URL: https://issues.apache.org/jira/browse/DRILL-2488
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Relational Operators
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Aman Sinha
    Priority: Critical
    Fix For: 0.9.0
    Attachments: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet
[jira] [Updated] (DRILL-2309) Selecting count(), avg() of nullable columns causes wrong results
[ https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-2309:
-------------------------------
    Summary: Selecting count(), avg() of nullable columns causes wrong results (was: 'null' is counted with subquery)

Selecting count(), avg() of nullable columns causes wrong results
-----------------------------------------------------------------

    Key: DRILL-2309
    URL: https://issues.apache.org/jira/browse/DRILL-2309
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Affects Versions: 0.8.0
    Reporter: Chun Chang
    Assignee: Mehant Baid
    Priority: Critical
    Fix For: 0.9.0
    Attachments: DRILL-2309.patch
[jira] [Updated] (DRILL-2309) 'null' is counted with subquery
[ https://issues.apache.org/jira/browse/DRILL-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-2309:
-------------------------------
    Attachment: DRILL-2309.patch

Minor update to the patch.

'null' is counted with subquery
-------------------------------

    Key: DRILL-2309
    URL: https://issues.apache.org/jira/browse/DRILL-2309
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Affects Versions: 0.8.0
    Reporter: Chun Chang
    Assignee: Mehant Baid
    Priority: Critical
    Fix For: 0.9.0
    Attachments: DRILL-2309.patch
[jira] [Commented] (DRILL-2483) Make buffer that rows are read into during execution configurable for testing purposes
[ https://issues.apache.org/jira/browse/DRILL-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366224#comment-14366224 ]

Victoria Markman commented on DRILL-2483:
-----------------------------------------

Hakim (thank you) pointed me to a discussion on the dev mailing list that happened two months ago: https://www.mail-archive.com/dev%40drill.apache.org/msg00551.html

Make buffer that rows are read into during execution configurable for testing purposes
--------------------------------------------------------------------------------------

    Key: DRILL-2483
    URL: https://issues.apache.org/jira/browse/DRILL-2483
    Project: Apache Drill
    Issue Type: Wish
    Reporter: Victoria Markman

We recently found a bug where, if a table had multiple duplicate rows and the duplicate rows spanned multiple buffers, merge join returned a wrong result. The test case had a table with 10,000 rows. The same problem could be reproduced on a much smaller data set if the buffer size were configurable.
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-2488:
------------------------------------
    Attachment: t1.parquet

Wrong result on join between two subqueries with aggregation
------------------------------------------------------------

    Key: DRILL-2488
    URL: https://issues.apache.org/jira/browse/DRILL-2488
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Relational Operators
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Chris Westin
    Priority: Critical
    Fix For: 0.9.0
    Attachments: t1.parquet

{code}
0: jdbc:drill:schema=dfs> select * from t1;
+------+------+------------+
| a1   | b1   | c1         |
+------+------+------------+
| 1    | a    | 2015-01-01 |
| 2    | b    | 2015-01-02 |
| 3    | c    | 2015-01-03 |
| 4    | null | 2015-01-04 |
| 5    | e    | 2015-01-05 |
| 6    | f    | 2015-01-06 |
| 7    | g    | 2015-01-07 |
| null | h    | 2015-01-08 |
| 9    | i    | null       |
| 10   | j    | 2015-01-10 |
+------+------+------------+
10 rows selected (0.15 seconds)
{code}

This result is incorrect: one row is missing.

{code}
0: jdbc:drill:schema=dfs> select * from
(
    select
        b1,
        count(distinct a1)
    from
        t1
    group by
        b1
    order by
        b1 limit 5 offset 1
) as sq1(x1, y1)

inner join

(
    select
        b1,
        count(distinct a1)
    from
        t1
    group by
        b1
    order by
        b1 limit 5 offset 1
) as sq2(x1, y1)
on
    sq1.x1 = sq2.x1 and
    sq2.y1 = sq2.y1
;
+----+----+-----+-----+
| x1 | y1 | x10 | y10 |
+----+----+-----+-----+
| b  | 1  | b   | 1   |
| c  | 1  | c   | 1   |
| e  | 1  | e   | 1   |
| f  | 1  | f   | 1   |
+----+----+-----+-----+
4 rows selected (0.28 seconds)
{code}

Explain plan for the wrong result:

{code}
00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05 Limit(offset=[1], fetch=[5])
00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09 Sort(sort0=[$0], dir0=[ASC])
00-11 StreamAgg(group=[{0, 1}])
00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
00-04 Project(b10=[$0], EXPR$10=[$1])
00-06 SelectionVectorRemover
00-08 Sort(sort0=[$0], dir0=[ASC])
00-10 Filter(condition=[=($1, $1)])
00-12 Limit(offset=[1], fetch=[5])
00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-16 Sort(sort0=[$0], dir0=[ASC])
00-17 StreamAgg(group=[{0, 1}])
00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
{code}

If you turn off
[jira] [Updated] (DRILL-2438) Query on views with Avg on integer column returns wrong result
[ https://issues.apache.org/jira/browse/DRILL-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti updated DRILL-2438:
-----------------------------------
    Assignee: Mehant Baid (was: Venki Korukanti)

Query on views with Avg on integer column returns wrong result
--------------------------------------------------------------

    Key: DRILL-2438
    URL: https://issues.apache.org/jira/browse/DRILL-2438
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Abhishek Girish
    Assignee: Mehant Baid
    Priority: Critical
    Fix For: 0.9.0

Git.Commit.ID: b3bdc27 (Mar 10)

Average on an integer column returns an (inaccurate) integer value, instead of an (accurate) decimal value.

*The following query returns wrong results:*

{code:sql}
SELECT i_item_id, avg(i_manufact_id) agg1
FROM item
GROUP BY i_item_id
ORDER BY i_item_id
LIMIT 5;
+-----------+------+
| i_item_id | agg1 |
+-----------+------+
| AAAB      | 152  |
| AAAC      | 187  |
| AAAE      | 251  |
| AABA      | 199  |
| AABB      | 636  |
+-----------+------+
5 rows selected (0.324 seconds)
{code}

*Postgres results:*

{code:sql}
# SELECT i_item_id, avg(i_manufact_id) agg1
tpcds1_new-# FROM item
tpcds1_new-# GROUP BY i_item_id
tpcds1_new-# ORDER BY i_item_id
tpcds1_new-# LIMIT 5;
 i_item_id | agg1
-----------+----------
 AAAB      | 152.
 AAAC      | 373.
 AAAE      | 251.
 AABA      | 198.6667
 AABB      | 636.
(5 rows)
{code}
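
As a small illustration of the precision loss described in this report, the sketch below averages three made-up integers whose mean is 198.6667 (the value Postgres shows for one of the rows) and compares a decimal-typed result with an integer-typed one. The input values are invented, and the rounding mode is an assumption chosen only so the integer matches the 199 in the Drill output; the report does not say how Drill arrives at its integer.

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up integer column values for a single group; not taken from the item table.
values = [198, 199, 199]

# Exact average computed in decimal arithmetic.
exact = Decimal(sum(values)) / Decimal(len(values))

# The same average forced into a decimal result type vs. an integer result type.
as_decimal = exact.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
as_integer = exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

print(as_decimal)  # keeps the fractional part
print(as_integer)  # fractional part is lost
```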
[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366331#comment-14366331 ]

Victoria Markman commented on DRILL-2488:
-----------------------------------------

{code}
#Fri Mar 13 17:54:51 EDT 2015
git.commit.id.abbrev=7b4c887
{code}

Wrong result on join between two subqueries with aggregation
------------------------------------------------------------

    Key: DRILL-2488
    URL: https://issues.apache.org/jira/browse/DRILL-2488
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Relational Operators
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Chris Westin
    Priority: Critical
    Fix For: 0.9.0
    Attachments: t1.parquet
[jira] [Created] (DRILL-2491) Fix use of injectable QueryDateTimeInfo in localtimestamp()
Mehant Baid created DRILL-2491:
-------------------------------

    Summary: Fix use of injectable QueryDateTimeInfo in localtimestamp()
    Key: DRILL-2491
    URL: https://issues.apache.org/jira/browse/DRILL-2491
    Project: Apache Drill
    Issue Type: Bug
    Affects Versions: 0.8.0
    Reporter: Mehant Baid
    Assignee: Mehant Baid
    Fix For: 0.8.0
[jira] [Resolved] (DRILL-2143) Remove RecordBatch from setup method of DrillFunc interface
[ https://issues.apache.org/jira/browse/DRILL-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse resolved DRILL-2143.
------------------------------------
    Resolution: Fixed

Resolved in bff7b9ef5a9f345908aca160a97b98f6ab187708 and 1c5decc17cf38cbf4a4119d7ca19653cb19e1b53

Remove RecordBatch from setup method of DrillFunc interface
-----------------------------------------------------------

    Key: DRILL-2143
    URL: https://issues.apache.org/jira/browse/DRILL-2143
    Project: Apache Drill
    Issue Type: Bug
    Components: Functions - Drill
    Reporter: Jason Altekruse
    Assignee: Jason Altekruse
    Fix For: 0.8.0
    Attachments: DRILL-2143-part1-feb-27.patch, DRILL-2143-part1-feb-6.patch, DRILL-2143-part1-mar-3.patch, DRILL-2143-part2-15-mar-15.patch, DRILL-2143-part2-feb-27.patch, DRILL-2143-part2-feb-6.patch, DRILL-2143-part2-mar-3.patch, DRILL-2143-remove-record-batch-from-udfs.patch

Drill UDFs are currently exposed to too much system state because they receive a reference to a RecordBatch in their setup method. This is not necessary, as all of the operator functionality triggered by schema changes is handled outside of UDFs (the UDFs themselves are actually required to define a specific type they take as input, except in the case of complex types such as maps and lists). The only remaining artifact of this interface is the date/time functions that ask for the query start time or current timezone. This information can be provided to functions using a new injectable type, in the same way DrillBufs are currently provided to functions. For more info read here: http://mail-archives.apache.org/mod_mbox/drill-dev/201501.mbox/%3ccampyv7ac_-9u4irz+5fxoenzbojctovjronn0qri4bqzf53...@mail.gmail.com%3E
[jira] [Commented] (DRILL-2486) Return format differences between drill odbc from interval date queries
[ https://issues.apache.org/jira/browse/DRILL-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366197#comment-14366197 ]

Daniel Barclay (Drill) commented on DRILL-2486:
-----------------------------------------------

The result from ODBC seems easier to read.

Except that the units aren't explicit in the ODBC output.

Note the SQLLine output above follows the standard format for durations from ISO 8601 and [XML Schema Part 2: Datatypes § 3.2.6 duration|http://www.w3.org/TR/xmlschema-2/#duration].

Return format differences between drill odbc from interval date queries
-----------------------------------------------------------------------

    Key: DRILL-2486
    URL: https://issues.apache.org/jira/browse/DRILL-2486
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Affects Versions: 0.8.0
    Reporter: Krystal
    Assignee: Daniel Barclay (Drill)
    Priority: Minor

git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254

The format of results from interval date queries is different between drill and odbc. Below are some examples.

From drill:

SELECT interval '10' day from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P10D    |
+---------+

SELECT interval '12-11' year to month from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P12Y11M |
+---------+

SELECT interval '1' year from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P1Y     |
+---------+

SELECT interval '9' month from basic limit 1;
+---------+
| EXPR$0  |
+---------+
| P9M     |
+---------+

From ODBC:

SQL> SELECT interval '10' day from basic limit 1
+----------------+
| EXPR$0         |
+----------------+
| 10 00:00:00.00 |
+----------------+

SQL> SELECT interval '12-11' year to month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 12-11  |
+--------+

SQL> SELECT interval '1' year from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 1-00   |
+--------+

SQL> SELECT interval '9' month from basic limit 1
+--------+
| EXPR$0 |
+--------+
| 0-09   |
+--------+

We should have consistent output from the 2 sources. The result from ODBC seems easier to read.
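
The two notations in this report carry the same information for year-month intervals. As a rough sketch of the mapping, the snippet below parses the subset of ISO 8601 durations that appears in the examples (`PnYnMnD`, no time part) and renders year-month intervals in the `Y-MM` form shown in the ODBC output. The function names are hypothetical, and neither the Drill JDBC driver nor the ODBC driver is involved; the day-time rendering is left out because its exact format is not specified in the report.

```python
import re

# Only the date part of an ISO 8601 duration, as seen in the report's examples.
_DURATION = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_duration(text: str) -> tuple:
    """Return (years, months, days) for strings such as 'P12Y11M' or 'P10D'."""
    match = _DURATION.match(text)
    if match is None or text == "P":
        raise ValueError(f"unsupported duration: {text!r}")
    return tuple(int(group) if group else 0 for group in match.groups())


def to_year_month(text: str) -> str:
    """Render a year-month duration in the 'Y-MM' form shown for ODBC."""
    years, months, days = parse_duration(text)
    if days:
        raise ValueError("not a year-month interval")
    return f"{years}-{months:02d}"


for iso in ("P12Y11M", "P1Y", "P9M"):
    print(iso, "->", to_year_month(iso))
```

Note that the ISO form states the units explicitly, which is the point raised in the comment above, while the `Y-MM` form relies on the reader knowing the interval type.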
[jira] [Created] (DRILL-2487) Schema is ignored when using : between schema and zk on sqlline connection string
Krystal created DRILL-2487: -- Summary: Schema is ignored when using : between schema and zk on sqlline connection string Key: DRILL-2487 URL: https://issues.apache.org/jira/browse/DRILL-2487 Project: Apache Drill Issue Type: Bug Components: Client - CLI Affects Versions: 0.8.0 Reporter: Krystal Assignee: Daniel Barclay (Drill) git.commit.id=ae2053d2a078a40033a140f2dfaeef802a5e8254 Invoking sqlline with a : between the schema and zk causes sqlline not to connect to the specified schema. For example: root@qa-node113:~# /opt/drill/bin/sqlline -u 'jdbc:drill:schema=hive:zk=10.10.100.113:5181' touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory Drill log directory /var/log/drill does not exist or is not writable, defaulting to /opt/drill/log sqlline version 1.1.6 0: jdbc:drill:schema=hive:zk=10.10.100.113:51 show tables; Query failed: RelConversionException: No schema selected. Select a schema using 'USE schema' command If I put a ; between schema and zk, then sqlline connects to the specified schema: root@qa-node113:~# /opt/drill/bin/sqlline -u 'jdbc:drill:schema=hive;zk=10.10.100.113:5181' touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory Drill log directory /var/log/drill does not exist or is not writable, defaulting to /opt/drill/log sqlline version 1.1.6 0: jdbc:drill:schema=hive show tables; +--++ | TABLE_SCHEMA | TABLE_NAME | +--++ | hive.default | t2 | | hive.default | episodes_partitioned | | hive.default | store | | hive.default | store_sales | -- This message was sent by Atlassian JIRA (v6.3.4#6332)
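The difference between the two separators in DRILL-2487 can be illustrated with a small sketch. This is not Drill's actual URL parser; it is a hypothetical one (`DrillUrlSketch.parse`) that assumes properties after the `jdbc:drill:` prefix are separated by `;` and written as `key=value`. Under that assumption, a `:` between the two settings does not start a new property, so no separate `zk` entry is ever seen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DrillUrlSketch {

    static final String PREFIX = "jdbc:drill:";

    /** Hypothetical parser: splits the part after the prefix on ';' into key=value pairs. */
    public static Map<String, String> parse(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a jdbc:drill: URL: " + url);
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : url.substring(PREFIX.length()).split(";")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                props.put(pair, "");
            } else {
                // Only the first '=' separates key from value; the rest is the value.
                props.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("jdbc:drill:schema=hive;zk=10.10.100.113:5181"));
        System.out.println(parse("jdbc:drill:schema=hive:zk=10.10.100.113:5181"));
    }
}
```

With `;` the sketch yields two properties, `schema` and `zk`. With `:` it yields a single `schema` property whose value is the whole remainder of the string, which would not name a valid schema. Whether Drill 0.8.0 fails in exactly this way is not confirmed by the report.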
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Westin updated DRILL-2488: Fix Version/s: 0.9.0 Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Fix For: 0.9.0 {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . 
; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result:
[jira] [Commented] (DRILL-2416) Zookeeper in sqlline connection string does not override the entry from drill-override.conf
[ https://issues.apache.org/jira/browse/DRILL-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366283#comment-14366283 ] Krystal commented on DRILL-2416: The zk= scenario does default to the connection from drill-override.conf Zookeeper in sqlline connection string does not override the entry from drill-override.conf Key: DRILL-2416 URL: https://issues.apache.org/jira/browse/DRILL-2416 Project: Apache Drill Issue Type: Bug Components: Client - CLI Affects Versions: 0.8.0 Reporter: Krystal Assignee: Daniel Barclay (Drill) git.commit.id=f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe On the sqlline jdbc connection string, I changed the zookeeper ip to point to another cluster; however, sqlline kept connecting to the drillbits specified in drill-override.conf. After I updated drill-override.conf with the other zookeeper information, I was able to connect successfully to the drillbits on the remote cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2489) Accessing Connection, Statement, PreparedStatement after they are closed should throw a SQLException
Rahul Challapalli created DRILL-2489: Summary: Accessing Connection, Statement, PreparedStatement after they are closed should throw a SQLException Key: DRILL-2489 URL: https://issues.apache.org/jira/browse/DRILL-2489 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Rahul Challapalli Assignee: Daniel Barclay (Drill) git.commit.id.abbrev=7b4c887 According to the JDBC spec, we should throw a SQLException when methods are called on a closed Connection, Statement, or PreparedStatement. Drill currently does not do this. I can raise multiple JIRAs if the developer wishes to work on them independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
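The behavior requested in DRILL-2489 is commonly implemented with a guard that every public method calls first. The sketch below shows that pattern on a stand-in class; `ClosedGuardSketch`, `checkOpen`, and `throwsAfterClose` are hypothetical names, and this is not Drill's or Avatica's code. Only `java.sql.SQLException` is a real JDBC type here.

```java
import java.sql.SQLException;

public class ClosedGuardSketch implements AutoCloseable {

    private boolean closed;

    /** Guard called at the top of every method that needs an open object. */
    private void checkOpen() throws SQLException {
        if (closed) {
            throw new SQLException("Statement is already closed");
        }
    }

    /** Stand-in for a JDBC method such as Statement.getMaxRows(). */
    public int getMaxRows() throws SQLException {
        checkOpen();
        return 0;
    }

    /** isClosed() and close() stay callable after close. */
    public boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }

    /** Returns true if calling a guarded method after close() raises SQLException. */
    public static boolean throwsAfterClose() {
        ClosedGuardSketch statement = new ClosedGuardSketch();
        statement.close();
        try {
            statement.getMaxRows();
            return false;
        } catch (SQLException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("throws after close: " + throwsAfterClose());
    }
}
```

Keeping the check in one private method makes it easy to apply the same rule to Connection, Statement, and PreparedStatement wrappers independently, which fits the offer to split the work across several issues.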
[jira] [Created] (DRILL-2490) convert confluence sql commands pages
Kristine Hahn created DRILL-2490: Summary: convert confluence sql commands pages Key: DRILL-2490 URL: https://issues.apache.org/jira/browse/DRILL-2490 Project: Apache Drill Issue Type: Task Reporter: Kristine Hahn -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha reassigned DRILL-2488: - Assignee: Aman Sinha (was: Chris Westin) Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Fix For: 0.9.0 Attachments: t1.parquet {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . 
; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If
[jira] [Resolved] (DRILL-2406) Fix expression interpreter to allow executing expressions at planning time
[ https://issues.apache.org/jira/browse/DRILL-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse resolved DRILL-2406. Resolution: Fixed Resolved in 0aa8b19d624d173da51de36aa164f3435d3366a4 and 3f93454f014196a4da198ce012b605b70081fde0 Fix expression interpreter to allow executing expressions at planning time -- Key: DRILL-2406 URL: https://issues.apache.org/jira/browse/DRILL-2406 Project: Apache Drill Issue Type: Improvement Reporter: Jason Altekruse Assignee: Jason Altekruse Priority: Critical Fix For: 0.8.0 Attachments: DRILL-2406-part1-15-mar-15.patch, DRILL-2406-part1-planning-time-expression-evaulutation.patch, DRILL-2406-part2-15-mar-15.patch, DRILL-2406-part2-planning-time-expression-evaulutation.diff, DRILL-2406-part2-v2-planning-time-expression-evaulutation.patch, DRILL-2406-part2-v3-planning-time-expression-evaulutation.diff The expression interpreter currently available in Drill cannot be used at planning time, as it does not have a means to connect to the direct memory allocator stored at the DrillbitContext level. To implement new rules based on evaluating expressions on constants or on small datasets, such as partition information, this limitation must be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-2463) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods
[ https://issues.apache.org/jira/browse/DRILL-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365904#comment-14365904 ] Daniel Barclay (Drill) edited comment on DRILL-2463 at 3/17/15 10:33 PM: - Retracted, since byte code error symptom is not from this change. (It exists on the master branch.) [Was: Current attempts to fix the lower layer now seem to result in a problem with scalar replacement manipulation of byte code. To avoid mixing in below-JDBC cleanup/soft-bug changes with JDBC hard-bug* changes and delaying the JDBC bug-fix changes until lower-level problems and indirect requirements are understood and solved (to avoid the fate of DRILL-1735--having otherwise-independent changes delayed by other things), I think we should implement all the checks in AvaticaDrillSqlAccessor now, filing a Jira report and putting TODO notes in AvaticaDrillSqlAccessor to later implement the non-primitive-type checks in the lower layer (and then remove the then-redundant non-primitive-type checks from AvaticaDrillSqlAccessor). (*Calling ResultSet.getBoolean(...) when the value is SQL NULL throws an exception (and ResultSet.isNull() can't be used without first calling a getXxx(...) method for the column).) ] was (Author: dsbos): Current attempts to fix the lower layer now seem to result in a problem with scalar replacement manipulation of byte code. 
To avoid mixing in below-JDBC cleanup/soft-bug changes with JDBC hard-bug* changes and delaying the JDBC bug-fix changes until lower-level problems and indirect requirements are understood and solved (to avoid the fate of DRILL-1735--having otherwise-independent changes delayed by other things), I think we should implement all the checks in AvaticaDrillSqlAccessor now, filing a Jira report and putting TODO notes in AvaticaDrillSqlAccessor to later implement the non-primitive-type checks in the lower layer (and then remove the then-redundant non-primitive-type checks from AvaticaDrillSqlAccessor). (*Calling ResultSet.getBoolean(...) when the value is SQL NULL throws an exception (and ResultSet.isNull() can't be used without first calling a getXxx(...) method for the column).) Implement JDBC mapping of SQL NULL for ResultSet.getXxx() methods - Key: DRILL-2463 URL: https://issues.apache.org/jira/browse/DRILL-2463 Project: Apache Drill Issue Type: Bug Reporter: Daniel Barclay (Drill) Assignee: Daniel Barclay (Drill) Fix AvaticaDrillSqlAccessor to implement mapping of SQL NULL to dummy primitive values (e.g., returning 0 for ResultSet.getInt(...)). Fix SqlAccessors template to implement mapping of SQL NULL to null pointers (e.g., returning null from ResultSet.getString(...)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
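The mapping described in DRILL-2463 follows the standard JDBC convention: primitive getters return a dummy value for SQL NULL, object getters return null, and the caller checks afterwards with `ResultSet.wasNull()` (the standard JDBC name for the check the comment calls `isNull()`). The sketch below shows that convention on a stand-in single-value accessor. `NullMappingSketch` is a hypothetical class and is not the AvaticaDrillSqlAccessor implementation.

```java
public class NullMappingSketch {

    // A Java null stands in for SQL NULL in this sketch.
    private final Object value;
    private boolean wasNull;

    public NullMappingSketch(Object value) {
        this.value = value;
    }

    /** Primitive getter: SQL NULL maps to 0 instead of throwing. */
    public int getInt() {
        wasNull = (value == null);
        return wasNull ? 0 : ((Number) value).intValue();
    }

    /** Primitive getter: SQL NULL maps to false instead of throwing. */
    public boolean getBoolean() {
        wasNull = (value == null);
        return !wasNull && (Boolean) value;
    }

    /** Object getter: SQL NULL maps to a null reference. */
    public String getString() {
        wasNull = (value == null);
        return wasNull ? null : value.toString();
    }

    /** Reports whether the last value read was SQL NULL, as ResultSet.wasNull() does. */
    public boolean wasNull() {
        return wasNull;
    }

    public static void main(String[] args) {
        NullMappingSketch nullColumn = new NullMappingSketch(null);
        System.out.println(nullColumn.getInt() + " wasNull=" + nullColumn.wasNull());
        NullMappingSketch intColumn = new NullMappingSketch(42);
        System.out.println(intColumn.getInt() + " wasNull=" + intColumn.wasNull());
    }
}
```

Because 0 and false are also legitimate column values, the flag is the only way for a caller to tell a real value from SQL NULL, which is why the getter must not throw before the flag can be read.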
[jira] [Created] (DRILL-2488) Wrong result on join between two subqueries with aggregation
Victoria Markman created DRILL-2488: --- Summary: Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . 
; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result: {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . .
[jira] [Commented] (DRILL-2416) Zookeeper in sqlline connection string does not override the entry from drill-override.conf
[ https://issues.apache.org/jira/browse/DRILL-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366279#comment-14366279 ] Krystal commented on DRILL-2416: I opened drill-2487 for the : vs ; issue. For this issue, here is my drill-override.conf content: drill.exec: { cluster-id: krystal-drillbits, zk.connect: 10.10.100.113:5181,10.10.100.114:5181,10.10.100.115:5181 } From sqlline, connecting to the hive schema using the same zk info: root@qa-node113:~# /opt/drill/bin/sqlline -u 'jdbc:drill:schema=hive;zk=10.10.100.113:5181' touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory Drill log directory /var/log/drill does not exist or is not writable, defaulting to /opt/drill/log sqlline version 1.1.6 0: jdbc:drill:schema=hive show tables; +--++ | TABLE_SCHEMA | TABLE_NAME | +--++ | hive.default | t2 | | hive.default | episodes_partitioned | | hive.default | store | | hive.default | store_sales | | hive.default | promotion | | hive.default | voter | | hive.default | orc_create_people_staging | | hive.default | m7_students | Leaving the drill-override content the same, I updated the zookeeper connection to point to a remote drillbit: root@qa-node113:~# /opt/drill/bin/sqlline -u 'jdbc:drill:schema=hive;zk=10.10.100.56:5181' touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory Drill log directory /var/log/drill does not exist or is not writable, defaulting to /opt/drill/log No DrillbitEndpoint can be found sqlline version 1.1.6 If I then update the drill-override.conf on the client node to contain the info of the remote drillbit, then I was able to successfully connect: drill.exec: { cluster-id: qa-node56-drillbits, zk.connect: 10.10.100.56:5181 } root@qa-node113:~# /opt/drill/bin/sqlline -u 'jdbc:drill:schema=hive;zk=10.10.100.56:5181' touch: cannot touch `/var/log/drill/sqlline.log': No such file or directory Drill log directory /var/log/drill does not exist or is not writable, 
defaulting to /opt/drill/log sqlline version 1.1.6 0: jdbc:drill:schema=hive show tables; +--++ | TABLE_SCHEMA | TABLE_NAME | +--++ | hive.default | bit_table | | hive.default | stinyint_table | | hive.default | string_table | | hive.default | real_table | | hive.default | interval_table | | hive.default | binary_table | | hive.default | emp| | hive.default | bigint_table | Zookeeper in sqlline connection string does not override the entry from drill-override.conf Key: DRILL-2416 URL: https://issues.apache.org/jira/browse/DRILL-2416 Project: Apache Drill Issue Type: Bug Components: Client - CLI Affects Versions: 0.8.0 Reporter: Krystal Assignee: Daniel Barclay (Drill) git.commit.id=f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe On the sqlline jdbc connection string, I changed the zookeeper ip to point to another cluster; however, sqlline kept connecting to the drillbits specified in drill-override.conf. After I updated drill-override.conf with the other zookeeper information, I was able to connect successfully to the drillbits on the remote cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-2488: Description: {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . 
; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06 SelectionVectorRemover 00-08Sort(sort0=[$0], dir0=[ASC]) 00-10 Filter(condition=[=($1, $1)]) 00-12Limit(offset=[1], fetch=[5]) 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-16Sort(sort0=[$0], dir0=[ASC]) 00-17 StreamAgg(group=[{0, 1}]) 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) {code} If you turn off merge join, query returns correct result: {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . ( . . . . . . . . . . . .
[jira] [Commented] (DRILL-2438) Query on views with Avg on integer column returns wrong result
[ https://issues.apache.org/jira/browse/DRILL-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366315#comment-14366315 ] Venki Korukanti commented on DRILL-2438: This doesn't look related to views not storing nullability. For some reason an extra cast is inserted to cast the result to integer. {code} 00-01 Project(i_item_id=[$0], agg1=[$1]) 00-02SelectionVectorRemover 00-03 Limit(fetch=[5]) 00-04SelectionVectorRemover 00-05 TopN(limit=[5]) 00-06Project(i_item_id=[$0], agg1=[CAST(/(CastHigh(CASE(=($2, 0), null, $1)), $2)):INTEGER]) 00-07 HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)]) 00-08Project(i_item_id=[CASE(=(ITEM($0, 1), ''), null, CAST(ITEM($0, 1)):VARCHAR(200) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], i_manufact_id=[CASE(=(ITEM($0, 13), ''), null, CAST(ITEM($0, 13)):INTEGER)]) 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hadoop/data/scale1/item.dat, numFiles=1, columns=[`columns`[1], `columns`[13]], files=[file:/Users/hadoop/data/scale1/item.dat]]]) {code} Query on views with Avg on integer column returns wrong result -- Key: DRILL-2438 URL: https://issues.apache.org/jira/browse/DRILL-2438 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Mehant Baid Priority: Critical Fix For: 0.9.0 Git.Commit.ID: b3bdc27 (Mar 10) Average on an integer column returns an (inaccurate) integer value, instead of an (accurate) decimal value. *The following query returns wrong results:* {code:sql} SELECT i_item_id, avg(i_manufact_id) agg1 . . . . . . . . . . . . . . . . . FROM item . . . . . . . . . . . . . . . . . GROUP BY i_item_id . . . . . . . . . . . . . . . . . ORDER BY i_item_id . . . . . . . . . . . . . . . . . 
LIMIT 5; +++ | i_item_id |agg1| +++ | AAAB | 152| | AAAC | 187| | AAAE | 251| | AABA | 199| | AABB | 636| +++ 5 rows selected (0.324 seconds) {code} *Postgres results:* {code:sql} # SELECT i_item_id, avg(i_manufact_id) agg1 tpcds1_new-# FROM item tpcds1_new-# GROUP BY i_item_id tpcds1_new-# ORDER BY i_item_id tpcds1_new-# LIMIT 5; i_item_id | agg1 --+-- AAAB | 152. AAAC | 373. AAAE | 251. AABA | 198.6667 AABB | 636. (5 rows) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
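The plan quoted in the DRILL-2438 comment divides the sum by the count and then applies `CAST(...):INTEGER`, which discards the fractional part of the average. The sketch below reproduces that effect with made-up inputs (198, 199, 199) chosen only because their mean is about 198.6667; the real `i_manufact_id` values are not given in the report. The use of rounding rather than truncation is an assumption, made because the reported 199 against Postgres's 198.6667 is consistent with rounding. `AvgCastSketch` is a hypothetical class, not Drill code.

```java
public class AvgCastSketch {

    /** Average computed in floating point, as a user would expect from AVG on integers. */
    public static double average(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    /** The same average after an extra cast to an integer type (rounding assumed). */
    public static long averageCastToInteger(int[] values) {
        return Math.round(average(values));
    }

    public static void main(String[] args) {
        int[] manufactIds = {198, 199, 199}; // hypothetical inputs
        System.out.println("avg            = " + average(manufactIds));
        System.out.println("avg as INTEGER = " + averageCastToInteger(manufactIds));
    }
}
```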
[jira] [Commented] (DRILL-2488) Wrong result on join between two subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366370#comment-14366370 ] Aman Sinha commented on DRILL-2488: --- Adding a simplified query that does not do COUNT(distinct) and no GROUP-BY and still manifests the problem: {code} select * from ( select b1 from dfs.`/Users/asinha/data/t1.parquet` order by b1 limit 5 offset 3) as sq1(x1) inner join ( select b1 from dfs.`/Users/asinha/data/t1.parquet` order by b1 limit 5 offset 3) as sq2(x1) on sq1.x1 = sq2.x1 ; {code} With HashJoin plan, this produces 5 rows (correct). With MergeJoin plan, this produces 2 rows (wrong). However, I don't think this is an issue with MergeJoin; it seems to be related to OFFSET. Removing the offset produces correct results and changing it produces different wrong results. I will investigate some more. Wrong result on join between two subqueries with aggregation Key: DRILL-2488 URL: https://issues.apache.org/jira/browse/DRILL-2488 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Chris Westin Priority: Critical Fix For: 0.9.0 Attachments: t1.parquet {code} 0: jdbc:drill:schema=dfs select * from t1; ++++ | a1 | b1 | c1 | ++++ | 1 | a | 2015-01-01 | | 2 | b | 2015-01-02 | | 3 | c | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | e | 2015-01-05 | | 6 | f | 2015-01-06 | | 7 | g | 2015-01-07 | | null | h | 2015-01-08 | | 9 | i | null | | 10 | j | 2015-01-10 | ++++ 10 rows selected (0.15 seconds) {code} This result is incorrect, one row is missing {code} 0: jdbc:drill:schema=dfs select * from . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . 
. . . . . . . ) as sq1(x1, y1) . . . . . . . . . . . . . . . . . . . . . . . . inner join . . . . . . . . . . . . . . . . . . . . . . . . ( . . . . . . . . . . . . select . . . . . . . . . . . . b1, . . . . . . . . . . . . count(distinct a1) . . . . . . . . . . . . from . . . . . . . . . . . . t1 . . . . . . . . . . . . group by . . . . . . . . . . . . b1 . . . . . . . . . . . . order by . . . . . . . . . . . . b1 limit 5 offset 1 . . . . . . . . . . . . ) as sq2(x1, y1) . . . . . . . . . . . . on . . . . . . . . . . . . sq1.x1 = sq2.x1 and . . . . . . . . . . . . sq2.y1 = sq2.y1 . . . . . . . . . . . . ; +++++ | x1 | y1 |x10 |y10 | +++++ | b | 1 | b | 1 | | c | 1 | c | 1 | | e | 1 | e | 1 | | f | 1 | f | 1 | +++++ 4 rows selected (0.28 seconds) {code} Explain plan for the wrong result: {code} 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Limit(offset=[1], fetch=[5]) 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) 00-09Sort(sort0=[$0], dir0=[ASC]) 00-11 StreamAgg(group=[{0, 1}]) 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) 00-04Project(b10=[$0], EXPR$10=[$1]) 00-06
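As a cross-check on which rows DRILL-2488 should return, the sketch below recomputes the join in plain Java over the `b1` column of the `t1` table shown above. It assumes that `ORDER BY b1` places NULL last, which is consistent with the reported output starting at `b` for `offset 1`, and that NULL never matches in an equality join. `ExpectedJoinSketch` and its methods are hypothetical helpers, not Drill code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ExpectedJoinSketch {

    // The b1 column of t1, in table order; null stands for SQL NULL.
    static final List<String> B1 =
            Arrays.asList("a", "b", "c", null, "e", "f", "g", "h", "i", "j");

    /** Equivalent of: select b1 from t1 order by b1 limit :limit offset :offset */
    public static List<String> page(int offset, int limit) {
        return B1.stream()
                .sorted(Comparator.nullsLast(Comparator.<String>naturalOrder()))
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Inner join on equality; a null key matches nothing. */
    public static List<String> innerJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                if (l != null && l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Original report: limit 5 offset 1 on both sides.
        System.out.println(innerJoin(page(1, 5), page(1, 5)));
        // Simplified query from the comment: limit 5 offset 3 on both sides.
        System.out.println(innerJoin(page(3, 5), page(3, 5)));
    }
}
```

Under these assumptions both variants produce five rows. For `offset 1` the expected keys are b, c, e, f, g, so the row missing from the reported MergeJoin output is `g`.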
[jira] [Comment Edited] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365644#comment-14365644 ] Venki Korukanti edited comment on DRILL-1833 at 3/17/15 5:50 PM: - We currently store MapViewName, ViewLocation in ZK. When listing views (as part of SHOW TABLES), we take view list from ZK store and for each entry we check if the view definition exists in given location. As the view list is empty in ZK, we don't list any views in SHOW TABLES. When creating view, we create it by default in workspace schema location. Also when querying we refer directly to FileSystem for view definition. Info in ZK is redundant as we are always trusting the information in FileSystem. We can remove store view persistent info in ZK. One thing I must point out is: In future if we support create view with custom view location, then it won't be visible in SHOW TABLES as SHOW TABLES only searches for .view.drill files in workspace root directory. This should be ok as the view created with custom location is considered external to schema. was (Author: vkorukanti): We currently store MapViewName, ViewLocation in ZK. When listing views (as part of SHOW TABLES), we take view list from ZK store and for each entry we check if the view definition exists in given location. As the view list is empty in ZK, we don't list any views in SHOW TABLES. When creating view, we create it by default in workspace schema location. Also when querying we refer directly to FileSystem for view definition. Info in ZK is redundant as we are always trusting the information in FileSystem. We can remove store view persistent info in ZK. One thing I must point is: In future if we support create view with custom view location, then it won't be visible in SHOW TABLES as SHOW TABLES only searches for .view.drill files in workspace root directory. This should be ok as the view created with custom location is considered external to schema. 
Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-1833-1.patch After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
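The comment on DRILL-1833 describes SHOW TABLES finding views by searching the workspace root directory for `.view.drill` files rather than trusting ZooKeeper. A minimal sketch of that idea follows, assuming a local directory stands in for the workspace root. The class `ViewFileLister` and its method are hypothetical names for illustration and are not Drill's implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Drill: derives view names from the
// *.view.drill files found directly in a workspace root directory.
final class ViewFileLister {
    static final String VIEW_SUFFIX = ".view.drill";

    static List<String> listViews(Path workspaceRoot) throws IOException {
        List<String> views = new ArrayList<>();
        // Only the root directory is scanned (no recursion), so a view file
        // stored in a custom location elsewhere would not be listed.
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(workspaceRoot, "*" + VIEW_SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Strip the suffix to recover the view name.
                views.add(name.substring(0, name.length() - VIEW_SUFFIX.length()));
            }
        }
        Collections.sort(views);
        return views;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a workspace directory such as '/tmp'.
        Path root = Files.createTempDirectory("workspace");
        Files.createFile(root.resolve("varchar_view.view.drill"));
        Files.createFile(root.resolve("data.parquet"));
        System.out.println(listViews(root));
    }
}
```

Because the listing is derived from the files themselves, wiping any external registry would not hide a view whose definition file still exists, which is the behaviour the reporter expected.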
[jira] [Commented] (DRILL-1833) Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data
[ https://issues.apache.org/jira/browse/DRILL-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365665#comment-14365665 ] Venki Korukanti commented on DRILL-1833: RB Link: https://reviews.apache.org/r/32165/ Views cannot be registered into the INFROMATION_SCHEMA.`TABLES` after wiping out ZooKeeper data --- Key: DRILL-1833 URL: https://issues.apache.org/jira/browse/DRILL-1833 Project: Apache Drill Issue Type: Bug Components: Storage - Information Schema Environment: git.commit.id.abbrev=2396670 Reporter: Xiao Meng Assignee: Venki Korukanti Fix For: 0.9.0 Attachments: DRILL-1833-1.patch After wiping out the ZooKeeper data, the drillbit cannot automatically register the view into INFORMATION_SCHEMA.`TABLES` even after we query the view. For example, for a workspace dfs.tmp, there is a view file `varchar_view.view.drill` under the corresponding directory '/tmp'. We can query: {code} select * from dfs.test.`varchar_view` {code} But this view still won't show up in INFORMATION_SCHEMA.`TABLES`. After I recreate the view based on the contents of `varchar_view.view.drill`, the view shows in the INFORMATION_SCHEMA.`TABLES`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2380) TPC-DS Query 33 and simplified variants return wrong results
[ https://issues.apache.org/jira/browse/DRILL-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-2380. -- Resolution: Fixed Fix Version/s: (was: 0.9.0) 0.8.0 TPC-DS Query 33 and simplified variants return wrong results Key: DRILL-2380 URL: https://issues.apache.org/jira/browse/DRILL-2380 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu Priority: Critical Fix For: 0.8.0 TPC-DS query 33 returns wrong results. {code:sql} WITH ss AS (SELECT i_manufact_id, Sum(ss_ext_sales_price) total_sales FROM store_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ss_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), cs AS (SELECT i_manufact_id, Sum(cs_ext_sales_price) total_sales FROM catalog_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND cs_item_sk = i_item_sk AND cs_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND cs_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id), ws AS (SELECT i_manufact_id, Sum(ws_ext_sales_price) total_sales FROM web_sales, date_dim, customer_address, item WHERE i_manufact_id IN (SELECT i_manufact_id FROM item WHERE i_category IN ( 'Books' )) AND ws_item_sk = i_item_sk AND ws_sold_date_sk = d_date_sk AND d_year = 1999 AND d_moy = 3 AND ws_bill_addr_sk = ca_address_sk AND ca_gmt_offset = -5 GROUP BY i_manufact_id) SELECT i_manufact_id, Sum(total_sales) total_sales FROM (SELECT i_manufact_id, total_sales FROM ss UNION ALL SELECT i_manufact_id, total_sales FROM cs UNION ALL SELECT i_manufact_id, total_sales FROM ws) tmp1 GROUP BY i_manufact_id ORDER BY total_sales LIMIT 10; Drill Results: 
+---+-+ | i_manufact_id | total_sales | +---+-+ | 440 | 0.12| | 434 | 13.16 | | 415 | 14.04 | | 449 | 15.63 | | 563 | 31.46 | | 357 | 49.50 | | 624 | 67.94 | | 192 | 74.40 | | 137 | 83.42 | | 240 | 85.26 | +---+-+ 10 rows selected (7.57 seconds) Postgres Results: i_manufact_id | total_sales ---+- 930 |1.18 818 | 41.86 913 | 141.90 784 | 184.90 488 | 275.08 993 | 301.60 700 | 340.52 895 | 802.30 766 | 839.76 858 | 859.18 (10 rows) {code} The following simplified variants also return wrong results: {code:sql} SELECT sum(x) FROM (SELECT ss_ext_sales_price x, ss_item_sk FROM store_sales GROUP BY ss_item_sk, ss_ext_sales_price UNION ALL SELECT cs_ext_sales_price x, cs_item_sk FROM catalog_sales GROUP BY cs_item_sk, cs_ext_sales_price) tmp GROUP BY x LIMIT 10; Drill Results: ++ | EXPR$0 | ++ | 14141.40 | | 28060.00 | | 30912.70 | | 43706.88 | | 38267.64 | | 10173.00 | | 37829.25 | | 5349.50| | 107515.80 | | 4440.84| ++ 10 rows selected (14.435 seconds) Postgres Results: sum -- 45234.00 5735.31 2275.60 6921.32 2590.46 6615.09 14080.77 24819.76 25127.20 (10 rows) SELECT sum(x) FROM (SELECT
[jira] [Created] (DRILL-2483) Make buffer that rows are read into during execution configurable for testing purposes
Victoria Markman created DRILL-2483: --- Summary: Make buffer that rows are read into during execution configurable for testing purposes Key: DRILL-2483 URL: https://issues.apache.org/jira/browse/DRILL-2483 Project: Apache Drill Issue Type: Wish Reporter: Victoria Markman We recently found a bug where merge join returned wrong results if a table had multiple duplicate rows and the duplicates spanned multiple buffers. The test case had a table with 10,000 rows. The same problem could be reproduced on a much smaller data set if the buffer size were configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
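To illustrate why a configurable buffer size would shrink the test case in DRILL-2483, here is a small sketch under stated assumptions: rows are reduced to integer join keys, and a "buffer" is simply a fixed-size slice of the sorted input. `BatchBoundaryDemo`, `toBatches`, and `duplicatesSpanBatches` are hypothetical names and do not correspond to Drill's record batch classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Drill code: shows how a small batch size
// forces a run of duplicate keys to straddle a batch boundary.
final class BatchBoundaryDemo {
    // Splits the sorted keys into consecutive batches of at most batchSize rows.
    static List<List<Integer>> toBatches(List<Integer> sortedKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < sortedKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, sortedKeys.size());
            batches.add(new ArrayList<>(sortedKeys.subList(start, end)));
        }
        return batches;
    }

    // True if the last key of some batch equals the first key of the next one,
    // i.e. a run of duplicates spans more than one batch.
    static boolean duplicatesSpanBatches(List<List<Integer>> batches) {
        for (int i = 1; i < batches.size(); i++) {
            List<Integer> previous = batches.get(i - 1);
            List<Integer> next = batches.get(i);
            if (previous.get(previous.size() - 1).equals(next.get(0))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(1, 2, 2, 2, 3);
        // With a batch size of 2 the run of 2s crosses a boundary;
        // with a batch size of 5 everything fits in one batch.
        System.out.println(duplicatesSpanBatches(toBatches(keys, 2)));
        System.out.println(duplicatesSpanBatches(toBatches(keys, 5)));
    }
}
```

With five rows and a batch size of two, the interesting boundary condition is already reachable, whereas a fixed large buffer needs thousands of rows to hit it.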
[jira] [Comment Edited] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError
[ https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365965#comment-14365965 ] Rahul Challapalli edited comment on DRILL-2482 at 3/17/15 8:14 PM: --- It is of type NVARCHAR. I was just being casual in the description when I used varchar. I did not find the org.apache.hadoop.io.Text class file in the jar. Also, there is no 'hadoop' folder under 'org/apache' itself. BTW, I am working with the jar file that you provided and not off master. was (Author: rkins): It is of type NVARCHAR. I did not find the org.apache.hadoop.io.Text class file in the jar. Also there is no 'hadoop' folder under 'org/apache' itself. BTW, I am working with the jar file that you provided and not off master JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError -- Key: DRILL-2482 URL: https://issues.apache.org/jira/browse/DRILL-2482 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Rahul Challapalli Assignee: Daniel Barclay (Drill) git.commit.id.abbrev=7b4c887 I tried to call getObject(i) on a column which is of type varchar, drill failed with the below error : {code} Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text at org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407) at org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386) at org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98) at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137) at org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136) at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) at Dummy.testComplexQuery(Dummy.java:94) at Dummy.main(Dummy.java:30) Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.io.Text at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 8 more {code} When the underlying type is a primitive, the getObject call succeeds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2484) Document CASE expression
Victoria Markman created DRILL-2484: --- Summary: Document CASE expression Key: DRILL-2484 URL: https://issues.apache.org/jira/browse/DRILL-2484 Project: Apache Drill Issue Type: Bug Components: Documentation Reporter: Victoria Markman Assignee: Bridget Bevens The CASE expression is not documented. We support the searched CASE form: CASE WHEN boolean-expression THEN statements [ WHEN boolean-expression THEN statements ... ] [ ELSE statements ] END CASE; See the Postgres page for an example: http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2180) Star is not expanded when being used with flatten
[ https://issues.apache.org/jira/browse/DRILL-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365974#comment-14365974 ] Sean Hsuan-Yi Chu commented on DRILL-2180: -- [~mehant], can you review it? Star is not expanded when being used with flatten - Key: DRILL-2180 URL: https://issues.apache.org/jira/browse/DRILL-2180 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Mehant Baid Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2180.1.patch For example, select *, flatten(j.topping) tt + from dfs_test.`%s` j (using the same data set as in DRILL-2012) * tt null {id:5001,type:None} null {id:5002,type:Glazed} null {id:5005,type:Sugar} null {id:5007,type:Powdered Sugar} null {id:5006,type:Chocolate with Sprinkles} null {id:5003,type:Chocolate} null {id:5004,type:Maple} Note that the first column is messed up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2485) Configuration parameters need to be named consistently
Khurram Faraaz created DRILL-2485: - Summary: Configuration parameters need to be named consistently Key: DRILL-2485 URL: https://issues.apache.org/jira/browse/DRILL-2485 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Khurram Faraaz Assignee: Jinfeng Ni Priority: Minor All existing configuration parameters need to be named consistently. The two configuration parameters below are not named using the same format as the other config options: drill.exec.functions.cast_empty_string_to_null - accepts a string input drill.exec.storage.file.partition.column.label - accepts either true/false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-1943) Handle aliases and column names that differ in case only
[ https://issues.apache.org/jira/browse/DRILL-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-1943. -- Resolution: Fixed Resolved in Commit#: ae2053d2a078a40033a140f2dfaeef802a5e8254 Handle aliases and column names that differ in case only Key: DRILL-1943 URL: https://issues.apache.org/jira/browse/DRILL-1943 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Parth Chandra Assignee: Sean Hsuan-Yi Chu Fix For: 0.9.0 1) Consider the query select a, a from foo. For this query we return the columns a and a0. For the query select a, A from foo we return only one column and also leak memory (see DRILL-1911). The same behaviour exists if the query uses aliases. This is not correct. Aliases are explicitly specified names to remove ambiguity in column names and should be unique (ignoring case). A query like: select A as a1, B as A1 from foo should give a syntax error. This should be the behaviour in subqueries, view creation and CTAS queries as well. 2) If a subquery (or view) has column names that are different only in case, the use of the subquery or view should result in an error if the top level query references the ambiguous column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
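The rule stated in DRILL-1943, that aliases should be unique when case is ignored, can be sketched as a simple check over the list of output names. This is only an illustration: `AliasChecker` and `findCaseInsensitiveDuplicates` are hypothetical names, and the actual fix in the planner is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Drill code: flags aliases that collide with an
// earlier alias once case is ignored.
final class AliasChecker {
    static List<String> findCaseInsensitiveDuplicates(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String alias : aliases) {
            // Normalise with a fixed locale so the comparison does not depend
            // on the JVM's default locale.
            if (!seen.add(alias.toLowerCase(Locale.ROOT))) {
                duplicates.add(alias);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Mirrors "select A as a1, B as A1 from foo" from the issue text.
        System.out.println(findCaseInsensitiveDuplicates(Arrays.asList("a1", "A1")));
        // Distinct names produce no duplicates.
        System.out.println(findCaseInsensitiveDuplicates(Arrays.asList("a", "b")));
    }
}
```

A validator could reject the query when the returned list is non-empty, which matches the issue's request for an error instead of a silently dropped column.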
[jira] [Commented] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError
[ https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365965#comment-14365965 ] Rahul Challapalli commented on DRILL-2482: -- It is of type NVARCHAR. I did not find the org.apache.hadoop.io.Text class file in the jar. Also there is no 'hadoop' folder under 'org/apache' itself. BTW, I am working with the jar file that you provided and not off master JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError -- Key: DRILL-2482 URL: https://issues.apache.org/jira/browse/DRILL-2482 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Rahul Challapalli Assignee: Daniel Barclay (Drill) git.commit.id.abbrev=7b4c887 I tried to call getObject(i) on a column which is of type varchar, drill failed with the below error : {code} Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text at org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407) at org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386) at org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98) at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137) at org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136) at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) at Dummy.testComplexQuery(Dummy.java:94) at Dummy.main(Dummy.java:30) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 8 more {code} When the underlying type is a primitive, the getObject call succeeds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2482) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError
[ https://issues.apache.org/jira/browse/DRILL-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365919#comment-14365919 ] Daniel Barclay (Drill) commented on DRILL-2482: --- That's weird--I've seen org.apache.hadoop.io.Text objects in tracing proxy output (so the class was found and loaded in that case). Can you check whether that class--or one with a name ending like that but starting with a different, probably added, package--exists in the Drill Jar file you're using? Also, is the type NVARCHAR (per your title) or VARCHAR (per your description)? (Or: Where does it seem to be NVARCHAR() and where does it seem to be VARCHAR? (I've got a pending patch for an NVARCHAR that should be VARCHAR.)) JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError -- Key: DRILL-2482 URL: https://issues.apache.org/jira/browse/DRILL-2482 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Rahul Challapalli Assignee: Daniel Barclay (Drill) git.commit.id.abbrev=7b4c887 I tried to call getObject(i) on a column which is of type varchar, drill failed with the below error : {code} Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text at org.apache.drill.exec.vector.VarCharVector$Accessor.getObject(VarCharVector.java:407) at org.apache.drill.exec.vector.NullableVarCharVector$Accessor.getObject(NullableVarCharVector.java:386) at org.apache.drill.exec.vector.accessor.NullableVarCharAccessor.getObject(NullableVarCharAccessor.java:98) at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:137) at org.apache.drill.jdbc.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:136) at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) at Dummy.testComplexQuery(Dummy.java:94) at Dummy.main(Dummy.java:30) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text at 
java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 8 more {code} When the underlying type is a primitive, the getObject call succeeds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
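The question in this thread is whether `org.apache.hadoop.io.Text` is available to the JDBC client at all. One quick way to answer it from the client side is to ask the class loader directly. The sketch below is a generic JVM check, not something taken from the Drill code base; `ClassPresenceCheck` and `isLoadable` are hypothetical names.

```java
// Hypothetical diagnostic, not Drill code: reports whether a class can be
// located by the class loader that loaded this program.
final class ClassPresenceCheck {
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class,
            // without running its static initializers.
            Class.forName(className, false, ClassPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Run with the same classpath as the failing JDBC program.
        String name = "org.apache.hadoop.io.Text";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running this with the same classpath as the failing program shows whether the class is missing from the jar, as opposed to being present under a relocated package name.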
[jira] [Resolved] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.
[ https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-1911. -- Resolution: Fixed Querying same field multiple times with different case would hit memory leak and return incorrect result. -- Key: DRILL-1911 URL: https://issues.apache.org/jira/browse/DRILL-1911 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Jinfeng Ni Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 git.commit.id.abbrev=309e1be If the same field is queried twice with different case, Drill throws a memory assertion error. select employee_id, Employee_id from cp.`employee.json` limit 2; +-+ | employee_id | +-+ | 1 | | 2 | Query failed: Query failed: Failure while running fragment., Attempted to close accountor with 2 buffer(s) still allocatedfor QueryId: 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0. Total 1 allocation(s) of byte size(s): 4096, at stack location: org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212) org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137) org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167) org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93) org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97) org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114) org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) Also, notice that the query result only contains one field; the second field is missing. The plan looks fine. Drill Physical : 00-00Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 103 00-01 Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 102 00-02SelectionVectorRemover: rowcount = 463.0, cumulative cost = {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101 00-03 Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100 00-04Project(employee_id=[$0], Employee_id=[$0]): rowcount = 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 99 00-05 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1943) Handle aliases and column names that differ in case only
[ https://issues.apache.org/jira/browse/DRILL-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-1943: - Fix Version/s: (was: 0.9.0) 0.8.0 Handle aliases and column names that differ in case only Key: DRILL-1943 URL: https://issues.apache.org/jira/browse/DRILL-1943 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Parth Chandra Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 1) Consider the query select a, a from foo. For this query we return the columns a and a0. For the query select a, A from foo we return only one column and also leak memory (see DRILL-1911). The same behaviour exists if the query uses aliases. This is not correct. Aliases are explicitly specified names to remove ambiguity in column names and should be unique (ignoring case). A query like: select A as a1, B as A1 from foo should give a syntax error. This should be the behaviour in subqueries, view creation and CTAS queries as well. 2) If a subquery (or view) has column names that are different only in case, the use of the subquery or view should result in an error if the top level query references the ambiguous column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2473) Set query timezone at session level
[ https://issues.apache.org/jira/browse/DRILL-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365261#comment-14365261 ] Andries Engelbrecht commented on DRILL-2473: Need to think about whether we want connection level or session level, as an application may establish a single connection but serve multiple users from different timezones. Set query timezone at session level --- Key: DRILL-2473 URL: https://issues.apache.org/jira/browse/DRILL-2473 Project: Apache Drill Issue Type: Improvement Components: Query Planning Optimization Affects Versions: Future Reporter: Andries Engelbrecht Assignee: Jinfeng Ni Ability to set the user timezone for queries at session level to allow different users querying the same data from different timezones to localize the results to the desired timezone. Allowance for DST where applicable should be incorporated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
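As a rough illustration of what DRILL-2473 asks for, the sketch below renders one stored UTC instant in two different user timezones, with DST handled by the zone rules. It uses only the standard `java.time` API; the class `SessionTimeZoneDemo`, the method `localize`, and the idea of passing a zone id per session are assumptions for illustration, not a description of how Drill implements or plans to implement the option.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration, not Drill code: the same stored instant is
// localized differently depending on a per-session zone id.
final class SessionTimeZoneDemo {
    static ZonedDateTime localize(Instant utcInstant, String sessionZoneId) {
        // ZoneId carries the DST rules, so the offset applied depends on the date.
        return utcInstant.atZone(ZoneId.of(sessionZoneId));
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2015-03-17T17:50:00Z");
        // Two users looking at the same row from different timezones.
        System.out.println(localize(stored, "America/Los_Angeles"));
        System.out.println(localize(stored, "Europe/London"));
    }
}
```

Keeping the stored value as an instant and applying the zone only at presentation time is what lets a single connection serve users in several timezones, which is the concern raised in the comment.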
[jira] [Resolved] (DRILL-2429) Update Supported Date/Time Data Type Formats doc
[ https://issues.apache.org/jira/browse/DRILL-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kristine Hahn resolved DRILL-2429. -- Resolution: Fixed committed https://reviews.apache.org/r/32138/ Update Supported Date/Time Data Type Formats doc Key: DRILL-2429 URL: https://issues.apache.org/jira/browse/DRILL-2429 Project: Apache Drill Issue Type: Task Components: Documentation Affects Versions: 0.7.0 Reporter: Kristine Hahn Assignee: Kristine Hahn Fix For: 0.8.0 Test/revise/update Supported Date/Time Data Type Formats. Fold in review comments of other sections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2275) need implementations of sys tables for drill memory and threads profiles
[ https://issues.apache.org/jira/browse/DRILL-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudheesh Katkam updated DRILL-2275: --- Attachment: DRILL-2275.4.patch.txt need implementations of sys tables for drill memory and threads profiles Key: DRILL-2275 URL: https://issues.apache.org/jira/browse/DRILL-2275 Project: Apache Drill Issue Type: Task Components: Metadata Reporter: Zhiyong Liu Assignee: Sudheesh Katkam Priority: Critical Fix For: 0.9.0 Attachments: DRILL-2275.1.patch.txt, DRILL-2275.2.patch.txt, DRILL-2275.3.patch.txt, DRILL-2275.4.patch.txt In order to check drill state information, the following tables are to be implemented: 1. Memory: a query such as select * from sys.drillmemory; should return a result set like the following: +++--+++ |drillbit| total_sys_memory |heap_size | direct_alloc_memory | +++--+++ | node1:port1 | 24596676k | 15200420k | 1012372k | +++--+++ | node2:port2 | 24596676k | 15200420k | 2012372k | +++--+++ 2. Threads: For each node in a cluster, we need counts of threads of the drillbits. A query like this: select * from sys.drillbitthreads; should return a result set like the following: +++--+++ |drillbit| pool_name | total_threads | busy_threads | +++--+++ | node1:port1 | pool1 | 8 | 2 | +++--+++ | node2:port2 | pool2 | 10 | 5 | +++--+++ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
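For a sense of where per-node values like those in the proposed sys tables could come from, the sketch below reads a few figures for the local JVM through the standard management beans. It is a single-process illustration under stated assumptions: `LocalJvmStats` is a hypothetical class, the "direct" buffer pool covers only NIO direct buffers allocated through the JDK (an allocator that manages memory by other means may not be reflected), and nothing here is taken from the attached patches.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration, not Drill code: collects local JVM figures
// similar in spirit to heap_size, direct_alloc_memory and total_threads.
final class LocalJvmStats {
    // Heap currently reserved by this JVM, in bytes.
    static long heapSizeBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Bytes used by JDK-managed direct buffers, or -1 if the pool is not reported.
    static long directMemoryUsedBytes() {
        for (BufferPoolMXBean pool :
                 ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    // Number of live threads in this JVM, daemon and non-daemon.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap_size=" + heapSizeBytes());
        System.out.println("direct_alloc_memory=" + directMemoryUsedBytes());
        System.out.println("total_threads=" + liveThreadCount());
    }
}
```

A cluster-wide table would additionally need each node to report its own values, since these calls only see the JVM they run in.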
[jira] [Resolved] (DRILL-2397) Enhance SQL Ref Data Types docs
[ https://issues.apache.org/jira/browse/DRILL-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kristine Hahn resolved DRILL-2397. -- Resolution: Fixed committed: df1b7e5a9397b02a880230e6dc51a07f2b1ff997 Enhance SQL Ref Data Types docs --- Key: DRILL-2397 URL: https://issues.apache.org/jira/browse/DRILL-2397 Project: Apache Drill Issue Type: Task Components: Documentation Affects Versions: 0.7.0 Reporter: Kristine Hahn Assignee: Kristine Hahn Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2481) Querying individual column from view results in AssertionError
Khurram Faraaz created DRILL-2481: - Summary: Querying individual column from view results in AssertionError Key: DRILL-2481 URL: https://issues.apache.org/jira/browse/DRILL-2481 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Khurram Faraaz Assignee: Jinfeng Ni Querying an individual column from a view results in an AssertionError Data used was from a csv file, its content was a single row (pls see below) 1,John Doe,HR,5000,Software Engineer {code } 0: jdbc:drill: use dfs.tmp; +++ | ok | summary | +++ | true | Default schema changed to 'dfs.tmp' | +++ 1 row selected (0.188 seconds) 0: jdbc:drill: create view v1 as select * from `employee.csv` union all select * from `employee.csv`; +++ | ok | summary | +++ | true | View 'v1' created successfully in 'dfs.tmp' schema | +++ 1 row selected (0.073 seconds) 0: jdbc:drill: create view v2 as select * from `employee.csv` union all select * from `employee.csv`; +++ | ok | summary | +++ | true | View 'v2' created successfully in 'dfs.tmp' schema | +++ 1 row selected (0.046 seconds) 0: jdbc:drill: select * from v1; ++ | columns | ++ | [1,John Doe,HR,5000,Software Engineer] | | [1,John Doe,HR,5000,Software Engineer] | ++ 2 rows selected (0.087 seconds) 0: jdbc:drill: select * from v2; ++ | columns | ++ | [1,John Doe,HR,5000,Software Engineer] | | [1,John Doe,HR,5000,Software Engineer] | ++ 2 rows selected (0.075 seconds) 0: jdbc:drill: describe v1; +-++-+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +-++-+ | * | ANY| NO | +-++-+ 1 row selected (0.084 seconds) 0: jdbc:drill: describe v2; +-++-+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +-++-+ | * | ANY| NO | +-++-+ 1 row selected (0.083 seconds) 0: jdbc:drill: select columns[0] from v1; Query failed: AssertionError: ANY Error: exception while executing query: Failure while executing query. 
(state=,code=0) {code} {code} Stack trace from drillbit.log 2015-03-17 16:40:43,176 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.060851ms avg, 1ms max. 2015-03-17 16:40:43,178 [2af7a6f3-bbcf-3a34-dfae-5b5bb11ff4a9:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. PENDING --> FAILED org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: ANY at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:195) [drill-java-exec-0.8.0-SNAPSHOT-rebuffer.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffer.jar:0.8.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75] Caused by: java.lang.AssertionError: ANY at org.eigenbase.reltype.RelDataTypeImpl.getFieldCount(RelDataTypeImpl.java:114) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.relopt.RelOptUtil$2.size(RelOptUtil.java:143) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:111) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitInputRef(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexInputRef.accept(RexInputRef.java:103) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexCall.accept(RexCall.java:106) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:136) ~[optiq-core-0.9-drill-r20.jar:na] at 
org.eigenbase.rex.RexChecker.visitCall(RexChecker.java:55) ~[optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.rex.RexCall.accept(RexCall.java:106) ~[optiq-core-0.9-drill-r20.jar:na] at
[jira] [Updated] (DRILL-2358) Ensure DrillScanRel differentiates skip-all, scan-all scan-some in a backward compatible fashion
[ https://issues.apache.org/jira/browse/DRILL-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-2358: Assignee: Aman Sinha (was: Rahul Challapalli) Ensure DrillScanRel differentiates skip-all, scan-all scan-some in a backward compatible fashion -- Key: DRILL-2358 URL: https://issues.apache.org/jira/browse/DRILL-2358 Project: Apache Drill Issue Type: Sub-task Components: Query Planning Optimization Reporter: Hanifi Gunes Assignee: Aman Sinha Fix For: 1.0.0 This subtask proposes to change DrillScanRel so that it will understand and relay the skipped list of columns, if any, to readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)