[jira] [Assigned] (DRILL-1867) Case sensitivity of query identifiers is inconsistent
[ https://issues.apache.org/jira/browse/DRILL-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni reassigned DRILL-1867:
---------------------------------
Assignee: Jinfeng Ni

Case sensitivity of query identifiers is inconsistent
-----------------------------------------------------
Key: DRILL-1867
URL: https://issues.apache.org/jira/browse/DRILL-1867
Project: Apache Drill
Issue Type: Bug
Components: SQL Parser
Affects Versions: 0.7.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni

git.commit.id.abbrev=c65928f

In this query it looks like identifiers are case insensitive:

{code}
0: jdbc:drill:schema=dfs> select Eventdate from `test.json` where eventdate is not null;
+------------+
| Eventdate  |
+------------+
| 2014-01-01 |
| 2014-01-01 |
| 2014-02-01 |
+------------+
3 rows selected (0.107 seconds)
{code}

However, in a subquery they suddenly change behavior and become case sensitive:

{code}
0: jdbc:drill:schema=dfs> select EVENTDATE from ( select Eventdate from `test.json` where eventdate is not null );
Query failed: Query failed: Failure validating SQL. org.eigenbase.util.EigenbaseContextException: From line 1, column 8 to line 1, column 16: Column 'EVENTDATE' not found in any table
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

This happens not only when querying unstructured data, but Drill tables as well:

{code}
0: jdbc:drill:schema=dfs> select Customerid from test where customerid = 100;
+------------+
| Customerid |
+------------+
| 100        |
+------------+
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
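The expected behavior described above can be sketched in a few lines (illustrative Python, not Drill's actual code): column names should be matched case-insensitively at every level of nesting, so an outer block can re-reference an inner block's column in any case.

```python
# Illustrative sketch of case-insensitive identifier resolution.
# `resolve` and the column lists are hypothetical, for demonstration only.

def resolve(requested, available):
    """Return the stored column matching `requested`, ignoring case, or None."""
    by_lower = {name.lower(): name for name in available}
    return by_lower.get(requested.lower())

stored_columns = ["eventdate", "customerid"]

# Inner query projects `Eventdate`; the file stores `eventdate` -- it matches.
inner_output = resolve("Eventdate", stored_columns)
assert inner_output == "eventdate"

# The outer block's `EVENTDATE` should resolve against the inner output the
# same way, case-insensitively, instead of demanding an exact-case match.
assert resolve("EVENTDATE", [inner_output]) == "eventdate"
```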
[jira] [Created] (DRILL-1913) Drill did not correctly preserve output field's case sensitivity
Jinfeng Ni created DRILL-1913:
------------------------------
Summary: Drill did not correctly preserve output field's case sensitivity
Key: DRILL-1913
URL: https://issues.apache.org/jira/browse/DRILL-1913
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Minor

For the following query,

{code}
select EMPLOYEE_ID from cp.`employee.json` limit 2;
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |
+-------------+
{code}

Even though the upper-case EMPLOYEE_ID matches the field 'employee_id' in the input JSON file, Drill did not preserve the upper-case in the output result, which does not seem right. The plan coming out of optiq/calcite seems right, as it has a project which will do the renaming. But Drill's logical planning removes that project, and hence cannot preserve the upper-case in the output.
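The intended behavior, keeping the rename projection so output columns use the query's spelling, can be sketched as follows (illustrative Python; `project_output` is a hypothetical helper, not Drill's planner):

```python
# Sketch: match stored fields case-insensitively, but key the output row by
# the names exactly as written in the SELECT list.

def project_output(requested_names, record):
    by_lower = {k.lower(): v for k, v in record.items()}
    return {name: by_lower[name.lower()] for name in requested_names}

row = {"employee_id": 1}
out = project_output(["EMPLOYEE_ID"], row)
assert out == {"EMPLOYEE_ID": 1}   # output preserves the query's upper-case
```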
[jira] [Commented] (DRILL-1867) Case sensitivity of query identifiers is inconsistent
[ https://issues.apache.org/jira/browse/DRILL-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261787#comment-14261787 ]

Jinfeng Ni commented on DRILL-1867:
-----------------------------------
Added test cases for Parquet/CSV data sources. Opened two new JIRAs to track existing issues related to case-sensitivity handling: DRILL-1911 and DRILL-1913. In particular, DRILL-1911 covers the memory leak and correctness issue that occurs when projecting the same column with different cases.

Attachments: 0001-DRILL-1867-Fix-case-insensitive-matching-for-subquer.patch
[jira] [Commented] (DRILL-1905) UNION ALL of two identical json files fails with exception
[ https://issues.apache.org/jira/browse/DRILL-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262460#comment-14262460 ]

Jinfeng Ni commented on DRILL-1905:
-----------------------------------
+1. Looks good to me. For the change in Calcite, do we need to make sure 'ANY' is nullable? There seems to be a recent issue where the 'ANY' type is not treated as nullable.

UNION ALL of two identical json files fails with exception
----------------------------------------------------------
Key: DRILL-1905
URL: https://issues.apache.org/jira/browse/DRILL-1905
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.7.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Fix For: 0.8.0
Attachments: 0001-Create-result-type-for-ANY-in-leastRestrictiveSqlTyp.patch, 0001-DRILL-1905-There-are-3-parts-of-the-fix-1-Check-comp.patch

{code}
git.commit.id.abbrev=e3ab2c1
{code}

{code}
0: jdbc:drill:schema=dfs> select * from `logs/2014/02/t.json`;
+-----+-----+--------+
| a1  | b1  | c1     |
+-----+-----+--------+
| 0   | 0   | true   |
| 0   | 0   | false  |
| 0   | 0   | false  |
| 1   | 1   | true   |
| 1   | 1   | true   |
+-----+-----+--------+
5 rows selected (0.099 seconds)

0: jdbc:drill:schema=dfs> select * from `logs/2014/03/t.json`;
+-----+-----+--------+
| a1  | b1  | c1     |
+-----+-----+--------+
| 0   | 0   | true   |
| 0   | 0   | false  |
| 0   | 0   | false  |
| 1   | 1   | true   |
| 1   | 1   | true   |
+-----+-----+--------+
5 rows selected (0.065 seconds)

0: jdbc:drill:schema=dfs> select * from `logs/2014/02/t.json` union all select * from `logs/2014/03/t.json`;
Query failed: Query failed: Unexpected exception during fragment initialization: DrillUnionRel#1680
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

{code}
2014-12-20 01:27:01,939 [2b6b2f9a-7e4e-1670-5210-c2a8dabea33b:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 4505ba6b-a2b6-4355-aac1-1edd74bc347d: Query failed: Unexpected exception during fragment initialization: DrillUnionRel#1680
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: DrillUnionRel#1680
        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:194) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: DrillUnionRel#1680
        at org.eigenbase.rel.AbstractRelNode.getRowType(AbstractRelNode.java:211) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.apache.drill.exec.planner.common.DrillUnionRelBase.isCompatible(DrillUnionRelBase.java:44) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.common.DrillUnionRelBase.init(DrillUnionRelBase.java:38) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.init(DrillUnionRel.java:42) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.copy(DrillUnionRel.java:49) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.copy(DrillUnionRel.java:38) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.eigenbase.rel.SetOpRel.copy(SetOpRel.java:73) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.rel.SetOpRel.copy(SetOpRel.java:33) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:472) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:677) ~[optiq-core-0.9-drill-r12.jar:na]
{code}
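The compatibility check the review comment is about can be sketched as follows (illustrative Python, not Drill's or Calcite's validator): a UNION ALL type check in which the pseudo-type ANY is treated as a nullable wildcard that is compatible with any column type.

```python
# Sketch: UNION-compatibility of two row types, treating ANY as a wildcard.
# `compatible` and the type-name strings are hypothetical, for illustration.

def compatible(left_types, right_types):
    """Same arity, and each column pair equal or either side ANY."""
    if len(left_types) != len(right_types):
        return False
    return all(l == r or "ANY" in (l, r) for l, r in zip(left_types, right_types))

assert compatible(["BIGINT", "BIGINT", "BOOLEAN"], ["BIGINT", "BIGINT", "BOOLEAN"])
assert compatible(["ANY", "BIGINT"], ["VARCHAR", "BIGINT"])   # ANY matches anything
assert not compatible(["BIGINT"], ["BIGINT", "BOOLEAN"])      # arity mismatch
```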
[jira] [Commented] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292184#comment-14292184 ]

Jinfeng Ni commented on DRILL-2069:
-----------------------------------
Upgraded to Critical, since it produces an incorrect result and represents a quite typical query pattern.

Star is not expanded correctly in the query with IN clause containing subquery
------------------------------------------------------------------------------
Key: DRILL-2069
URL: https://issues.apache.org/jira/browse/DRILL-2069
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical

t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

Star is expanded incorrectly; we should get only columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+-----+-----+-----+------+
| a2  | a1  | b1  | a10  |
+-----+-----+-----+------+
| aa  | aa  | 1   | aa   |
| bb  | bb  | 2   | bb   |
+-----+-----+-----+------+
2 rows selected (0.172 seconds)
{code}

explain plan:
{code}
00-01      Project(*=[$0])
00-02        Project(*=[$0])
00-03          HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05            Project(*=[$0], a1=[$1])
00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04            HashAgg(group=[{0}])
00-06              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}

Workaround - specify columns explicitly:
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+-----+------+
| a1  | a10  |
+-----+------+
| aa  | aa   |
| bb  | bb   |
+-----+------+
2 rows selected (0.24 seconds)
{code}

Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+-----+-----+-----+-----+------+------+
| a2  | b2  | a1  | b1  | a10  | b10  |
+-----+-----+-----+-----+------+------+
| aa  | 1   | aa  | 1   | aa   | 1    |
| bb  | 2   | bb  | 2   | bb   | 2    |
+-----+-----+-----+-----+------+------+
2 rows selected (0.323 seconds)

0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
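The fix the bug calls for can be sketched as follows (illustrative Python, not Drill's planner rule): when `*` comes from one specific table of the join that the IN-subquery rewriting produced, expansion must keep only the columns that originate from that table, excluding the subquery's correlation columns.

```python
# Sketch: prefix-aware star expansion over a join's output columns.
# `expand_star` and the (table, column) pairs are hypothetical, for illustration.

def expand_star(star_source, join_columns):
    """join_columns: list of (source_table, column_name) pairs in join output."""
    return [col for src, col in join_columns if src == star_source]

# Join output after rewriting `WHERE a1 IN (SELECT a2 FROM t2)`:
join_out = [("t2", "a2"), ("t1", "a1"), ("t1", "b1")]
assert expand_star("t1", join_out) == ["a1", "b1"]   # only t1's columns survive
```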
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Priority: Critical (was: Major)
[jira] [Assigned] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.
[ https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni reassigned DRILL-1911:
---------------------------------
Assignee: Jinfeng Ni (was: Deneche A. Hakim)

Querying same field multiple times with different case would hit memory leak and return incorrect result.
---------------------------------------------------------------------------------------------------------
Key: DRILL-1911
URL: https://issues.apache.org/jira/browse/DRILL-1911
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
Fix For: 0.8.0

git.commit.id.abbrev=309e1be

If you query the same field twice with different case, Drill throws a memory assertion error:

{code}
select employee_id, Employee_id from cp.`employee.json` limit 2;
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |

Query failed: Query failed: Failure while running fragment., Attempted to close accountor with 2 buffer(s) still allocated for QueryId: 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0.
Total 1 allocation(s) of byte size(s): 4096, at stack location:
        org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212)
        org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137)
        org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167)
        org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
        org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
        org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
        org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67)
        org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97)
        org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57)
        org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114)
        org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
{code}

Also, notice that the query result contains only one field; the second field is missing.

The plan looks fine:

{code}
Drill Physical :
00-00    Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 103
00-01      Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 102
00-02        SelectionVectorRemover: rowcount = 463.0, cumulative cost = {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101
00-03          Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100
00-04            Project(employee_id=[$0], Employee_id=[$0]): rowcount = 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 99
00-05              Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98
{code}
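The missing second column can be explained by a de-duplication sketch (illustrative Python, not Drill's operator code): if projected field names are de-duplicated case-insensitively, `Employee_id` is silently dropped, which mirrors the single-column result above; de-duplicating by exact name keeps both output columns.

```python
# Sketch: de-duplicating a projection list with and without case sensitivity.
# `dedup` is a hypothetical helper, for illustration only.

def dedup(names, case_sensitive):
    seen, out = set(), []
    for n in names:
        key = n if case_sensitive else n.lower()
        if key not in seen:
            seen.add(key)
            out.append(n)
    return out

requested = ["employee_id", "Employee_id"]
assert dedup(requested, case_sensitive=False) == ["employee_id"]  # column lost
assert dedup(requested, case_sensitive=True) == requested         # both kept
```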
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306347#comment-14306347 ]

Jinfeng Ni commented on DRILL-2168:
-----------------------------------
The plan looks invalid. The $1 in the following Project operator is referring to the sum() aggregation result, which is not right.

{code}
00-02    Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1))
00-03      HashAgg(group=[{0}], EXPR$1=[SUM($1)])
{code}

Wrong result on grouping by expression involving CONCAT function
----------------------------------------------------------------
Key: DRILL-2168
URL: https://issues.apache.org/jira/browse/DRILL-2168
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical

{code}
0: jdbc:drill:schema=dfs> select * from test;
+--------------+------------------------+
| a1           | b1                     |
+--------------+------------------------+
| 51237400     | 2014-03-16 03:55:21.0  |
| -691523338   | 2014-02-13 15:47:22.0  |
| -1843395360  | 2014-01-14 03:31:27.0  |
| 1095015454   | 2014-09-27 17:35:21.0  |
| -48817354    | 2014-01-17 16:14:25.0  |
| 59892266     | 2014-04-23 05:08:34.0  |
| -827965492   | 2014-06-21 07:29:12.0  |
| -1281245980  | 2014-10-25 15:49:46.0  |
| -1778510302  | 2014-07-31 04:11:53.0  |
| 1346460386   | 2014-07-31 04:11:53.0  |
+--------------+------------------------+
10 rows selected (0.072 seconds)

0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . .> concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)), '-', cast(extract(year from b1) as varchar(10))),
. . . . . . . . . . . .> sum(a1)
. . . . . . . . . . . .> from
. . . . . . . . . . . .> test
. . . . . . . . . . . .> group by
. . . . . . . . . . . .> concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)), '-', cast(extract(year from b1) as varchar(10)));
+-------------+--------------+
| EXPR$0      | EXPR$1       |
+-------------+--------------+
| 1-1-1970    | 51237400     |
| 23-12-1969  | -691523338   |
| 10-12-1969  | -1843395360  |
| 13-1-1970   | 1095015454   |
| 31-12-1969  | -48817354    |
| 1-1-1970    | 59892266     |
| 22-12-1969  | -827965492   |
| 17-12-1969  | -1281245980  |
| 26-12-1969  | -432049916   |
+-------------+--------------+
9 rows selected (0.103 seconds)
{code}

Query plan:
{code}
00-01    Project(EXPR$0=[$0], EXPR$1=[$1])
00-02      Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1])
00-03        HashAgg(group=[{0}], EXPR$1=[SUM($1)])
00-04          Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1])
00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]])
{code}
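The intended semantics can be shown in plain Python (an illustrative sketch, not Drill's execution): the group key is the date-part concat computed from b1, and the final projection must re-emit that key rather than, as in the invalid plan, re-derive it from the SUM column ($1).

```python
# Sketch: GROUP BY concat(day, '-', month, '-', year) with SUM(a1),
# using hypothetical in-memory rows rather than the Parquet table.

from collections import defaultdict
from datetime import datetime

rows = [
    (51237400, datetime(2014, 3, 16)),
    (59892266, datetime(2014, 3, 16)),    # same day -> same group
    (-691523338, datetime(2014, 2, 13)),
]

groups = defaultdict(int)
for a1, b1 in rows:
    key = f"{b1.day}-{b1.month}-{b1.year}"  # group-by expression on b1
    groups[key] += a1                        # SUM(a1) per group

assert groups["16-3-2014"] == 51237400 + 59892266
assert groups["13-2-2014"] == -691523338
```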
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2107:
------------------------------
Priority: Blocker (was: Critical)

Hash Join throws IOBE for a query with exists subquery.
-------------------------------------------------------
Key: DRILL-2107
URL: https://issues.apache.org/jira/browse/DRILL-2107
Project: Apache Drill
Issue Type: New Feature
Components: Execution - Operators
Reporter: Jinfeng Ni
Assignee: Mehant Baid
Priority: Blocker
Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt

I hit an IOBE for TestTpchDistributed Q4 when I tried to enable an optimizer rule. I then simplified Q4 to the following, which still reproduces the same IOBE:

{code}
select o.o_orderpriority
from cp.`tpch/orders.parquet` o
where exists (
  select *
  from cp.`tpch/lineitem.parquet` l
  where l.l_orderkey = o.o_orderkey
);
{code}

Stack trace of the exception:

{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45]
        at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45]
        at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:232) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:132) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.test.generated.HashTableGen307.doSetup(HashTableTemplate.java:71) ~[na:na]
        at org.apache.drill.exec.test.generated.HashTableGen307.updateBatches(HashTableTemplate.java:473) ~[na:na]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen313.executeProbePhase(HashJoinProbeTemplate.java:139) ~[na:na]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen313.probeAndProject(HashJoinProbeTemplate.java:223) ~[na:na]
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:227) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}

The physical plan seems to be correct after enabling the new rule. In fact, if I disable HashJoin and use merge join for the query, it works fine. So it seems the IOBE exposes a bug in HashJoin.

To reproduce this issue, there are two options:
1) Modify DrillRuleSets.java to remove the comment before SwapJoinRule, run `alter session set `planner.slice_target` = 10;`, then run the query.
2) Use the attached physical plan json file and submit it with submitplan.

For comparison, I also attached the physical plan when disabling hashjoin (using merge join), and the explain plan at the physical operator level.
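The EXISTS query above behaves like a hash semi-join, which can be sketched as follows (illustrative Python with hypothetical in-memory data, not Drill's generated code): build a hash table on the subquery side, then emit each orders row whose key finds at least one match.

```python
# Sketch: EXISTS (correlated on orderkey) as a build/probe hash semi-join.

orders = [(1, "HIGH"), (2, "LOW"), (3, "MEDIUM")]   # (o_orderkey, o_orderpriority)
lineitem_keys = [1, 3, 3]                            # l_orderkey values

hash_table = set(lineitem_keys)                      # build side (dedup is fine for EXISTS)
result = [prio for key, prio in orders if key in hash_table]   # probe side
assert result == ["HIGH", "MEDIUM"]                  # order 2 has no lineitem
```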
[jira] [Commented] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301871#comment-14301871 ]

Jinfeng Ni commented on DRILL-2069:
-----------------------------------
Revised the patch based on review comments, and added two new unit test cases.

Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Attachment: (was: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch)
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Attachment: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
[jira] [Updated] (DRILL-2095) Order by on a repeated index inside a sub query results in an NPE
[ https://issues.apache.org/jira/browse/DRILL-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2095:
------------------------------
Fix Version/s: 1.0.0

Order by on a repeated index inside a sub query results in an NPE
-----------------------------------------------------------------
Key: DRILL-2095
URL: https://issues.apache.org/jira/browse/DRILL-2095
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Jinfeng Ni
Priority: Critical
Fix For: 1.0.0

git.commit.id.abbrev=3e33880

Data Set:
{code}
{ id : 1, list : [1,2] }
{code}

The below query succeeds:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select d.id id from `data.json` d order by d.list[0];
+------------+
| id         |
+------------+
| 1          |
+------------+
1 row selected (0.12 seconds)
{code}

However, when we use the same query inside a sub-query, we get an NPE:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select s.id from (select d.id id from `data.json` d order by d.list[0]) s;
Query failed: NullPointerException:
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

Explain plan also results in the same NPE for the above query.
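The semantics the failing query should have can be sketched as (illustrative Python with hypothetical rows, not Drill's planner): sort the inner rows by the first element of the repeated field, then project id through the subquery.

```python
# Sketch: ORDER BY list[0] inside a subquery, with the outer block projecting id.

data = [{"id": 2, "list": [5, 6]}, {"id": 1, "list": [1, 2]}]

inner = sorted(data, key=lambda d: d["list"][0])  # ORDER BY d.list[0]
outer = [row["id"] for row in inner]              # SELECT s.id FROM (...) s
assert outer == [1, 2]
```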
[jira] [Updated] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2087:
    Fix Version/s: 0.9.0

Some planner option setting might produce a wrong plan and wrong query result for join query.
---------------------------------------------------------------------------------------------
    Key: DRILL-2087
    URL: https://issues.apache.org/jira/browse/DRILL-2087
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Jinfeng Ni
    Assignee: Jacques Nadeau
    Fix For: 0.9.0
    Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch

I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet

ls -l region
total 24
-rw-r--r--@ 1 jni staff  564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff  491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff  506 Jan 27 16:54 03.parquet
{code}

With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+--------------+
| n_nationkey  |
+--------------+
| 18           |
| 12           |
| 9            |
| 8            |
| 21           |
| 19           |
| 7            |
| 6            |
| 23           |
| 22           |
| 20           |
| 13           |
| 11           |
| 10           |
| 4            |
| 0            |
| 16           |
| 15           |
| 14           |
| 5            |
| 24           |
| 17           |
| 3            |
| 2            |
| 1            |
+--------------+
25 rows selected (2.057 seconds)
{code}

However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.093 seconds)

0: jdbc:drill:zk=local> alter session set `planner.join.row_count_estimate_factor` = 0.1;
+-------+--------------------------------------------------+
|  ok   |                     summary                      |
+-------+--------------------------------------------------+
| true  | planner.join.row_count_estimate_factor updated.  |
+-------+--------------------------------------------------+
1 row selected (0.041 seconds)

0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+--------------+
| n_nationkey  |
+--------------+
+--------------+
No rows selected (0.71 seconds)
{code}

If we look at the explain plan after changing the planner options, the left side of the JOIN operator does not have any EXCHANGE operator, while the right side has a HashToRandomExchange. That does not seem to be a valid plan.
{code}
00-00    Screen
00-01      UnionExchange
01-01        Project(n_nationkey=[$2])
01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
02-02                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}

The cause of this problem seems to be that Drill will remove an EXCHANGE operator under certain conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE on both sides, or on neither.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2081) Filter not being pushed down into the sub query when we use flatten
[ https://issues.apache.org/jira/browse/DRILL-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2081:
    Fix Version/s: 1.0.0

Filter not being pushed down into the sub query when we use flatten
-------------------------------------------------------------------
    Key: DRILL-2081
    URL: https://issues.apache.org/jira/browse/DRILL-2081
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Priority: Minor
    Fix For: 1.0.0

git.commit.id.abbrev=3c6d0ef

The plan indicates that the filter is not being pushed down into the subquery:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select s.evnts.evnt_id from (select d.type type, flatten(d.events) evnts from `data.json` d) s where s.evnts.type = 'cmpgn4' and s.type='web';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[ITEM($1, 'evnt_id')])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=(ITEM($1, 'type'), 'cmpgn4'), =($0, 'web'))])
00-04            Project(type=[$1], evnts=[$2])
00-05              Flatten(flattenField=[$2])
00-06                Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/flatten_operators/data.json, numFiles=1, columns=[`type`, `events`], files=[maprfs:/drill/testdata/flatten_operators/data.json]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2155) Subquery in projection list that returns scalar result throws an exception
[ https://issues.apache.org/jira/browse/DRILL-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2155:
    Fix Version/s: 0.9.0
    Assignee: Sean Hsuan-Yi Chu  (was: Jinfeng Ni)

Subquery in projection list that returns scalar result throws an exception
--------------------------------------------------------------------------
    Key: DRILL-2155
    URL: https://issues.apache.org/jira/browse/DRILL-2155
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Sean Hsuan-Yi Chu
    Fix For: 0.9.0

Since there is no correlation, it shouldn't even plan ... I think we should throw the same error as in DRILL-1921 (cross join).
{code}
0: jdbc:drill:schema=dfs> select c_integer, (select current_timestamp from t1 limit 1) from t1;
Query failed: AssertionError: must call validate first
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select c_integer, (select current_timestamp from t1 limit 1) b from t1;
Query failed: AssertionError: must call validate first
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

drillbit.log
{code}
2015-02-03 22:33:16,008 [2b2eb354-2084-7da7-4e33-0debc7a90921:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested.
PENDING --> FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: must call validate first
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:197) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: must call validate first
    at org.eigenbase.sql.validate.IdentifierNamespace.resolve(IdentifierNamespace.java:139) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1686) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:494) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryOrInList(SqlToRelConverter.java:1267) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertExists(SqlToRelConverter.java:1240) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.substituteSubquery(SqlToRelConverter.java:1003) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.replaceSubqueries(SqlToRelConverter.java:859) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3279) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:519) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:432) ~[optiq-core-0.9-drill-r18.jar:na]
    at net.hydromatic.optiq.prepare.PlannerImpl.convert(PlannerImpl.java:186) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:126) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at
{code}
[jira] [Updated] (DRILL-2094) Drill has problems with reading json fields when used in a subquery
[ https://issues.apache.org/jira/browse/DRILL-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2094:
    Fix Version/s: 1.0.0

Drill has problems with reading json fields when used in a subquery
-------------------------------------------------------------------
    Key: DRILL-2094
    URL: https://issues.apache.org/jira/browse/DRILL-2094
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

git.commit.id.abbrev=3e33880

Data Set:
{code}
{ "id": 1, "list": [1,2] }
{code}

The below query works:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select id from `temp1.json` order by list[0];
+------+
|  id  |
+------+
| 1    |
+------+
1 row selected (0.146 seconds)
{code}

However, when I use the exact same query as part of a sub-query, I get an error from Drill:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select s.id from (select id from `temp1.json` order by list[0]) s;
Query failed: SqlValidatorException: Table 'list' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

Explain plan also fails with the same error.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2173) Enable querying partition information without reading all data
[ https://issues.apache.org/jira/browse/DRILL-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2173:
    Fix Version/s: 0.8.0

Enable querying partition information without reading all data
--------------------------------------------------------------
    Key: DRILL-2173
    URL: https://issues.apache.org/jira/browse/DRILL-2173
    Project: Apache Drill
    Issue Type: New Feature
    Components: Query Planning & Optimization
    Affects Versions: 0.7.0
    Reporter: Jason Altekruse
    Assignee: Jason Altekruse
    Fix For: 0.8.0
    Attachments: Drill-2173-partition-queries-with-pruning.patch

When reading a series of files in nested directories, Drill currently adds columns representing the directory structure that was traversed to reach the file currently being read. These columns are stored as varchar under the names dir0, dir1, ... As these are just regular columns, Drill allows arbitrary queries against this data: aggregates, filters, sorts, etc.

To allow optimizing reads, basic partition pruning has already been added to prune in the case of an expression like dir0 = 2015, or a simple IN list, which is converted during planning to a series of ORs of equality expressions. If users want to query the directory information dynamically, without including specific directory names in the query, this will prompt a full table scan and a filter operation on the dir columns.

This enhancement is to allow more complex queries to be run against directory metadata, scanning only the matching directories.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
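[Editorial note] As an illustration of the kind of dynamic directory-metadata query this enhancement targets (hypothetical example — the `/logs` path is an assumption, the dirN columns are from the description above), the following currently forces a full scan of all data files even though it only needs partition values:

{code}
-- Wants only the distinct partition values, but today this reads every file
-- under /logs and filters afterwards.
select distinct dir0, dir1 from dfs.`/logs`;
{code}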
[jira] [Updated] (DRILL-2073) Filter on a field in a nested repeated type throws an exception
[ https://issues.apache.org/jira/browse/DRILL-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2073:
    Component/s: Execution - Data Types  (was: Query Planning & Optimization)

Filter on a field in a nested repeated type throws an exception
---------------------------------------------------------------
    Key: DRILL-2073
    URL: https://issues.apache.org/jira/browse/DRILL-2073
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Reporter: Rahul Challapalli
    Assignee: Mehant Baid
    Attachments: error.log

git.commit.id.abbrev=3c6d0ef

Data Set:
{code}
{ "rm": [ {"rptd": [{"a": "foo"},{"b": "boo"}]} ], "rm1": [{"a": "foo"},{"b": "boo"}] }
{code}

The below query tries to apply a filter on a field which does not exist. However, the field is present in a different element of the same array.
{code}
select rm[0].rptd[0] from `temp.json` where rm[0].rptd[0].b = 'boo';
Query failed: Query failed: Failure while running fragment., index: -4, length: 4 (expected: range(0, 16384)) [ 01887113-c758-41bf-96d1-5eede9b1e411 on qa-node191.qa.lab:31010 ]
[ 01887113-c758-41bf-96d1-5eede9b1e411 on qa-node191.qa.lab:31010 ]
{code}

The above query should return an empty result. The error only happens when we apply a filter on a nested array element. The below query works fine:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select rm1[0].a from `nested.json` where rm1[0].b = 'boo';
+---------+
| EXPR$0  |
+---------+
+---------+
{code}

Attached the log file.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2146) Flatten Function on two layer deep field fails to plan
[ https://issues.apache.org/jira/browse/DRILL-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2146:
    Fix Version/s: 0.9.0

Flatten Function on two layer deep field fails to plan
------------------------------------------------------
    Key: DRILL-2146
    URL: https://issues.apache.org/jira/browse/DRILL-2146
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Sean Hsuan-Yi Chu
    Assignee: Sean Hsuan-Yi Chu
    Fix For: 0.9.0

Using the same data as in DRILL-2012, this query fails to plan:
{code}
select flatten(j.batters.batter) bb from dfs_test.`%s` j where j.type = 'donut'
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
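[Editorial note] One rewrite worth checking against DRILL-2146 — an untested sketch that assumes the planner failure is specific to flattening a two-level field reference, so hoisting the inner reference into a subquery first might sidestep it:

{code}
-- Hypothetical rewrite (untested): project batters.batter in a subquery,
-- then flatten the now one-level-deep column.
select flatten(t.b) bb
from (select j.batters.batter b, j.type ttype from dfs_test.`%s` j) t
where t.ttype = 'donut';
{code}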
[jira] [Updated] (DRILL-2053) Column names are case sensitive if column is coming from WITH clause
[ https://issues.apache.org/jira/browse/DRILL-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2053:
    Fix Version/s: 1.0.0

Column names are case sensitive if column is coming from WITH clause
--------------------------------------------------------------------
    Key: DRILL-2053
    URL: https://issues.apache.org/jira/browse/DRILL-2053
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

test.json
{code}
{"customerid":100,"customername":"AAA"}
{"customerid":101,"customername":"BBB"}
{"customerid":102,"customername":"CCC"}
{code}

Wrong result:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.CUSTOMERID = b.CUSTOMERID;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
+-------------+---------------+--------------+----------------+
No rows selected (0.202 seconds)
{code}

Correct result, when the column name matches the case of the column name in the json file:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.customerid = b.customerid;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
| 100         | AAA           | 100          | AAA            |
| 101         | BBB           | 101          | BBB            |
| 102         | CCC           | 102          | CCC            |
+-------------+---------------+--------------+----------------+
3 rows selected (0.204 seconds)
{code}

Correct result when the column does not match case, but is coming directly from the table:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.customerid = b.customerID;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
| 100         | AAA           | 100          | AAA            |
| 101         | BBB           | 101          | BBB            |
| 102         | CCC           | 102          | CCC            |
+-------------+---------------+--------------+----------------+
3 rows selected (0.197 seconds)
{code}

If you change the case of a column name that comes from the subquery (WITH clause), this is where it goes all wrong:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.Customerid = b.customerid;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
+-------------+---------------+--------------+----------------+
No rows selected (0.186 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
[ https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2157:
    Fix Version/s: 1.0.0
    Assignee: Aman Sinha  (was: Jinfeng Ni)

Directory pruning on subdirectories only and data type conversions for directory filters
----------------------------------------------------------------------------------------
    Key: DRILL-2157
    URL: https://issues.apache.org/jira/browse/DRILL-2157
    Project: Apache Drill
    Issue Type: Improvement
    Components: Query Planning & Optimization
    Affects Versions: Future
    Reporter: Andries Engelbrecht
    Assignee: Aman Sinha
    Priority: Minor
    Fix For: 1.0.0

Drill will scan all files and directories when using only a subdirectory as a predicate. Additionally, if the data type for the directory filter is not a string and has to be converted, Drill will also first scan all the subdirectories and files before applying the filter.

My current observation is that for a directory structure as listed below, pruning only works if the full tree is provided. If only a lower-level directory is supplied in the filter condition, Drill only uses it as a filter.

With a directory structure as below:
{code}
/2015
   /01
      /10
      /11
      /12
      /13
      /14
{code}

Query:
{code}
select count(id) from `/foo` t where dir0='2015' and dir1='01' and dir2='10'
{code}
produces the correct pruning and query plan:
{code}
01-02    Project(id=[$3]): rowcount = 3670316.0, cumulative cost = {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28434
01-03      Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28433
01-04        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
{code}

However:
{code}
select count(id) from `/foo` t where dir2='10'
{code}
produces a full scan of all subdirectories and only applies a filter condition after the fact. Notice the difference in numFiles between the two plans, even though the dir columns are listed in the base scan:
{code}
01-04    Filter(condition=[=($0, '10')]): rowcount = 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27470
01-05      Project(dir2=[$1], id=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27469
01-06        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
{code}

Also, using the wrong data type for the filter produces a full scan:
{code}
select count(id) from `/foo` where dir_year=2015 and dir_month=01 and dir_day=14
{code}
produces:
{code}
01-04    Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34910
01-05      Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
01-06        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
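[Editorial note] For comparison, a filter written with string literals matching the varchar dirN columns is the form the existing pruning recognizes — an illustrative sketch based on the plans above (the specific values are assumptions):

{code}
-- dir columns are varchar, so quoting the values avoids the CAT/CAST
-- conversion that defeats pruning in the last plan above.
select count(id) from `/foo` where dir0 = '2015' and dir1 = '01' and dir2 = '14';
{code}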
[jira] [Commented] (DRILL-2081) Filter not being pushed down into the sub query when we use flatten
[ https://issues.apache.org/jira/browse/DRILL-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309710#comment-14309710 ] Jinfeng Ni commented on DRILL-2081:

s.evnts.type = 'cmpgn4' cannot be pushed into the subquery, since the column comes from the FLATTEN operator. Are you expecting that the filter s.type = 'web' is pushed down?

Filter not being pushed down into the sub query when we use flatten
-------------------------------------------------------------------
    Key: DRILL-2081
    URL: https://issues.apache.org/jira/browse/DRILL-2081
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Priority: Minor

git.commit.id.abbrev=3c6d0ef

The plan indicates that the filter is not being pushed down into the subquery:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select s.evnts.evnt_id from (select d.type type, flatten(d.events) evnts from `data.json` d) s where s.evnts.type = 'cmpgn4' and s.type='web';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[ITEM($1, 'evnt_id')])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=(ITEM($1, 'type'), 'cmpgn4'), =($0, 'web'))])
00-04            Project(type=[$1], evnts=[$2])
00-05              Flatten(flattenField=[$2])
00-06                Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/flatten_operators/data.json, numFiles=1, columns=[`type`, `events`], files=[maprfs:/drill/testdata/flatten_operators/data.json]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT
[ https://issues.apache.org/jira/browse/DRILL-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2170:
    Fix Version/s: 0.8.0

Wrong result when joining to a subquery that has group-by, order-by and LIMIT
-----------------------------------------------------------------------------
    Key: DRILL-2170
    URL: https://issues.apache.org/jira/browse/DRILL-2170
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.7.0
    Reporter: Aman Sinha
    Assignee: Aman Sinha
    Priority: Critical
    Fix For: 0.8.0

On TPCH SF1: the first query gives the right result, the second produces a wrong result. The main difference is that the order-by in the second query is on the aggregation function, not the grouping key.
{code}
// Correct result
0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 1 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+---------+
| EXPR$0  |
+---------+
| 100     |
+---------+
1 row selected (2.67 seconds)

// Wrong result
0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+---------+
| EXPR$0  |
+---------+
| 400     |
+---------+
1 row selected (3.163 seconds)
{code}

The plan for the second query shows that there's a missing SingleMergeExchange after the TopN operation. This is needed because before the TopN we did a HashToRandomExchange, and since we are doing a LIMIT, we need to merge the output of TopN into a single stream.
{code}
0: jdbc:drill:zk=local> explain plan for select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-02        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
01-05                HashToRandomExchange(dist0=[[$0]])
02-01                  Project(l_orderkey=[$0])
02-02                    SelectionVectorRemover
02-03                      Limit(fetch=[100])
02-04                        SelectionVectorRemover
02-05                          TopN(limit=[100])
02-06                            HashToRandomExchange(dist0=[[$2]])
04-01                              HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
04-02                                HashToRandomExchange(dist0=[[$0]])
05-01                                  HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
05-02                                    Project(l_orderkey=[$1], l_quantity=[$2], l_extendedprice=[$0])
05-03                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/lineitem]], selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, columns=[`l_orderkey`, `l_quantity`, `l_extendedprice`]]])
01-04                HashToRandomExchange(dist0=[[$0]])
03-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/orders]], selectionRoot=/Users/asinha/data/tpch-sf1/orders, numFiles=1, columns=[`o_orderkey`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2069:
    Fix Version/s: 0.8.0

Star is not expanded correctly in the query with IN clause containing subquery
------------------------------------------------------------------------------
    Key: DRILL-2069
    URL: https://issues.apache.org/jira/browse/DRILL-2069
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Priority: Critical
    Fix For: 0.8.0
    Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch

t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

Star is expanded incorrectly; we should get only the columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------+------+------+------+
|  a2  |  a1  |  b1  | a10  |
+------+------+------+------+
| aa   | aa   | 1    | aa   |
| bb   | bb   | 2    | bb   |
+------+------+------+------+
2 rows selected (0.172 seconds)
{code}

explain plan:
{code}
00-01    Project(*=[$0])
00-02      Project(*=[$0])
00-03        HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05          Project(*=[$0], a1=[$1])
00-07            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04          HashAgg(group=[{0}])
00-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}

Workaround: specify columns explicitly.
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------+------+
|  a1  | a10  |
+------+------+
| aa   | aa   |
| bb   | bb   |
+------+------+
2 rows selected (0.24 seconds)
{code}

Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+------+------+------+------+------+------+
|  a2  |  b2  |  a1  |  b1  | a10  | b10  |
+------+------+------+------+------+------+
| aa   | 1    | aa   | 1    | aa   | 1    |
| bb   | 2    | bb   | 2    | bb   | 2    |
+------+------+------+------+------+------+
2 rows selected (0.323 seconds)

0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310094#comment-14310094 ] Jinfeng Ni commented on DRILL-1921:

+1

Throw unsupported error message for some set operators that are not currently supported
---------------------------------------------------------------------------------------
    Key: DRILL-1921
    URL: https://issues.apache.org/jira/browse/DRILL-1921
    Project: Apache Drill
    Issue Type: Bug
    Components: SQL Parser
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 0.8.0
    Attachments: DRILL-1921.1.patch, DRILL-1921.2.patch, DRILL-1921.3.patch, DRILL-1921.4.patch, DRILL-1921.5.patch, DRILL-1921.6.patch, DRILL-1921.7.patch

Throw an unsupported error message for these operators, instead of the "could not be implemented" error below:

INTERSECT
EXCEPT
UNION
CROSS JOIN
FULL OUTER JOIN

Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state:

Enhancement requests in Jira for the above-mentioned operators:
{code}
INTERSECT    DRILL-1308
MINUS        No enhancement request (not recognized in grammar)
EXCEPT       No enhancement request
UNION        DRILL-1169
CROSS JOIN   DRILL-786
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2139) Star is not expanded correctly in select distinct query
[ https://issues.apache.org/jira/browse/DRILL-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309675#comment-14309675 ] Jinfeng Ni commented on DRILL-2139:

We probably want to disable select distinct * for schema-less queries.

Star is not expanded correctly in select distinct query
-------------------------------------------------------
    Key: DRILL-2139
    URL: https://issues.apache.org/jira/browse/DRILL-2139
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

{code}
0: jdbc:drill:schema=dfs> select distinct * from t1;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.14 seconds)

0: jdbc:drill:schema=dfs> select distinct * from `test.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.163 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2139) Star is not expanded correctly in select distinct query
[ https://issues.apache.org/jira/browse/DRILL-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2139:
    Fix Version/s: 1.0.0

Star is not expanded correctly in select distinct query
-------------------------------------------------------
    Key: DRILL-2139
    URL: https://issues.apache.org/jira/browse/DRILL-2139
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

{code}
0: jdbc:drill:schema=dfs> select distinct * from t1;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.14 seconds)

0: jdbc:drill:schema=dfs> select distinct * from `test.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.163 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2115) Disable function CARDINALITY in grammar
[ https://issues.apache.org/jira/browse/DRILL-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2115:
    Fix Version/s: 0.9.0

Disable function CARDINALITY in grammar
---------------------------------------
    Key: DRILL-2115
    URL: https://issues.apache.org/jira/browse/DRILL-2115
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Sean Hsuan-Yi Chu
    Priority: Minor
    Fix For: 0.9.0

Since we don't support the multiset type, we should disable the CARDINALITY function as well:
{code}
0: jdbc:drill:schema=dfs> select cardinality(list) from `test.json`;
Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. [ db86cf79-6083-4ad7-afa1-fac534b942bc on atsqa4-134.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality(id) from `test.json`;
Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. [ 1f209cb2-4d60-4c1d-b1d0-d921f8e8a913 on atsqa4-134.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality() from `test.json`;
Query failed: SqlValidatorException: Invalid number of arguments to function 'CARDINALITY'. Was expecting 1 arguments
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality('aa') from `test.json`;
Query failed: SqlValidatorException: Cannot apply 'CARDINALITY' to arguments of type 'CARDINALITY(CHAR(2))'. Supported form(s): 'CARDINALITY(MULTISET)' 'CARDINALITY(ARRAY)' 'CARDINALITY(MAP)'
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2116) Add non-reserved keywords to non-reserved keyword list in parser
[ https://issues.apache.org/jira/browse/DRILL-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2116: -- Fix Version/s: 0.9.0 Assignee: Sean Hsuan-Yi Chu (was: Aman Sinha) Add non-reserved keywords to non-reserved keyword list in parser Key: DRILL-2116 URL: https://issues.apache.org/jira/browse/DRILL-2116 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Jacques Nadeau Assignee: Sean Hsuan-Yi Chu Fix For: 0.9.0 There are a number of keywords in Drill that shouldn't be considered reserved when parsing. Calcite allows us to customize the list of un-reserved keywords and we should update the list to allow more words. Things that I've run across include value, user, left, etc. This is a very common usability problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2055) Drill should error out for Invalid json file if it has the same map key names.
[ https://issues.apache.org/jira/browse/DRILL-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2055: -- Component/s: (was: Query Planning Optimization) Storage - JSON Assignee: Parth Chandra (was: Jinfeng Ni) Drill should error out for Invalid json file if it has the same map key names. -- Key: DRILL-2055 URL: https://issues.apache.org/jira/browse/DRILL-2055 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 0.7.0 Reporter: Hao Zhu Assignee: Parth Chandra Priority: Minor For a JSON file with duplicate map key names: { a : x, a : y } should we consider it invalid JSON and error out? Ref: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
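[Editor's note] The duplicate-key question above can be made concrete. The following is a minimal, hypothetical sketch (not Drill's actual JSON reader): given the sequence of field names a streaming parser reports for one object, it finds the first duplicate so a reader could raise an error instead of silently keeping only one of the values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hedged sketch (not Drill's implementation): scan the key names reported
// for one JSON object and return the first duplicate, if any.
class DuplicateKeyCheck {
    static Optional<String> firstDuplicate(List<String> keysInOneObject) {
        Set<String> seen = new HashSet<>();
        for (String k : keysInOneObject) {
            if (!seen.add(k)) {   // add() returns false when k was already present
                return Optional.of(k);
            }
        }
        return Optional.empty();
    }
}
```

For what it's worth, Jackson (the parser commonly used for JSON reading in Drill) appears to expose a similar switch, `JsonParser.Feature.STRICT_DUPLICATE_DETECTION`, which makes the parser itself fail on duplicate field names.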
[jira] [Assigned] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-1921: - Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Throw unsupported error message for some set operators that are not currently supported --- Key: DRILL-1921 URL: https://issues.apache.org/jira/browse/DRILL-1921 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 Attachments: DRILL-1921.8.patch Throw an unsupported error message for these operators (instead of the "could not be implemented" error below): INTERSECT EXCEPT UNION CROSS JOIN Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state: Enhancement requests in Jira for the above-mentioned operators: {code} INTERSECT DRILL-1308 MINUS No enhancement request (not recognized in grammar) EXCEPT No enhancement request UNION DRILL-1169 CROSS JOIN DRILL-786 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2054) Support `||` as concat operator for varchar inputs
[ https://issues.apache.org/jira/browse/DRILL-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304262#comment-14304262 ] Jinfeng Ni commented on DRILL-2054: --- Minor comments: 1. For the JUnit test, add the combinations c1 || c2 and cast(c1 as varchar(30)) || cast(c2 as varchar(30)), where c1/c2 are typed as ANY when they go through Calcite/Optiq but are actually varchar columns in a schema-less table. These two cases cover any || any and varchar || varchar; the test you have only covers any || varchar. Support `||` as concat operator for varchar inputs Key: DRILL-2054 URL: https://issues.apache.org/jira/browse/DRILL-2054 Project: Apache Drill Issue Type: New Feature Reporter: Sean Hsuan-Yi Chu Assignee: Jinfeng Ni Fix For: Future Attachments: DRILL-2054.1.patch, DRILL-2054.2.patch, DRILL-2054ForOptiq.1.patch For example, the query: select n_nationkey || ' + ' || n_name || ' = ' as CONCAT, n_nationkey, ' + ' as PLUS, n_name from cp.`tpch/nation.parquet` should print out: CONCAT n_nationkey PLUS n_name 0 + ALGERIA = 0 + ALGERIA 1 + ARGENTINA = 1 + ARGENTINA 2 + BRAZIL = 2 + BRAZIL 3 + CANADA = 3 + CANADA ... ... ... (rows/records are omitted due to space limitation) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304360#comment-14304360 ] Jinfeng Ni commented on DRILL-1921: --- A couple of additional comments: 1. VisitingUnsupportedOperators:64. You had better use sqlCall.getKind() == SqlKind.UNION SqlSetOperator op = (SqlSetOperator) sqlCall.getOperator(); if (! op.isAll()) { throw exception; } We should avoid using a function/operator's name as the comparison condition when we have another way. 2. Why call UNION "NOT-ALL" in the error message? Using UNION makes more sense to me. 3. Use SqlTypeName.TINYINT.name() etc. in disabledTypes. 4. In DefaultSqlHandler.validateNode(), instead of creating a new VisitingUnsupportedOperators each time, create a static method in that visitor and call that, if the visitor does not require any customization per run. Throw unsupported error message for some set operators that are not currently supported --- Key: DRILL-1921 URL: https://issues.apache.org/jira/browse/DRILL-1921 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Victoria Markman Assignee: Jinfeng Ni Fix For: 0.8.0 Attachments: DRILL-1921.1.patch, DRILL-1921.2.patch, DRILL-1921.3.patch, DRILL-1921.4.patch Throw an unsupported error message for these operators (instead of the "could not be implemented" error below): INTERSECT EXCEPT UNION CROSS JOIN FULL OUTER JOIN Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state: Enhancement requests in Jira for the above-mentioned operators: {code} INTERSECT DRILL-1308 MINUS No enhancement request (not recognized in grammar) EXCEPT No enhancement request UNION DRILL-1169 CROSS JOIN DRILL-786 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
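[Editor's note] The first review point above — compare on the operator's kind enum rather than its name string — can be sketched as follows. The types here are simplified stand-ins for illustration only; the real SqlKind and SqlSetOperator are Calcite classes with a much larger surface.

```java
// Stand-in types approximating Calcite's SqlKind / SqlSetOperator,
// for illustration only.
enum SqlKind { UNION, INTERSECT, EXCEPT, OTHER }

class SqlSetOperator {
    private final SqlKind kind;
    private final boolean all;

    SqlSetOperator(SqlKind kind, boolean all) {
        this.kind = kind;
        this.all = all;
    }

    SqlKind getKind() { return kind; }
    boolean isAll() { return all; }
}

class UnsupportedOperatorCheck {
    // Compare on the kind enum, not on the operator's name string:
    // UNION (without ALL) is the unsupported form being flagged.
    static boolean isUnsupportedUnion(SqlSetOperator op) {
        return op.getKind() == SqlKind.UNION && !op.isAll();
    }
}
```

The design point of the comment is that name-based comparison is brittle (names can change or collide), whereas the kind enum is part of the operator's identity.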
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: (was: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.3 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279086#comment-14279086 ] Jinfeng Ni commented on DRILL-1889: --- Revised the code based on the comments and added comments to the code. Also renamed method isRegularColumn() to isRegularColumnOrExp() to make it cleaner. Also, there is an incorrect-result issue for a query like select *, col, *, col from t: the last col will be missing in the output. The plan for this query looks OK and the bug seems to be in the execution operator (ProjectRecordBatch). Hsuan will open a different JIRA and submit a patch, as he has prototyped a fix for this issue. when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.3 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277756#comment-14277756 ] Jinfeng Ni commented on DRILL-1889: --- The cause of this wrong result for a select * query is that Drill's plan currently does not distinguish between the columns expanded from the * column and the expressions referenced in the query. DRILL-931 introduced the concept of prefixed table columns for a join query that references the * column. It turns out that for a single table with a * column, we also have to add the prefix to the columns expanded from *, so that execution time can differentiate them. Per discussion with Aman/Hsuan, the main idea of the fix is as follows: 1. Add a prefix to table columns when the query references at least one * column plus other regular columns / expressions. The prefix is removed right before the results are output to the SCREEN operator. 2. A star column will not subsume a regular column. 3. A regular column reference will not be prefixed, so that it can be distinguished from the prefixed columns of a star-column expansion. We have to keep those regular columns, since the parent operator might reference them. 4. A star column will subsume another star column if it comes from the same table with the same prefix. [~amansinha100], could you please review the patch? 
when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
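[Editor's note] The subsume rules described in the comment above can be illustrated with a small, hypothetical helper. The `¦¦` prefix separator is taken from example output elsewhere in these issues (e.g. T2¦¦r_name); the helper itself is a sketch, not Drill's implementation.

```java
// Hedged sketch of the star-column subsume rules: columns expanded from `*`
// carry a table prefix like "T2¦¦", while regular column references stay
// unprefixed so the two kinds can be told apart at execution time.
class StarColumnRules {
    static final String SEP = "\u00A6\u00A6"; // the "¦¦" separator seen in output

    static boolean isPrefixed(String col) {
        return col.contains(SEP);
    }

    // A star-expanded (prefixed) column subsumes another prefixed column only
    // when both carry the same table prefix (rule 4); it never subsumes an
    // unprefixed regular column reference (rules 2 and 3).
    static boolean subsumes(String starCol, String other) {
        if (!isPrefixed(starCol) || !isPrefixed(other)) {
            return false;
        }
        String p1 = starCol.substring(0, starCol.indexOf(SEP));
        String p2 = other.substring(0, other.indexOf(SEP));
        return p1.equals(p2);
    }
}
```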
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: (was: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 Made small change to the patch. when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.
[ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279390#comment-14279390 ] Jinfeng Ni commented on DRILL-1500: --- With the patch for DRILL-1889, the incorrect-result problem is fixed. {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1; ++++++-++-+---++--+ |dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | ++++++-++-+---++--+ | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| ++++++-++-+---++--+ 1 row selected (2.066 seconds) {code} Partition filtering might lead to an unnecessary column in the result set. --- Key: DRILL-1500 URL: https://issues.apache.org/jira/browse/DRILL-1500 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Aman Sinha Priority: Critical Fix For: 0.8.0 Attachments: 0001-DRILL-1500-Partial-fix-Don-t-overwrite-top-level-Pro.patch When partition filtering is used together with a select * query, Drill might return the partitioning column twice. Q1 : {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1; +++++++-++-+---++--+ | dir00|dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | +++++++-++-+---++--+ | 1994 | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| +++++++-++-+---++--+ 1 row selected (2.097 seconds) {code} We can see that column dir0 appeared twice in the result set. 
In comparison, here is the query without partition filtering and the query result: Q2: {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1; ++++++-++-+---++--+ |dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | ++++++-++-+---++--+ | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| ++++++-++-+---++--+ 1 row selected (0.761 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2030) CTAS with SELECT * and expression creates column names with prefix
[ https://issues.apache.org/jira/browse/DRILL-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2030: -- Attachment: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch CTAS with SELECT * and expression creates column names with prefix -- Key: DRILL-2030 URL: https://issues.apache.org/jira/browse/DRILL-2030 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Attachments: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch Doing a CTAS with the star column and an expression creates columns that contain the table prefix 'T||' . Looks like the top project did not strip out the prefix. {code} 0: jdbc:drill:zk=local create table region4 as select *, r_regionkey + 1 from cp.`tpch/region.parquet`; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 5 | ++---+ 0: jdbc:drill:zk=local select * from region4; +-++---++ | T2¦¦r_regionkey | T2¦¦r_name | T2¦¦r_comment | EXPR$1 | +-++---++ {code} The column names are correct if there is a regular column with the star column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1934) Failure to plan queries containing scalar subqueries without correlation
[ https://issues.apache.org/jira/browse/DRILL-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287938#comment-14287938 ] Jinfeng Ni commented on DRILL-1934: --- The cause of the failing cases is in the logic of Calcite's decorrelator. We will probably revisit this once Drill rebases its Optiq/Calcite. Failure to plan queries containing scalar subqueries without correlation Key: DRILL-1934 URL: https://issues.apache.org/jira/browse/DRILL-1934 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni {code} #Fri Jan 02 21:20:47 EST 2015 git.commit.id.abbrev=b491cdb {code} -- Failing queries {code} select * from cp.`tpch/nation.parquet` where n_regionkey ( select min(r_regionkey) from cp.`tpch/region.parquet`); select * from cp.`tpch/nation.parquet` where n_regionkey = ( select avg(r_regionkey) from cp.`tpch/region.parquet`); {code} -- Works with correlation {code} select * from cp.`tpch/nation.parquet` n where n.n_regionkey = ( select min(r.r_regionkey) from cp.`tpch/region.parquet` r where n.n_regionkey = r.r_regionkey); {code} All three queries above have very similar logical plans produced by Calcite: {code} Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#14825:Subset#6.LOGICAL.ANY([]).[]] could not be implemented; planner state: Root: rel#14825:Subset#6.LOGICAL.ANY([]).[] Original rel: AbstractConverter(subset=[rel#14825:Subset#6.LOGICAL.ANY([]).[]], convention=[LOGICAL], DrillDistributionTraitDef=[ANY([])], sort=[[]]): rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 14826 ProjectRel(subset=[rel#14824:Subset#6.NONE.ANY([]).[]], *=[$0]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14823 FilterRel(subset=[rel#14822:Subset#5.NONE.ANY([]).[]], 
condition=[=($1, $2)]): rowcount = 2.6965397022934733E307, cumulative cost = {2.6965397022934733E307 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14821 JoinRel(subset=[rel#14820:Subset#4.NONE.ANY([]).[]], condition=[true], joinType=[left]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14819 EnumerableTableAccessRel(subset=[rel#14813:Subset#0.ENUMERABLE.ANY([]).[]], table=[[cp, tpch/nation.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14794 AggregateRel(subset=[rel#14818:Subset#3.NONE.ANY([]).[]], group=[{}], EXPR$0=[AVG($0)]): rowcount = 1.7976931348623158E307, cumulative cost = {1.7976931348623158E307 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14817 ProjectRel(subset=[rel#14816:Subset#2.NONE.ANY([]).[]], r_regionkey=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14815 EnumerableTableAccessRel(subset=[rel#14814:Subset#1.ENUMERABLE.ANY([]).[]], table=[[cp, tpch/region.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14795 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1655) Cannot plan simple join + group by + order + limit query
[ https://issues.apache.org/jira/browse/DRILL-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288580#comment-14288580 ] Jinfeng Ni commented on DRILL-1655: --- I could reproduce this same issue with a slightly simplified query: {code} select u.name, count(*) as number, count(u.average_stars) stars from mongo.yelp.users u, dfs.tmp.namegender n where u.name = n.name group by u.name order by stars desc limit 1; {code} However, a similar query joining a parquet file with the view defined on a CSV file works fine (the query itself is not meaningful). {code} select n.n_name, count(*) as num, avg(n.n_nationkey) as star from cp.`tpch/nation.parquet` n, dfs.tmp.namegender n2 where n.n_name = n2.name group by n.n_name order by star desc limit 1; ++++ | n_name |num |star| ++++ ++++ No rows selected (0.67 seconds) {code} From the error trace, the cause of this issue might be related to mongodb. Cannot plan simple join + group by + order + limit query - Key: DRILL-1655 URL: https://issues.apache.org/jira/browse/DRILL-1655 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Tomer Shiran Assignee: Jinfeng Ni Fix For: 0.8.0 The first query here works fine. As soon as I add another field (avg(u.average_stars) as stars) and order by on that field, it no longer works. {code} 0: jdbc:drill:zk=local select u.name, n.gender, count(*) as number from mongo.yelp.users u, dfs.mydata.namegender n where u.name = n.name group by u.name, n.gender order by number desc limit 1; ++++ |name| gender | number | ++++ | David | Male | 2453 | ++++ 1 row selected (6.008 seconds) 0: jdbc:drill:zk=local select u.name, n.gender, count(*) as number, avg(u.average_stars) stars from mongo.yelp.users u, dfs.mydata.namegender n where u.name = n.name group by u.name, n.gender order by stars desc limit 1; Query failed: Failure while parsing sql. 
Node [rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC]] could not be implemented; planner state: Root: rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC] Original rel: AbstractConverter(subset=[rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC]], convention=[LOGICAL], DrillDistributionTraitDef=[ANY([])], sort=[[3 DESC]]): rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 38936 SortRel(subset=[rel#38934:Subset#7.NONE.ANY([]).[3 DESC]], sort0=[$3], dir0=[DESC], fetch=[1]): rowcount = 1.7976931348623157E308, cumulative cost = {Infinity rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38933 AggregateRel(subset=[rel#38932:Subset#6.NONE.ANY([]).[]], group=[{0, 1}], number=[COUNT()], stars=[AVG($2)]): rowcount = 1.7976931348623158E307, cumulative cost = {1.7976931348623158E307 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38931 ProjectRel(subset=[rel#38930:Subset#5.NONE.ANY([]).[]], name=[$1], gender=[$3], name0=[$2]): rowcount Error: exception while executing query: Failure while trying to get next result batch. 
(state=,code=0) = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, Infinity cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38929 FilterRel(subset=[rel#38928:Subset#4.NONE.ANY([]).[]], condition=[=($1, $2)]): rowcount = 2.6965397022934733E307, cumulative cost = {2.6965397022934733E307 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38927 JoinRel(subset=[rel#38926:Subset#3.NONE.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38925 EnumerableTableAccessRel(subset=[rel#38921:Subset#0.ENUMERABLE.ANY([]).[]], table=[[mongo, yelp, users]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38897 ProjectRel(subset=[rel#38924:Subset#2.NONE.ANY([]).[]], name=[ITEM($1, 0)], gender=[ITEM($1, 4)]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38923 EnumerableTableAccessRel(subset=[rel#38922:Subset#1.ENUMERABLE.ANY([]).[]], table=[[dfs, mydata, names.csv]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38900 Sets: Set#0, type: (DrillRecordRow[*, name, average_stars]) rel#38921:Subset#0.ENUMERABLE.ANY([]).[], best=rel#38897,
[jira] [Commented] (DRILL-2041) Throw unsupported error message on GROUP BY/ORDER BY array/map types
[ https://issues.apache.org/jira/browse/DRILL-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284965#comment-14284965 ] Jinfeng Ni commented on DRILL-2041: --- For this particular issue, I'm going to make the error message easier to understand for the end user, so that they know it's not allowed to compare a map/array or repeated scalar type when they put such an expression or column in GROUP BY, ORDER BY, or a JOIN condition. I think there is agreement that an effort will be made to improve error/exception reporting in general. Whether this should be thrown as RpcException or as UnsupportedOperationException will be addressed in that effort (probably in the operators component). Throw unsupported error message on GROUP BY/ORDER BY array/map types Key: DRILL-2041 URL: https://issues.apache.org/jira/browse/DRILL-2041 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni test.json {code} { CustomerId: 100, cityId: 10 , phoneNumber: { areaCode: 622, number: 1567845}, years: [1990, 1993, 1998, 2008]}{code} {code} 0: jdbc:drill:schema=dfs select years from `test.json` order by years; ++ | years| ++ Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( VARCHAR:REPEATEDVARCHAR:REPEATED, ) returns INT:REQUIRED [ 5e4a9574-1f9b-4753-ab1d-ef056639281f on atsqa4-133.qa.lab:31010 ] [ 5e4a9574-1f9b-4753-ab1d-ef056639281f on atsqa4-133.qa.lab:31010 ] java.lang.RuntimeException: java.sql.SQLException: Failure while executing query. 
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460) 0: jdbc:drill:schema=dfs select years from `test.json` group by years; Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( VARCHAR:REPEATEDVARCHAR:REPEATED, ) returns INT:REQUIRED [ be1d64d5-bb03-4d5d-87b0-5cd8aa32f007 on atsqa4-133.qa.lab:31010 ] [ be1d64d5-bb03-4d5d-87b0-5cd8aa32f007 on atsqa4-133.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} {code} 0: jdbc:drill:schema=dfs select * from `test.json` order by phoneNumber; +++-++ | CustomerId | cityId | phoneNumber | years| +++-++ Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( MAP:REQUIREDMAP:REQUIRED, ) returns INT:REQUIRED [ cbef0629-b393-4fbd-9c16-941ec8835d36 on atsqa4-133.qa.lab:31010 ] [ cbef0629-b393-4fbd-9c16-941ec8835d36 on atsqa4-133.qa.lab:31010 ] java.lang.RuntimeException: java.sql.SQLException: Failure while executing query. 
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460) 0: jdbc:drill:schema=dfs select phoneNumber from `test.json` group by phoneNumber; Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( MAP:REQUIREDMAP:REQUIRED, ) returns INT:REQUIRED [ fbd3ca20-5d5a-4831-a066-a2d780547e66 on atsqa4-133.qa.lab:31010 ] [ fbd3ca20-5d5a-4831-a066-a2d780547e66 on atsqa4-133.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2030) CTAS with SELECT * and expression creates column names with prefix
[ https://issues.apache.org/jira/browse/DRILL-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285583#comment-14285583 ] Jinfeng Ni commented on DRILL-2030: --- Committed as [a8036d2] CTAS with SELECT * and expression creates column names with prefix -- Key: DRILL-2030 URL: https://issues.apache.org/jira/browse/DRILL-2030 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 Attachments: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch Doing a CTAS with the star column and an expression creates columns that contain the table prefix 'T||' . Looks like the top project did not strip out the prefix. {code} 0: jdbc:drill:zk=local create table region4 as select *, r_regionkey + 1 from cp.`tpch/region.parquet`; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 5 | ++---+ 0: jdbc:drill:zk=local select * from region4; +-++---++ | T2¦¦r_regionkey | T2¦¦r_name | T2¦¦r_comment | EXPR$1 | +-++---++ {code} The column names are correct if there is a regular column with the star column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1563) IndexOutOfBoundException during change traits when aggregating over subquery with order-by
[ https://issues.apache.org/jira/browse/DRILL-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288108#comment-14288108 ] Jinfeng Ni commented on DRILL-1563: --- This seems to have been fixed. I ran a similar query on today's master branch and got correct results. {code} 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from cp.`tpch/orders.parquet` order by o_orderkey, o_custkey); ++ | EXPR$0 | ++ | 15000 | ++ 1 row selected (0.491 seconds) 0: jdbc:drill:zk=local select * from sys.version; +++-+-++ | commit_id | commit_message | commit_time | build_email | build_time | +++-+-++ | 8d1e1affe86a5adca3bc17eeaf7520f0d379a393 | DRILL-1932: Query fails if the first record batch does not contain any record | 20.01.2015 @ 23:02:03 PST | j...@maprtech.com | 22.01.2015 @ 11:19:02 PST | +++-+-++ 1 row selected (0.108 seconds) {code} Can you re-run the original query? If the issue is still there, please re-open it. IndexOutOfBoundException during change traits when aggregating over subquery with order-by --- Key: DRILL-1563 URL: https://issues.apache.org/jira/browse/DRILL-1563 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.6.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 On TPCH SF1, doing a plain aggregation on top of a subquery that does order-by of 2 or more columns gives IOBE. This happens during planning, when processing traits. See below. 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from orders order by o_orderkey, o_custkey); Query failed: Failure while setting up Foreman. index (1) must be less than size (1) Query failed: Failure while setting up Foreman. 
index (1) must be less than size (1) [4f2cc753-655e-47a1-a8f0-38e62eb79a2b] Here's the stack trace: java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:na] com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:na] com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45) ~[guava-14.0.1.jar:na] org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:94) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:59) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:77) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:36) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1010) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:107) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:273) ~[optiq-core-0.9-drill-r4.jar:na] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:165) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:134) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:132) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:425) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:219)
[jira] [Resolved] (DRILL-1563) IndexOutOfBoundException during change traits when aggregating over subquery with order-by
[ https://issues.apache.org/jira/browse/DRILL-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-1563. --- Resolution: Fixed IndexOutOfBoundException during change traits when aggregating over subquery with order-by --- Key: DRILL-1563 URL: https://issues.apache.org/jira/browse/DRILL-1563 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.6.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 On TPCH SF1, doing a plain aggregation on top of a subquery that does order-by of 2 or more columns gives IOBE. This happens during planning, when processing traits. See below. 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from orders order by o_orderkey, o_custkey); Query failed: Failure while setting up Foreman. index (1) must be less than size (1) Query failed: Failure while setting up Foreman. index (1) must be less than size (1) [4f2cc753-655e-47a1-a8f0-38e62eb79a2b] Here's the stack trace: java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:na] com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:na] com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45) ~[guava-14.0.1.jar:na] org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:94) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:59) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:77) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:36) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1010) 
~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:107) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:273) ~[optiq-core-0.9-drill-r4.jar:na] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:165) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:134) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:132) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:425) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:219) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2199) Wrong results while using case expression with decimal data type
[ https://issues.apache.org/jira/browse/DRILL-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2199: - Assignee: Mehant Baid (was: Jinfeng Ni) Wrong results while using case expression with decimal data type Key: DRILL-2199 URL: https://issues.apache.org/jira/browse/DRILL-2199 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Attachments: DRILL-2199.patch Consider the below SQL statement: select case when true then cast(employee_id as decimal(15, 5)) else cast('0.0' as decimal(5, 2)) end from cp.`employee.json` where employee_id = 1 This should return 1.0; instead Drill returns 10. In the case statement the first expression (condition expression) is nullable and the else expression is non-nullable. For our codegen to work correctly we make the non-nullable expression also nullable using the convertToNullableXHolder() functions. These functions however do not preserve the scale and precision of the input, hence the output we get is the unscaled value of the decimal. The fix is to make the convertToNullableDecimalHolder() functions preserve the scale and precision by returning the correct major type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
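The scale loss described above can be reproduced outside Drill with plain `java.math.BigDecimal`, which likewise stores a decimal as an unscaled integer plus scale metadata. This sketch (method names are illustrative, not Drill's holder API) shows why dropping the scale turns 1.0 into 10:

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    // Mimics a holder conversion that copies the unscaled value but loses
    // the scale metadata -- the symptom reported in this issue.
    static BigDecimal withScaleDropped(BigDecimal d) {
        return new BigDecimal(d.unscaledValue());
    }

    // Mimics the fix: carry the scale along with the unscaled value.
    static BigDecimal withScalePreserved(BigDecimal d) {
        return new BigDecimal(d.unscaledValue(), d.scale());
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1.0"); // unscaled value 10, scale 1
        System.out.println(withScaleDropped(one));   // prints 10
        System.out.println(withScalePreserved(one)); // prints 1.0
    }
}
```

Returning "the correct major type" in Drill's terms means exactly the second behavior: the output type must carry the input's scale and precision, not just its unscaled digits.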
[jira] [Commented] (DRILL-2054) Support `||` as concat operator for varchar inputs
[ https://issues.apache.org/jira/browse/DRILL-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326400#comment-14326400 ] Jinfeng Ni commented on DRILL-2054: --- LGTM. +1 Support `||` as concat operator for varchar inputs Key: DRILL-2054 URL: https://issues.apache.org/jira/browse/DRILL-2054 Project: Apache Drill Issue Type: New Feature Reporter: Sean Hsuan-Yi Chu Assignee: Jinfeng Ni Fix For: Future Attachments: DRILL-2054.1.patch For example, the query: select n_nationkey || ' + ' || n_name || ' = ' as CONCAT, n_nationkey, ' + ' as PLUS, n_name from cp.`tpch/nation.parquet` should print out CONCATn_nationkey PLUSn_name 0 + ALGERIA = 0+ ALGERIA 1 + ARGENTINA = 1+ ARGENTINA 2 + BRAZIL = 2+ BRAZIL 3 + CANADA = 3+ CANADA ... ... ... (rows/records are omitted due to space limitation) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2268) Applying flatten after a join fails with IOBE
[ https://issues.apache.org/jira/browse/DRILL-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2268: - Assignee: Jinfeng Ni (was: Jason Altekruse) Applying flatten after a join fails with IOBE -- Key: DRILL-2268 URL: https://issues.apache.org/jira/browse/DRILL-2268 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical git.commit.id.abbrev=6676f2d Data Set : {code} { uid:1, lst_lst : [[1,2],[3,4]] } { uid:2, lst_lst : [[1,2],[3,4]] } {code} The below query fails : {code} select flatten(t1.lst_lst) from `temp.json` t1 inner join `temp.json` t2 on t1.uid=t2.uid; Query failed: RemoteRpcException: Failure while running fragment., index: -4, length: 4 (expected: range(0, 16384)) [ e32cd0c6-a84a-4bbb-9812-bc0f7de17a68 on qa-node190.qa.lab:31010 ] [ e32cd0c6-a84a-4bbb-9812-bc0f7de17a68 on qa-node190.qa.lab:31010 ] {code} Let me know if you have any questions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception
Jinfeng Ni created DRILL-2269: - Summary: Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override those two methods. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit the Exception, complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override those two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for those two methods in the base class; subclasses can override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception
[ https://issues.apache.org/jira/browse/DRILL-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2269: - Assignee: Jinfeng Ni Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Assignee: Jinfeng Ni Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override those two methods. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit the Exception, complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override those two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for those two methods in the base class; subclasses can override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
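The proposal can be sketched in plain Java. The class and method names here are illustrative, not Drill's actual LogicalExpression API: the base class supplies a safe default cost, and only subclasses with a genuinely different cost override it, so a forgotten override degrades to a default estimate instead of a runtime Exception.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Base class with a default cost implementation instead of a throwing stub.
abstract class ExpressionBase {
    // Default: trivially cheap. Subclasses with real costs override this.
    public int getSelfCost() { return 0; }

    // Cumulative cost = own cost plus the cost of all children.
    public int getCumulativeCost() {
        int cost = getSelfCost();
        for (ExpressionBase child : children()) {
            cost += child.getCumulativeCost();
        }
        return cost;
    }

    public abstract Iterable<ExpressionBase> children();
}

class FunctionCallExpr extends ExpressionBase {
    private final List<ExpressionBase> args;
    FunctionCallExpr(List<ExpressionBase> args) { this.args = args; }
    @Override public int getSelfCost() { return 1; } // a call has a real cost
    @Override public Iterable<ExpressionBase> children() { return args; }
}

class LiteralExpr extends ExpressionBase {
    // Inherits the zero-cost default: no override needed, no runtime Exception.
    @Override public Iterable<ExpressionBase> children() {
        return Collections.emptyList();
    }
}

public class CostDefaultDemo {
    public static void main(String[] args) {
        ExpressionBase e = new FunctionCallExpr(
                Arrays.asList(new LiteralExpr(), new LiteralExpr()));
        System.out.println(e.getCumulativeCost()); // 1: the call itself; literals are free
    }
}
```

The trade-off is exactly the one the issue describes: a throwing base class turns a missing override into a runtime failure, while a default implementation turns it into a (possibly imprecise) estimate that the build cannot miss.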
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318783#comment-14318783 ] Jinfeng Ni commented on DRILL-2168: --- Uploaded the patches. The patches include a unit test case in Drill and a fix in Optiq/Calcite. Part of the fix in Optiq/Calcite has been submitted to Calcite master as a pull request. [~amansinha100], can you please review the patches? Thanks. Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2168: -- Attachment: 0003-Bump-to-version-r18.1.patch 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318789#comment-14318789 ] Jinfeng Ni commented on DRILL-2168: --- Some additional comments: 1. The incorrect plan / query results only happen when the query uses Drill built-in functions or Drill UDFs. They do not happen with Optiq/Calcite built-in functions. Part of the reason is that function resolution for Drill functions and for Optiq/Calcite built-in functions happens at different stages. 2. If the query uses upper-case in both the group-by expressions and the select expression, then the query gets the correct plan / query results, indicating the issue was caused by the function name resolution / validation / conversion logic. Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
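Comment 2 above suggests the group-by and select copies of the expression stop matching because function names are handled case-sensitively somewhere in the resolution pipeline. As a rough illustration (not Drill's actual function registry), normalizing names at both registration and lookup is one way to make mixed-case references resolve to a single implementation:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of case-insensitive function resolution: lower-case the name on
// both sides of the map so "concat" and "CONCAT" hit the same entry.
public class FunctionRegistry {
    private final Map<String, BinaryOperator<String>> fns = new HashMap<>();

    void register(String name, BinaryOperator<String> fn) {
        fns.put(name.toLowerCase(Locale.ROOT), fn);
    }

    BinaryOperator<String> lookup(String name) {
        return fns.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register("concat", (a, b) -> a + b);
        // A mixed-case reference resolves to the same implementation.
        System.out.println(registry.lookup("CONCAT").apply("1-", "1-1970"));
    }
}
```

If instead the lower-case `concat` in the query and an upper-case `CONCAT` produced by validation resolve through different paths, the planner can fail to recognize the select expression as identical to the group-by key, which is consistent with the duplicate Project operators in the plan above.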
[jira] [Commented] (DRILL-2239) Confusing error message Column 't1' is ambiguous for _table_ name t1
[ https://issues.apache.org/jira/browse/DRILL-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319334#comment-14319334 ] Jinfeng Ni commented on DRILL-2239: --- The error message seems to be fine. In the ON clause expression ( t1 IS NULL AND t2 IS NULL ), t1 is interpreted as a column name. An unqualified identifier in an expression is interpreted as a column name. For a qualified identifier like t1.key, the qualifier is interpreted as a table name or alias, and the second part is interpreted as a column name. Confusing error message Column 't1' is ambiguous for _table_ name t1 -- Key: DRILL-2239 URL: https://issues.apache.org/jira/browse/DRILL-2239 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill/MapR) Assignee: Aman Sinha Priority: Minor The error message about the erroneous use of t1 in t1 IS NULL in this query: select * from dfs_test.`xxx/jsoninput/nullable1U.json` t1 left outer join dfs_test.`xxx/jsoninput/nullable2U.json` t2 on ( t1.key = t2.key OR ( t1 IS NULL AND t2 IS NULL) ) says Column 't1' is ambiguous even though t1 is not a column name and isn't ambiguous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2236) Optimize hash inner join by swapping inputs based on row count comparison.
Jinfeng Ni created DRILL-2236: - Summary: Optimize hash inner join by swapping inputs based on row count comparison. Key: DRILL-2236 URL: https://issues.apache.org/jira/browse/DRILL-2236 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Currently, Drill's planner does not consider all possible join order sequences during the planning phase, because one particular optimizer rule (SwapJoinRule) is not enabled. The reason for not enabling this rule is that it would increase planning time significantly. This means that the join sequence for some queries might not be optimal; the sequence in the FROM clause impacts the final join sequence the planner produces. For example, {code} select c.c_custkey, c.c_name, n.n_name from nation n, customer c where n.n_nationkey = c.c_nationkey; {code} The nation table contains 25 rows, while the customer table contains 1.5 million rows. The optimal plan should put customer on the left side of the hash inner join and nation on the right side, since the hash table is built on the right side, and we would like the hash table built on the smaller dataset. {code} select count(*) from customer; ++ | EXPR$0 | ++ | 150 | ++ select count(*) from nation; ++ | EXPR$0 | ++ | 25 | ++ {code} However, Drill's planner currently produces the following join sequence: NATION -- CUSTOMER. 
{code} 00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2]) 00-02Project(c_custkey=[$3], c_name=[$4], n_name=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Project(n_nationkey=[$1], n_name=[$0]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]]) 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) {code} Notice that in the above plan, LEFT is the nation table, while RIGHT is the customer table. Until we resolve the increased planning time related to SwapJoinRule, as a workaround I would like to propose that we swap the inputs for hash inner join after the planner finishes planning. That is, when we build the physical plan to be run on Drill's execution engine, we swap the inputs of hash inner join physical operators based on a row count comparison. The proposed workaround could cause performance regressions for some queries, in particular because the estimated row count is not accurate (especially after Filter / Join / Aggregation) due to the lack of complete statistics. To mitigate that regression risk, we will add a new planner option so that users can turn this feature on/off if they see a performance regression. 
With this feature enabled, the above query will get a plan like: {code} 00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2]) 00-02Project(c_custkey=[$3], c_name=[$4], n_name=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner]) 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) 00-05Project(n_nationkey=[$1], n_name=[$0]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]]) {code} Please note that once we resolve the issue of SwapJoinRule, we should remove this workaround from Drill's code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
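The proposed post-planning swap boils down to a row-count comparison guarded by the planner option. A minimal sketch, with illustrative names rather than Drill's actual physical-operator classes:

```java
// One join input with its planner-estimated cardinality.
class JoinInput {
    final String name;
    final long estimatedRows;
    JoinInput(String name, long estimatedRows) {
        this.name = name;
        this.estimatedRows = estimatedRows;
    }
}

public class HashJoinSwap {
    // Returns {probe, build}. The hash table is built from the right (build)
    // side, so when the option is enabled and the left input is smaller,
    // swap so the larger input becomes the probe side.
    static JoinInput[] orient(JoinInput left, JoinInput right, boolean swapEnabled) {
        if (swapEnabled && left.estimatedRows < right.estimatedRows) {
            return new JoinInput[] { right, left }; // swap inputs
        }
        return new JoinInput[] { left, right };     // keep planned order
    }

    public static void main(String[] args) {
        JoinInput nation = new JoinInput("nation", 25);
        JoinInput customer = new JoinInput("customer", 1_500_000);
        JoinInput[] oriented = orient(nation, customer, true);
        System.out.println(oriented[0].name + " probes, " + oriented[1].name + " builds");
    }
}
```

With the NATION -- CUSTOMER ordering from the plan above, this check swaps the inputs so customer probes and the 25-row nation table builds the hash table; with the option off, the planned order is kept unchanged, which is the escape hatch for queries where the row-count estimates are wrong.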
[jira] [Commented] (DRILL-2239) Confusing error message Column 't1' is ambiguous for _table_ name t1
[ https://issues.apache.org/jira/browse/DRILL-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319338#comment-14319338 ] Jinfeng Ni commented on DRILL-2239: --- Why ambiguous? The reason is that Drill is schema-less. In a join, any column reference should be qualified; otherwise Drill does not know which table contains the column. In this query, t1 in the expression t1 IS NULL AND t2 IS NULL is interpreted as a column name, and the planner does not know which table has a column t1, hence the "ambiguous" complaint. Confusing error message Column 't1' is ambiguous for _table_ name t1 -- Key: DRILL-2239 URL: https://issues.apache.org/jira/browse/DRILL-2239 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill/MapR) Assignee: Aman Sinha Priority: Minor The error message about the erroneous use of t1 in t1 IS NULL in this query: select * from dfs_test.`xxx/jsoninput/nullable1U.json` t1 left outer join dfs_test.`xxx/jsoninput/nullable2U.json` t2 on ( t1.key = t2.key OR ( t1 IS NULL AND t2 IS NULL) ) says Column 't1' is ambiguous even though t1 is not a column name and isn't ambiguous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2236) Optimize hash inner join by swapping inputs based on row count comparison.
[ https://issues.apache.org/jira/browse/DRILL-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2236: -- Attachment: 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch Optimize hash inner join by swapping inputs based on row count comparison. --- Key: DRILL-2236 URL: https://issues.apache.org/jira/browse/DRILL-2236 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch, 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch Currently, Drill's planner does not consider all possible join sequences during the planning phase, because one particular optimizer rule (SwapJoinRule) is not enabled. This rule is disabled because it would otherwise increase the planning time significantly. This means that the join sequence for some queries might not be optimal; the order of tables in the FROM clause impacts the final join sequence the planner produces. For example,
{code}
select c.c_custkey, c.c_name, n.n_name
from nation n, customer c
where n.n_nationkey = c.c_nationkey;
{code}
The nation table contains 25 rows, while the customer table contains 1.5 million rows. The optimal plan should put customer on the left side of the hash inner join and nation on the right side, since the hash table is built on the right side and we would like the hash table built on the smaller dataset.
{code}
select count(*) from customer;
++
| EXPR$0 |
++
| 1500000 |
++
select count(*) from nation;
++
| EXPR$0 |
++
| 25 |
++
{code}
However, currently Drill's planner produces the following join sequence: NATION -- CUSTOMER.
{code}
00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2])
00-02   Project(c_custkey=[$3], c_name=[$4], n_name=[$1])
00-03     HashJoin(condition=[=($0, $2)], joinType=[inner])
00-05       Project(n_nationkey=[$1], n_name=[$0])
00-06         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]])
00-04       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
{code}
Notice that in the above plan, the LEFT input is the nation table, while the RIGHT input is the customer table. Until we resolve the increased planning time related to SwapJoinRule, as a workaround I would like to propose that we swap the inputs of a hash inner join after the planner finishes planning. That is, when we build the physical plan to be run on Drill's execution engine, we swap the inputs of hash inner join physical operators based on a row count comparison. The proposed workaround could cause a performance regression for some queries, in particular because the estimated row count is not accurate (especially after Filter / Join / Aggregation) due to the lack of complete statistics. To remedy that regression risk, we will add a new planner option for this feature, so that users can turn it on or off if they see a performance regression.
With this feature enabled, the above query will get a plan like the following:
{code}
00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2])
00-02   Project(c_custkey=[$3], c_name=[$4], n_name=[$1])
00-03     HashJoin(condition=[=($0, $2)], joinType=[inner])
00-04       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
00-05       Project(n_nationkey=[$1], n_name=[$0])
00-06         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]])
{code}
Please note that once we resolve the SwapJoinRule issue, we should remove this workaround from Drill's code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
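The row-count comparison driving the proposed swap can be sketched as follows. This is a simplified illustration with hypothetical names, not Drill's actual planner code:

```java
// Hypothetical sketch of the swap decision for a hash inner join.
// The hash table is built on the RIGHT input, so after planning we want
// the smaller input on the right (build) side and the larger on the
// left (probe) side.
public class HashJoinSwap {
    // Returns true when the inputs should be swapped, i.e. when the
    // current right (build) side is estimated to be larger than the
    // left (probe) side.
    public static boolean shouldSwap(double leftRowCount, double rightRowCount) {
        return rightRowCount > leftRowCount;
    }

    public static void main(String[] args) {
        // nation (25 rows) on the left, customer (1.5M rows) on the right:
        // the build side is the larger input, so swap.
        System.out.println(shouldSwap(25, 1_500_000));
        // customer on the left, nation on the right: already optimal.
        System.out.println(shouldSwap(1_500_000, 25));
    }
}
```

Because the decision relies on estimated row counts, the on/off planner option mentioned above is the escape hatch when the estimates are badly wrong.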
[jira] [Resolved] (DRILL-2271) count(distinct ) against SystemTablePlugin will hit plan serialization/deserialization issue when executed in distributed mode.
[ https://issues.apache.org/jira/browse/DRILL-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-2271. --- Resolution: Duplicate count(distinct ) against SystemTablePlugin will hit plan serialization/deserialization issue when executed in distributed mode. --- Key: DRILL-2271 URL: https://issues.apache.org/jira/browse/DRILL-2271 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Priority: Minor The following query runs fine on a single node.
{code}
select count(distinct name) as cnt from `sys`.`options`;
{code}
However, if we set planner.slice_target = 1 so that the query runs in distributed mode, it hits an error:
{code}
alter session set `planner.slice_target` = 1;
+++
| ok | summary |
+++
| true | planner.slice_target updated. |
+++
select count(distinct name) as cnt from `sys`.`options`;
Query failed: RemoteRpcException: Failure while trying to start remote fragment, No suitable constructor found for type [simple type, class org.apache.drill.exec.store.sys.SystemTablePlugin]: can not instantiate from JSON object (need to add/enable type information?)
{code}
It seems the serialization/deserialization of the plan for SystemTablePlugin does not work correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
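The "No suitable constructor found" failure is characteristic of deserializers that need a no-arg constructor (or explicit creator/type information) to instantiate the target class. A minimal illustration of the same constraint using plain reflection; the Plugin class here is hypothetical, not Drill's SystemTablePlugin:

```java
// A class whose only constructor takes arguments cannot be instantiated
// generically: reflection (and, similarly, a JSON deserializer that is
// not given creator/type information) finds no no-arg constructor.
public class NoDefaultCtor {
    static class Plugin {
        final String name;
        Plugin(String name) { this.name = name; } // only a 1-arg constructor
    }

    // Reports whether the class exposes a no-arg constructor.
    public static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(Plugin.class)); // no default ctor
        System.out.println(hasNoArgConstructor(String.class)); // has one
    }
}
```

The usual fixes are to supply a default constructor or to register creator/type information with the serialization framework, which matches the "need to add/enable type information?" hint in the error message.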
[jira] [Updated] (DRILL-2242) Wrong result (more rows) when outer query groups by subset of columns that inner query groups by
[ https://issues.apache.org/jira/browse/DRILL-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2242: -- Attachment: 0002-DRILL-2242-Propagate-distribution-trait-when-Project.patch [~amansinha100], could you please review the patch? Thanks! Wrong result (more rows) when outer query groups by subset of columns that inner query groups by Key: DRILL-2242 URL: https://issues.apache.org/jira/browse/DRILL-2242 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Priority: Critical Attachments: 0002-DRILL-2242-Propagate-distribution-trait-when-Project.patch The following query has a subquery that groups on two columns and an outer query that groups on one of those columns. With slice_target = 1 to force exchanges, it produces an incorrect result:
{code}
alter session set `planner.slice_target` = 1;
select count(*) from
  (select l_partkey from
    (select l_partkey, l_suppkey from cp.`tpch/lineitem.parquet`
     group by l_partkey, l_suppkey)
   group by l_partkey );
++
| EXPR$0 |
++
| 6227 |
++
1 row selected (1.522 seconds)
{code}
Correct result (from Postgres):
{code}
count
---
2000
(1 row)
{code}
The cause appears to be related to distribution trait propagation.
Here's the EXPLAIN plan:
{code}
00-00 Screen
00-01   StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-02     UnionExchange
01-01       StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02         Project($f0=[0])
01-03           HashAgg(group=[{0}])
01-04             Project(l_partkey=[$0])
01-05               HashAgg(group=[{0, 1}])
01-06                 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
02-01                   HashAgg(group=[{0, 1}])
02-02                     Project(l_partkey=[$1], l_suppkey=[$0])
02-03                       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/lineitem.parquet]], selectionRoot=/tpch/lineitem.parquet, numFiles=1, columns=[`l_partkey`, `l_suppkey`]]])
{code}
Note that the HashToRandomExchange operator (01-06) distributes on the two columns l_partkey and l_suppkey in order to perform the two-phase aggregation. These are the group-by columns. However, in the outer query's HashAgg, no re-distribution is done: it assumes that the data is already hash-distributed on l_partkey, which is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
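The underlying problem can be demonstrated without Drill: hash-distributing rows on (l_partkey, l_suppkey) does not co-locate rows that share only l_partkey, so a group-by on l_partkey alone still needs its own redistribution. A small illustration, where the hash function and node count are arbitrary stand-ins for Drill's partitioning:

```java
import java.util.Objects;

// Two rows with the same partkey but different suppkey can hash to
// different nodes when the distribution key is (partkey, suppkey).
public class DistributionDemo {
    // Stand-in for "which node receives this row" under hash distribution
    // on the composite key (partkey, suppkey).
    static int nodeFor(int partkey, int suppkey, int numNodes) {
        return Math.floorMod(Objects.hash(partkey, suppkey), numNodes);
    }

    public static void main(String[] args) {
        int a = nodeFor(100, 1, 4);
        int b = nodeFor(100, 2, 4);
        // Same partkey, potentially different nodes: a downstream group-by
        // on partkey alone cannot assume the data is already partitioned
        // on partkey, so the trait must not be propagated as if it were.
        System.out.println(a + " vs " + b);
    }
}
```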
[jira] [Updated] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in the base class of LogicalExpression, instead of throwing an Exception
[ https://issues.apache.org/jira/browse/DRILL-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2269: -- Attachment: 0001-DRILL-2269-Add-default-implementation-for-estimating.patch [~amansinha100], could you please review the patch? Thanks. Provide default implementation for getting cost of evaluating an expression in the base class of LogicalExpression, instead of throwing an Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2269-Add-default-implementation-for-estimating.patch Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override them. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit an Exception complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override these two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for these two methods in the base class; certain subclasses will override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
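The proposed pattern can be sketched like this; the class and method names are hypothetical, not the actual LogicalExpression API:

```java
// The base class supplies a safe default cost estimate instead of
// throwing, so a subclass that forgets to override it still works at
// execution time instead of failing with a runtime exception.
public class DefaultCostDemo {
    static abstract class Expr {
        // Default: assume a cheap expression; subclasses override as needed.
        double cyclesCost() { return 1.0; }
    }

    static class RegexExpr extends Expr {
        @Override double cyclesCost() { return 40.0; } // known to be expensive
    }

    static class NewlyAddedExpr extends Expr {
        // Author forgot to override cyclesCost(): with a throwing base class
        // this would fail at run time; with a default it degrades gracefully
        // to a (possibly inaccurate) cheap estimate.
    }

    public static void main(String[] args) {
        System.out.println(new RegexExpr().cyclesCost());
        System.out.println(new NewlyAddedExpr().cyclesCost());
    }
}
```

The trade-off is that a missing override now silently yields a default estimate rather than a loud failure, which is acceptable for cost estimation because a rough cost only degrades plan quality, not correctness.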
[jira] [Created] (DRILL-2087) Some planner option settings might produce a wrong plan and a wrong query result for a join query.
Jinfeng Ni created DRILL-2087: - Summary: Some planner option settings might produce a wrong plan and a wrong query result for a join query. Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet
ls -l region
total 24
-rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet
{code}
With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+-+
| n_nationkey |
+-+
| 18 |
| 12 |
| 9 |
| 8 |
| 21 |
| 19 |
| 7 |
| 6 |
| 23 |
| 22 |
| 20 |
| 13 |
| 11 |
| 10 |
| 4 |
| 0 |
| 16 |
| 15 |
| 14 |
| 5 |
| 24 |
| 17 |
| 3 |
| 2 |
| 1 |
+-+
25 rows selected (2.057 seconds)
{code}
However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, then the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local alter session set `planner.slice_target` = 10;
+++
| ok | summary |
+++
| true | planner.slice_target updated. |
+++
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local alter session set `planner.join.row_count_estimate_factor` = 0.1;
+++
| ok | summary |
+++
| true | planner.join.row_count_estimate_factor updated. |
+++
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-+
| n_nationkey |
+-+
+-+
No rows selected (0.71 seconds)
{code}
If we look at the explain plan after changing the planner options, we see that for the JOIN operator the LEFT input does not have any EXCHANGE operator, while the RIGHT input has a HashToRandomExchange operator. That does not seem to be a valid plan.
{code}
00-00 Screen
00-01   UnionExchange
01-01     Project(n_nationkey=[$2])
01-02       HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03         HashToRandomExchange(dist0=[[$0]])
02-01           Project(n_regionkey=[$1], n_nationkey=[$0])
02-02             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}
The cause of this problem seems to be that Drill removes the EXCHANGE operator under certain conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE either on both sides or on neither. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
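The invariant the report points at can be stated simply: after exchange removal, a distributed join should have an exchange on both inputs or on neither. A minimal check, hypothetical and not Drill's actual validation code:

```java
// Sanity check for a join node in a distributed plan: removing the
// exchange from only one side leaves the two inputs partitioned
// differently, which can silently produce wrong (here: empty) results.
public class JoinExchangeCheck {
    public static boolean isValid(boolean leftHasExchange, boolean rightHasExchange) {
        // Valid only when both sides agree: exchanges on both, or on neither.
        return leftHasExchange == rightHasExchange;
    }

    public static void main(String[] args) {
        // The invalid plan from the report: no exchange on the region side,
        // a HashToRandomExchange on the nation side.
        System.out.println(isValid(false, true));
        // Exchanges on both sides: valid.
        System.out.println(isValid(true, true));
    }
}
```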
[jira] [Created] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
Jinfeng Ni created DRILL-2107: - Summary: Hash Join throws IOBE for a query with exists subquery. Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Chris Westin Priority: Critical I hit an IOBE for TestTpchDistributed Q4 when I tried to enable an optimizer rule. Then I simplified Q4 to the following, and still reproduced the same IOBE.
{code}
select o.o_orderpriority
from cp.`tpch/orders.parquet` o
where exists (
  select *
  from cp.`tpch/lineitem.parquet` l
  where l.l_orderkey = o.o_orderkey
);
{code}
Stack trace of the exception:
{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
  at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45]
  at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45]
  at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:232) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:132) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.test.generated.HashTableGen307.doSetup(HashTableTemplate.java:71) ~[na:na]
  at org.apache.drill.exec.test.generated.HashTableGen307.updateBatches(HashTableTemplate.java:473) ~[na:na]
  at org.apache.drill.exec.test.generated.HashJoinProbeGen313.executeProbePhase(HashJoinProbeTemplate.java:139) ~[na:na]
  at org.apache.drill.exec.test.generated.HashJoinProbeGen313.probeAndProject(HashJoinProbeTemplate.java:223) ~[na:na]
  at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:227) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
The physical plan seems to be correct after enabling the new rule. Actually, if I disable HashJoin and use merge join for the query, it works fine. So it seems the IOBE exposes a bug in HashJoin. To reproduce this issue, there are two options: 1) Modify DrillRuleSets.java to remove the comment before SwapJoinRule, run alter session set `planner.slice_target` = 10;, and run the query. 2) Use the attached physical plan in a JSON file, and use submit_plan to submit the physical plan. For comparison, I also attached the physical plan when HashJoin is disabled (merge join is used), and the explain plan at the physical operator level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2107: -- Attachment: q4_1_mj.json q4_1_mj_phy.txt q4_1_hj.json q4_1_hj_phy.txt
q4_1_hj_phy.txt: the explain plan output when HashJoin is enabled.
q4_1_hj.json: the physical plan in JSON format when HashJoin is enabled.
q4_1_mj_phy.txt: the explain plan output when MergeJoin is used.
q4_1_mj.json: the physical plan in JSON format when MergeJoin is used.
To submit the physical plan in JSON format directly, use the following command: distribution/target/apache-drill-0.8.0-SNAPSHOT/apache-drill-0.8.0-SNAPSHOT/bin/submit_plan -f q4_1_hj.json -t physical -l
Hash Join throws IOBE for a query with exists subquery. --- Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Chris Westin Priority: Critical Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2107: -- Assignee: Mehant Baid (was: Chris Westin) Hash Join throw IOBE for a query with exists subquery. --- Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Mehant Baid Priority: Critical Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2074) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off
[ https://issues.apache.org/jira/browse/DRILL-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2074: -- Component/s: (was: Query Planning Optimization) Execution - Operators Assignee: Chris Westin (was: Jinfeng Ni) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off --- Key: DRILL-2074 URL: https://issues.apache.org/jira/browse/DRILL-2074 Project: Apache Drill Issue Type: Bug Components: Execution - Operators Reporter: Abhishek Girish Assignee: Chris Westin Attachments: 05_par1000.q, 05_par1000_d6e54ab.logical.plan, 05_par1000_d6e54ab.verbose.plan, drill-env.sh Query attached. Hash Join and Hash Agg were turned off. And the following property was added to drill-override.conf:
sort: {
  purge.threshold : 100,
  external: {
    batch.size : 4000,
    spill: {
      batch.size : 4000,
      group.size : 100,
      threshold : 200,
      directories : [ /drill_spill ],
      fs : maprfs:///
    }
  }
}
Query failed with the below error message:
Query failed: RemoteRpcException: Failure while running fragment., Unable to allocate sv2 buffer after repeated attempts [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ]
Log Snippets:
2015-01-26 20:07:33,239 atsqa8c42.qa.lab [2b396307-2c1e-3486-90bc-fbaf09fbeb3e:frag:15:51] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error faf3044a-e14a-427b-b66d-7bcd7522ead5: Failure while running fragment.
java.lang.RuntimeException: org.apache.drill.exec.memory.OutOfMemoryException: Unable to allocate sv2 buffer after repeated attempts
  at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:309) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.JoinStatus.nextRight(JoinStatus.java:80) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.JoinStatus.ensureInitial(JoinStatus.java:95) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:147) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at
[jira] [Commented] (DRILL-2074) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off
[ https://issues.apache.org/jira/browse/DRILL-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297340#comment-14297340 ] Jinfeng Ni commented on DRILL-2074: --- This exception is thrown in physical operator during execution time. Change the component to Execution-operators. Queries fail with OutOfMemory Exception when Hash Join Agg are turned off --- Key: DRILL-2074 URL: https://issues.apache.org/jira/browse/DRILL-2074 Project: Apache Drill Issue Type: Bug Components: Execution - Operators Reporter: Abhishek Girish Assignee: Chris Westin Attachments: 05_par1000.q, 05_par1000_d6e54ab.logical.plan, 05_par1000_d6e54ab.verbose.plan, drill-env.sh Query attached. Hash Join and Hash Agg were turned off. And the following property was added to drill-override.conf: sort: { purge.threshold : 100, external: { batch.size : 4000, spill: { batch.size : 4000, group.size : 100, threshold : 200, directories : [ /drill_spill ], fs : maprfs:/// } } } Query failed with the below error message: Query failed: RemoteRpcException: Failure while running fragment., Unable to allocate sv2 buffer after repeated attempts [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] Log Snippets: 2015-01-26 20:07:33,239 atsqa8c42.qa.lab [2b396307-2c1e-3486-90bc-fbaf09fbeb3e:frag:15:51] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error faf3044a-e14a-427b-b66d-7bcd7522ead5: Failure while running fragment. 
java.lang.RuntimeException: org.apache.drill.exec.memory.OutOfMemoryException: Unable to allocate sv2 buffer after repeated attempts at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:309) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.JoinStatus.nextRight(JoinStatus.java:80) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.JoinStatus.ensureInitial(JoinStatus.java:95) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:147) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at
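The failure mode in the trace above, an sv2 (selection vector) buffer allocation that is retried several times and finally surfaces as an OutOfMemoryException failing the whole fragment, can be sketched roughly as follows. This is an illustrative Python sketch under assumed names (`allocate_with_retries`, the retry count), not Drill's actual allocator code:

```python
# Illustrative sketch (assumed names and retry policy, not Drill's
# allocator) of the failure reported above: a buffer allocation is
# retried a fixed number of times and then surfaces as an
# OutOfMemoryException that fails the fragment.
class OutOfMemoryException(Exception):
    pass

def allocate_with_retries(try_alloc, attempts=3):
    """Call try_alloc until it returns a buffer, or give up."""
    for _ in range(attempts):
        buf = try_alloc()
        if buf is not None:
            return buf
    raise OutOfMemoryException(
        "Unable to allocate sv2 buffer after repeated attempts")

# A successful allocation returns the buffer unchanged.
print(len(allocate_with_retries(lambda: bytearray(8))))  # 8
```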
[jira] [Commented] (DRILL-2012) Flatten fails when we filter by a non-projected column
[ https://issues.apache.org/jira/browse/DRILL-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297342#comment-14297342 ] Jinfeng Ni commented on DRILL-2012: --- Sean, Can you please take a look at this issue? thx. Flatten fails when we filter by a non-projected column --- Key: DRILL-2012 URL: https://issues.apache.org/jira/browse/DRILL-2012 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Sean Hsuan-Yi Chu Priority: Critical Attachments: error.log git.commit.id.abbrev=b491cdb {code} { id: 0001, type: donut, name: Cake, ppu: 0.55, batters: { batter: [ { id: 1001, type: Regular }, { id: 1002, type: Chocolate }, { id: 1003, type: Blueberry }, { id: 1004, type: Devil's Food } ] }, topping: [ { id: 5001, type: None }, { id: 5002, type: Glazed }, { id: 5005, type: Sugar }, { id: 5007, type: Powdered Sugar }, { id: 5006, type: Chocolate with Sprinkles }, { id: 5003, type: Chocolate }, { id: 5004, type: Maple } ] } {code} The below query fails : {code} select j.id id, j.name name, flatten(j.topping) tt, flatten(j.batters.batter) bb from `sample.json` j where j.type = 'donut'; Query failed: Query failed: Failure while running fragment., Trying to flatten a non-repeated filed. [ 98ff5db0-4aa7-4f65-9a7a-93ae8bc41f6b on qa-node190.qa.lab:31010 ] [ 98ff5db0-4aa7-4f65-9a7a-93ae8bc41f6b on qa-node190.qa.lab:31010 ] {code} However if I include the filter column in the list of selected columns, then the query succeeds. 
{code}
select j.type, j.id id, j.name name, flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j where j.type = 'donut';
+-------+------+------+------+------+
| type  | id   | name | tt   | bb   |
+-------+------+------+------+------+
| donut | 0001 | Cake | {id:5001,type:None} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5004,type:Maple} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5004,type:Maple} | {id:1002,type:Chocolate} |
| donut
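The output above also shows why the row count multiplies: two FLATTEN calls in one SELECT pair each element of `topping` with each element of `batters.batter`, a cross product. A minimal Python sketch of that behavior (illustrative only, not Drill's implementation):

```python
# Illustrative sketch (not Drill code) of how two FLATTEN calls in one
# SELECT produce a cross product of the two repeated fields.
donut = {
    "id": "0001", "name": "Cake", "type": "donut",
    "topping": [{"id": 5001, "type": "None"}, {"id": 5002, "type": "Glazed"}],
    "batter": [{"id": 1001, "type": "Regular"}, {"id": 1002, "type": "Chocolate"}],
}

def flatten_two(row, col_a, col_b):
    """Emit one output row per (element-of-a, element-of-b) pair."""
    return [
        {"id": row["id"], "name": row["name"], "tt": a, "bb": b}
        for a in row[col_a]
        for b in row[col_b]
    ]

rows = flatten_two(donut, "topping", "batter")
print(len(rows))  # 4 rows: 2 toppings x 2 batters
```

With the full sample document (7 toppings, 4 batters) the same pairing yields the 7 x 4 rows seen in the query result.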
[jira] [Updated] (DRILL-2115) Disable function CARDINALITY in grammar
[ https://issues.apache.org/jira/browse/DRILL-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2115: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Disable function CARDINALITY in grammar --- Key: DRILL-2115 URL: https://issues.apache.org/jira/browse/DRILL-2115 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Priority: Minor Since we don't support multiset type, we should disable CARDINALITY function as well: {code} 0: jdbc:drill:schema=dfs select cardinality(list) from `test.json`; Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. [ db86cf79-6083-4ad7-afa1-fac534b942bc on atsqa4-134.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality(id) from `test.json`; Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. [ 1f209cb2-4d60-4c1d-b1d0-d921f8e8a913 on atsqa4-134.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality() from `test.json`; Query failed: SqlValidatorException: Invalid number of arguments to function 'CARDINALITY'. Was expecting 1 arguments Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality('aa') from `test.json`; Query failed: SqlValidatorException: Cannot apply 'CARDINALITY' to arguments of type 'CARDINALITY(CHAR(2))'. 
Supported form(s): 'CARDINALITY(MULTISET)' 'CARDINALITY(ARRAY)' 'CARDINALITY(MAP)' Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
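For context on what the disabled function would compute: per the SQL standard, CARDINALITY returns the element count of a MULTISET, ARRAY, or MAP, which matches the "Supported form(s)" list in the validator message. A toy Python sketch of those semantics (not Drill code):

```python
# Toy illustration (not Drill code) of SQL CARDINALITY semantics:
# the element count of a collection-typed value, and a type error
# for scalars such as CHAR, mirroring the validator message above.
def cardinality(value):
    if isinstance(value, (list, dict)):
        return len(value)
    raise TypeError("CARDINALITY applies to MULTISET/ARRAY/MAP")

print(cardinality([10, 20, 30]))  # 3
```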
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2069: -- Attachment: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch Star is not expanded correctly in the query with IN clause containing subquery -- Key: DRILL-2069 URL: https://issues.apache.org/jira/browse/DRILL-2069 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
t1.json
{code}
{ a1: aa, b1: 1 }
{ a1: bb, b1: 2 }
{ a1: cc, b1: 3 }
{code}
t2.json
{code}
{ a2: aa, b2: 1 }
{ a2: bb, b2: 2 }
{ a2: xx, b2: 10 }
{code}
Star is expanded incorrectly; we should get only columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------+------+------+------+
| a2   | a1   | b1   | a10  |
+------+------+------+------+
| aa   | aa   | 1    | aa   |
| bb   | bb   | 2    | bb   |
+------+------+------+------+
2 rows selected (0.172 seconds)
{code}
explain plan
{code}
00-01    Project(*=[$0])
00-02      Project(*=[$0])
00-03        HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05          Project(*=[$0], a1=[$1])
00-07            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04          HashAgg(group=[{0}])
00-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}
Workaround - specify columns explicitly:
{code}
0: jdbc:drill:schema=dfs select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------+------+
| a1   | a10  |
+------+------+
| aa   | aa   |
| bb   | bb   |
+------+------+
2 rows selected (0.24 seconds)
{code}
Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+------+------+------+------+------+------+
| a2   | b2   | a1   | b1   | a10  | b10  |
+------+------+------+------+------+------+
| aa   | 1    | aa   | 1    | aa   | 1    |
| bb   | 2    | bb   | 2    | bb   | 2    |
+------+------+------+------+------+------+
2 rows selected (0.323 seconds)
0: jdbc:drill:schema=dfs select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
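The symptom above is consistent with `*` being expanded after the IN subquery has been rewritten into a join, so the subquery's columns leak into the output. A hedged Python sketch of the difference (names like `semi_join_star` are illustrative; this is not Drill's planner code):

```python
# Hedged sketch (not Drill code) of the star-expansion bug: an IN
# subquery is rewritten into a join with a dedup (the HashAgg in the
# plan). If '*' is expanded over the joined row instead of over the
# outer table only, the subquery's columns leak into the result.
t1 = [{"a1": "aa", "b1": 1}, {"a1": "bb", "b1": 2}, {"a1": "cc", "b1": 3}]
t2 = [{"a2": "aa"}, {"a2": "bb"}, {"a2": "xx"}]

def semi_join_star(left, right, lkey, rkey, expand_before_rewrite):
    keys = {r[rkey] for r in right}              # dedup, like the HashAgg
    matched = [row for row in left if row[lkey] in keys]
    if expand_before_rewrite:
        return matched                            # '*' = left columns only
    # buggy: '*' expanded over the joined row, leaking the right column
    return [{**{rkey: row[lkey]}, **row} for row in matched]

print(semi_join_star(t1, t2, "a1", "a2", expand_before_rewrite=True))
# [{'a1': 'aa', 'b1': 1}, {'a1': 'bb', 'b1': 2}] -- only t1's columns
```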
[jira] [Updated] (DRILL-2016) Remove multiset data type from supported grammar
[ https://issues.apache.org/jira/browse/DRILL-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2016: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Remove multiset data type from supported grammar -- Key: DRILL-2016 URL: https://issues.apache.org/jira/browse/DRILL-2016 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Should throw unsupported error {code} 0: jdbc:drill:schema=dfs select cast(a1 as multiset) from `t1.json`; Query failed: Query failed: Unexpected exception during fragment initialization: use createMultisetType() instead Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2008) select a || b from ... fails while select cast(a as varchar) || cast(b as varchar) from ...
[ https://issues.apache.org/jira/browse/DRILL-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2008: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) select a || b from ... fails while select cast(a as varchar) || cast(b as varchar) from ... --- Key: DRILL-2008 URL: https://issues.apache.org/jira/browse/DRILL-2008 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Sean Hsuan-Yi Chu Priority: Critical The first query would fail at planning by optiq/calcite. For example, select n_name || n_name from cp.`tpch/nation.parquet`; org.apache.drill.exec.rpc.RpcException: Query failed: Unexpected exception during fragment initialization: null at org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:79) at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:93) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:52) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:34) at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:58) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1969) Error message thrown while querying Hive Views from Drill must be improved
[ https://issues.apache.org/jira/browse/DRILL-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271870#comment-14271870 ] Jinfeng Ni commented on DRILL-1969: --- +1. Fix looks good to me for this issue. Seems this is not the only place where we did not display the root cause of the exception. We may want to consider fixing the rest of them as well later. Error message thrown while querying Hive Views from Drill must be improved -- Key: DRILL-1969 URL: https://issues.apache.org/jira/browse/DRILL-1969 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive Affects Versions: 0.7.0, 0.8.0 Reporter: Abhishek Girish Assignee: Venki Korukanti Fix For: 0.8.0 Attachments: 0001-DRILL-1969-Improve-the-error-message-displayed-when-.patch Currently querying Hive views is not supported via Drill. This may however not be clear to an end user as both hive tables and hive views can be listed via Drill. use hive; +++ | ok | summary | +++ | true | Default schema changed to 'hive' | +++ show tables; +--++ | TABLE_SCHEMA | TABLE_NAME | +--++ | hive.default | tab1_view | | hive.default | tab1 | +--++ Querying a hive table is successful. However querying a hive view fails with an error which isn't informative. select * from tab1; +++ | a1 | a2 | +++ | 1 | ABC| +++ select * from tab1_view; Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] Error: exception while executing query: Failure while executing query. (state=,code=0) The error message thrown at the console must be improved such as to indicate that at present querying Hive Views is not supported. 
LOG: 2015-01-08 17:38:41,009 [2b50cede-b757-f6c3-6c5f-56f524796772:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 3efe02b8-19cd-4e42-a89c-190488762993: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:194) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65] Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] at org.eigenbase.util.Util.newInternal(Util.java:750) ~[optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:246) ~[optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r12.jar:na] at 
net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r12.jar:na] at net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:276) ~[optiq-core-0.9-drill-r12.jar:na] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:155)
[jira] [Commented] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318512#comment-14318512 ] Jinfeng Ni commented on DRILL-2087: --- Talked to [~vkorukanti] about the conflict with Local Exchange. Did not find issue with both changes applied. Some planner option setting might produce a wrong plan and wrong query result for join query. - Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jacques Nadeau Fix For: 0.9.0 Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch
I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet
ls -l region
total 24
-rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet
{code}
With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+-------------+
| n_nationkey |
+-------------+
| 18 |
| 12 |
| 9 |
| 8 |
| 21 |
| 19 |
| 7 |
| 6 |
| 23 |
| 22 |
| 20 |
| 13 |
| 11 |
| 10 |
| 4 |
| 0 |
| 16 |
| 15 |
| 14 |
| 5 |
| 24 |
| 17 |
| 3 |
| 2 |
| 1 |
+-------------+
25 rows selected (2.057 seconds)
{code}
However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, then the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local alter session set `planner.slice_target` = 10;
+------+-------------------------------+
| ok   | summary                       |
+------+-------------------------------+
| true | planner.slice_target updated. |
+------+-------------------------------+
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local alter session set `planner.join.row_count_estimate_factor` = 0.1;
+------+-------------------------------------------------+
| ok   | summary                                         |
+------+-------------------------------------------------+
| true | planner.join.row_count_estimate_factor updated. |
+------+-------------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-------------+
| n_nationkey |
+-------------+
+-------------+
No rows selected (0.71 seconds)
{code}
If we look at the explain plan result after changing the planner options, we will see that for the JOIN operator the LEFT side does not have any EXCHANGE operator, while the RIGHT side has a HashToRandomExchange operator. That does not seem to be a valid plan.
{code}
00-00    Screen
00-01      UnionExchange
01-01        Project(n_nationkey=[$2])
01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
02-02                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}
The cause of this problem seems to be that Drill removes the EXCHANGE operator under some conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE on both sides, or on none.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
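The invariant the report describes, that a distributed hash join should see an exchange below both inputs or below neither (otherwise the two sides are partitioned differently and matching keys never meet), can be checked mechanically over a plan tree. A hedged sketch on a toy plan representation (node names are illustrative; this is not Drill's planner code):

```python
# Hedged sketch (toy plan representation, not Drill code) of a check
# for the invariant described above: a HashJoin whose inputs disagree
# about having an exchange below them is flagged as invalid.
def has_exchange(node):
    if node["op"].endswith("Exchange"):
        return True
    return any(has_exchange(c) for c in node.get("inputs", []))

def invalid_joins(node):
    bad = []
    if node["op"] == "HashJoin":
        left, right = node["inputs"]
        if has_exchange(left) != has_exchange(right):
            bad.append(node)
    for c in node.get("inputs", []):
        bad.extend(invalid_joins(c))
    return bad

# Shape of the reported plan: exchange on the right input only.
plan = {"op": "HashJoin", "inputs": [
    {"op": "Scan", "inputs": []},
    {"op": "HashToRandomExchange", "inputs": [{"op": "Scan", "inputs": []}]},
]}
print(len(invalid_joins(plan)))  # 1: exchange on one side only
```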
[jira] [Created] (DRILL-2250) Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files
Jinfeng Ni created DRILL-2250: - Summary: Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files Key: DRILL-2250 URL: https://issues.apache.org/jira/browse/DRILL-2250 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet, Storage - Writer Reporter: Jinfeng Ni Assignee: Steven Phillips Not sure if this is related to DRILL-2249, since the error seems to be different. On today's master commit: select commit_id from sys.version; ++ | commit_id | ++ | 4ed0a8d68ec5ef344fb54ff7c9d857f7f3f153aa | ++ If I create a parquet file containing decimal types, using the following CTAS {code} create table my_dec_table as select cast(l_quantity as decimal(10,2)) dec1, cast(l_discount as decimal(10,2)) dec2, cast(l_tax as decimal(10,2)) as dec3 from cp.`tpch/lineitem.parquet` ; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 60175 | ++---+ 1 row selected (1.71 seconds) {code} Then, if I want to query this new parquet file with : {code} select * from my_dec_table; RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] select dec1, dec2, dec3 from my_dec_table; Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] {code} parquet-tools shows the following schema information for the generated parquet file: {code} message root { required int64 dec1 (DECIMAL(10,2)); required int64 dec2 (DECIMAL(10,2)); required int64 dec3 (DECIMAL(10,2)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
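As the parquet-tools output shows, DECIMAL(10,2) is stored as an int64 of unscaled units, so a reader has to apply the scale rather than treat the column as a plain BIGINT, which is what the Decimal18Vector-to-BigIntVector cast failure suggests is going wrong. A small illustrative decode in Python (not Drill's reader code):

```python
# Illustration (not Drill code) of the storage shown by parquet-tools:
# DECIMAL(10,2) kept as an int64 of unscaled units, so reading it back
# means applying the scale, not interpreting the raw int64 as a BIGINT.
from decimal import Decimal

def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Turn an unscaled int64 plus a scale into the decimal value."""
    return Decimal(unscaled).scaleb(-scale)

print(decode_decimal(1750, 2))  # 17.50 -- unscaled 1750 with scale 2
```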
[jira] [Commented] (DRILL-2250) Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files
[ https://issues.apache.org/jira/browse/DRILL-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321181#comment-14321181 ] Jinfeng Ni commented on DRILL-2250: --- Here is part of stack trace for the error: {code} org.apache.drill.common.exceptions.ExecutionSetupException: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:325) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScanBatch.init(ScanBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:156) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:125) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:107) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] .. 
{code} Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files - Key: DRILL-2250 URL: https://issues.apache.org/jira/browse/DRILL-2250 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet, Storage - Writer Reporter: Jinfeng Ni Assignee: Steven Phillips Not sure if this is related to DRILL-2249, since the error seems to be different. On today's master commit: select commit_id from sys.version; ++ | commit_id | ++ | 4ed0a8d68ec5ef344fb54ff7c9d857f7f3f153aa | ++ If I create a parquet file containing decimal types, using the following CTAS {code} create table my_dec_table as select cast(l_quantity as decimal(10,2)) dec1, cast(l_discount as decimal(10,2)) dec2, cast(l_tax as decimal(10,2)) as dec3 from cp.`tpch/lineitem.parquet` ; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 60175 | ++---+ 1 row selected (1.71 seconds) {code} Then, if I want to query this new parquet file with : {code} select * from my_dec_table; RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] select dec1, dec2, dec3 from my_dec_table; Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] {code} parquet-tools shows the following schema information for the generated parquet file: {code} message root { required int64 dec1 (DECIMAL(10,2)); required int64 dec2 (DECIMAL(10,2)); required int64 dec3 (DECIMAL(10,2)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2087: -- Attachment: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch Some planner option setting might produce a wrong plan and wrong query result for join query. - Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch I have the following parquet data ( essentially tpch's nation/region tables, but break into multiple parts). {code} ls -l nation total 24 -rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet -rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet -rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet ls -l region total 24 -rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet -rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet -rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet {code} With the default planner setting, the following query produce the correct result : {code} 0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey; SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. +-+ | n_nationkey | +-+ | 18 | | 12 | | 9 | | 8 | | 21 | | 19 | | 7 | | 6 | | 23 | | 22 | | 20 | | 13 | | 11 | | 10 | | 4 | | 0 | | 16 | | 15 | | 14 | | 5 | | 24 | | 17 | | 3 | | 2 | | 1 | +-+ 25 rows selected (2.057 seconds) {code} However, if I modify the planner option (`planner.slice_target` and `planner.join.row_count_estimate_factor` ) as following, then the same query would return 0 row, which is not correct. 
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
+------+-------------------------------+
|  ok  |            summary            |
+------+-------------------------------+
| true | planner.slice_target updated. |
+------+-------------------------------+
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local> alter session set `planner.join.row_count_estimate_factor` = 0.1;
+------+--------------------------------------------------+
|  ok  |                     summary                      |
+------+--------------------------------------------------+
| true | planner.join.row_count_estimate_factor updated.  |
+------+--------------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-------------+
| n_nationkey |
+-------------+
+-------------+
No rows selected (0.71 seconds)
{code}

If we look at the explain plan after changing the planner options, we see that the JOIN operator's LEFT input does not have any EXCHANGE operator, while the RIGHT input has a HashToRandomExchange operator. That does not seem to be a valid plan.

{code}
00-00 Screen
00-01   UnionExchange
01-01     Project(n_nationkey=[$2])
01-02       HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03         HashToRandomExchange(dist0=[[$0]])
02-01           Project(n_regionkey=[$1], n_nationkey=[$0])
02-02             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}

The cause of this problem seems to be that Drill will remove an EXCHANGE operator under some conditions, and that condition does not guarantee that a JOIN operator either has EXCHANGE on both sides, or on neither. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
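To see why such a one-sided plan can silently lose rows, here is a small illustrative sketch (plain Python, not Drill code; the worker layout for the non-exchanged side is an assumption for the example). When only one join input is hash-partitioned on the join key, matching keys need not land on the same worker:

```python
# Sketch: a distributed hash join is only correct if both inputs agree on
# distribution. Partitioning one side while the other stays put loses matches.
def hash_partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_per_worker(left_parts, right_parts):
    # Each worker joins only the partitions it holds.
    return sum(
        1
        for w in range(len(left_parts))
        for l in left_parts[w]
        for r in right_parts[w]
        if l["n_regionkey"] == r["r_regionkey"]
    )

nation = [{"n_nationkey": i, "n_regionkey": i % 5} for i in range(25)]
region = [{"r_regionkey": i} for i in range(5)]

# Broken plan: nation goes through a HashToRandomExchange, region does not
# (assume worker 0 simply holds all of region, worker 1 holds none).
broken = join_per_worker(hash_partition(nation, "n_regionkey", 2), [region, []])

# Valid plan: both sides hash-partitioned on the join key, so matching keys
# are co-located on the same worker.
valid = join_per_worker(
    hash_partition(nation, "n_regionkey", 2),
    hash_partition(region, "r_regionkey", 2),
)

print(broken, valid)  # broken < 25, valid == 25
```

This mirrors the fix named in the patch title: ensure either both join inputs are exchanged, or neither is.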
[jira] [Assigned] (DRILL-2165) Assert when grouping on expression with extract and concat functions
[ https://issues.apache.org/jira/browse/DRILL-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2165: -- Assignee: Sean Hsuan-Yi Chu

Assert when grouping on expression with extract and concat functions

Key: DRILL-2165 URL: https://issues.apache.org/jira/browse/DRILL-2165 Project: Apache Drill Issue Type: Bug Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Attachments: 0_0_0.parquet

EXTRACT with date data type:
{code}
0: jdbc:drill:schema=dfs> select concat(extract(day from c_date), extract(month from c_date)) from t1 group by concat(extract(day from c_date), extract(month from c_date));
Query failed: AssertionError:
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
EXTRACT with timestamp data type:
{code}
0: jdbc:drill:schema=dfs> select concat(extract(day from c_timestamp), extract(month from c_timestamp)) from t1 group by concat(extract(day from c_timestamp), extract(month from c_timestamp));
Query failed: AssertionError:
Error: exception while executing query: Failure while executing query.
(state=,code=0)
{code}

If you add the aggregate function SUM, the query runs:
{code}
select concat(extract(day from c_date), extract(month from c_date)), sum(c_integer) from t1 group by concat(extract(day from c_date), extract(month from c_date));
{code}

drillbit.log
{code}
2015-02-04 20:06:15,103 [2b2d8447-b333-5df4-f49a-b2111f875a67:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 8cff16d9-b34e-4ce3-a00b-18eaf3f64e66: AssertionError:
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: while converting `concat`(EXTRACT(DAY FROM `c_timestamp`), EXTRACT(MONTH FROM `c_timestamp`))
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:197) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: Internal error: while converting `concat`(EXTRACT(DAY FROM `c_timestamp`), EXTRACT(MONTH FROM `c_timestamp`))
    at org.eigenbase.util.Util.newInternal(Util.java:750) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.ReflectiveConvertletTable$2.convertCall(ReflectiveConvertletTable.java:149) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:52) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4099) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:1) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:3994) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2402) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2191) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:514) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:432) ~[optiq-core-0.9-drill-r18.jar:na]
    at net.hydromatic.optiq.prepare.PlannerImpl.convert(PlannerImpl.java:186) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:126)
[jira] [Updated] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2458: -- Fix Version/s: 0.9.0

Extra hash column added when running CTAS with order by

Key: DRILL-2458 URL: https://issues.apache.org/jira/browse/DRILL-2458 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Jinfeng Ni Fix For: 0.9.0 Attachments: 0001-DRILL-2458-Ensure-no-unwanted-columns-are-added-to-C.patch

I created the table via the command:
{code:sql}
create table dfs.tmp.lineitem_sort as select * from dfs.`/drill/SF10/lineitem` order by l_extendedprice desc
{code}
This resulted in an extra column when reading the data back:
{code}
0: jdbc:drill:> select * from `lineitem_sort/0_0_0.parquet` limit 1;
| E_X_P_R_H_A_S_H_F_I_E_L_D | L_COMMENT | L_COMMITDATE | L_DISCOUNT | L_EXTENDEDPRICE | L_LINENUMBER | L_LINESTATUS | L_ORDERKEY | L_PARTKEY | L_QUANTITY | L_RECEIPTDATE | L_RETURNFLAG | L_SHIPDATE | L_SHIPINSTRUCT | L_SHIPMODE | L_SUPPKEY | L_TAX | l_extendedprice0 |
| -1909175176 | [B@187a06b6 | [B@734ea347 | 0.02 | 104949.5 | 2 | [B@2fc1c575 | 16734176 | 19 | 50.0 | [B@5a8a9cd1 | [B@423d8bc7 | [B@56a3d7ca | [B@1eac3b36 | [B@3d6365f5 | 50002 | 0.05 | 104949.5 |
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370069#comment-14370069 ] Jinfeng Ni commented on DRILL-2458: --- The extra column does not seem to be related to the LocalExchange code; it could happen to any other regular column. I could reproduce a similar issue as follows.

{code}
create table dfs.tmp.region_sort as select *, r_regionkey + 1 from cp.`tpch/region.parquet` order by r_name;
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 0_0      | 5                         |
+----------+---------------------------+
1 row selected (2.556 seconds)
{code}

Then, run a query against the created parquet file.

{code}
0: jdbc:drill:zk=local> select * from dfs.tmp.region_sort;
| r_regionkey | r_name | r_comment | EXPR$1 | r_name0 | EXPR$10 |
| 0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to | 1 | AFRICA | 1 |
| 1 | AMERICA | hs use ironic, even requests. s | 2 | AMERICA | 2 |
| 2 | ASIA | ges. thinly even pinto beans ca | 3 | ASIA | 3 |
| 3 | EUROPE | ly final courts cajole furiously final excuse | 4 | EUROPE | 4 |
| 4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl | 5 | MIDDLE EAST | 5 |
5 rows selected (0.108 seconds)
{code}

We can see that two additional columns, r_name0 and EXPR$10, were added to the output parquet file. The root cause of this problem is in the star column handling: we should add a prefix to the columns expanded from the star, so that the execution operators can distinguish the regular columns expanded from the star column from the columns/expressions referenced explicitly in the query.
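The prefixing idea described above can be sketched as follows. The marker string and helper names here are hypothetical, shown only to illustrate how tagging star-expanded columns avoids renaming collisions like r_name0 and EXPR$10:

```python
# Hypothetical sketch of star-column prefixing (not Drill's actual code).
STAR_PREFIX = "T0||"  # assumed marker meaning "expanded from `*`"

def expand_projection(table_columns, explicit_exprs):
    # Columns coming from `*` carry the marker; explicit expressions do not,
    # so a downstream operator can tell the two groups apart by name and no
    # deduplicating rename (r_name -> r_name0) is needed.
    return [STAR_PREFIX + c for c in table_columns] + list(explicit_exprs)

def output_name(col):
    # A final pass strips the marker when producing output field names.
    return col[len(STAR_PREFIX):] if col.startswith(STAR_PREFIX) else col

cols = expand_projection(["r_regionkey", "r_name", "r_comment"], ["EXPR$1"])
print([output_name(c) for c in cols])
```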
[jira] [Updated] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2458: -- Attachment: 0001-DRILL-2458-Ensure-no-unwanted-columns-are-added-to-C.patch
[jira] [Updated] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2411: -- Component/s: (was: Query Planning Optimization) Execution - Relational Operators

Scalar SUM/AVG over empty result set returns no rows instead of NULL

Key: DRILL-2411 URL: https://issues.apache.org/jira/browse/DRILL-2411 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Victoria Markman Assignee: Jinfeng Ni

Queries below should return NULL:
{code}
0: jdbc:drill:schema=dfs> select sum(a2) from t2 where 1=0;
+--------+
| EXPR$0 |
+--------+
+--------+
No rows selected (0.08 seconds)
0: jdbc:drill:schema=dfs> select avg(a2) from t2 where 1=0;
+--------+
| EXPR$0 |
+--------+
+--------+
No rows selected (0.074 seconds)
{code}
When grouped, the result is correct:
{code}
0: jdbc:drill:schema=dfs> select a2, sum(a2) from t2 where 1=0 group by a2;
+----+--------+
| a2 | EXPR$1 |
+----+--------+
+----+--------+
No rows selected (0.11 seconds)
{code}
I'm not convinced, and it is not very intuitive, that the correct result should be NULL, but this is what Postgres returns, and Aman thinks NULL is the correct behavior :)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
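For reference, the expected behavior matches standard SQL and can be checked quickly with SQLite (illustrative only; the table and column names mirror the report):

```python
import sqlite3

# Scalar SUM/AVG over an empty input yields one row containing NULL;
# the grouped form yields zero rows. SQLite and Postgres both follow this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (a2 INTEGER)")

sum_scalar = conn.execute("SELECT SUM(a2) FROM t2 WHERE 1=0").fetchall()
avg_scalar = conn.execute("SELECT AVG(a2) FROM t2 WHERE 1=0").fetchall()
grouped = conn.execute(
    "SELECT a2, SUM(a2) FROM t2 WHERE 1=0 GROUP BY a2"
).fetchall()

print(sum_scalar, avg_scalar, grouped)  # [(None,)] [(None,)] []
```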
[jira] [Updated] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2411: -- Assignee: Steven Phillips (was: Jinfeng Ni)
[jira] [Commented] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353860#comment-14353860 ] Jinfeng Ni commented on DRILL-2411: --- The plan for the query seems to be fine, and I feel the issue comes from the execution operator.

{code}
select sum(n_nationkey) from cp.`tpch/nation.parquet` where n_nationkey < 0;
{code}
{code}
00-00 Screen
00-01   StreamAgg(group=[{}], EXPR$0=[SUM($0)])
00-02     Filter(condition=[<($0, 0)])
00-03       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`]]])
{code}
[jira] [Assigned] (DRILL-2313) Query fails when one of the operands is a DATE literal without an explicit cast
[ https://issues.apache.org/jira/browse/DRILL-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2313: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni)

Query fails when one of the operands is a DATE literal without an explicit cast

Key: DRILL-2313 URL: https://issues.apache.org/jira/browse/DRILL-2313 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu

For operations involving the date datatype, when one of the operands is a DATE literal without a cast, the query fails.

*The following query fails to validate:*
{code:sql}
SELECT * FROM date_dim WHERE d_date BETWEEN '2002-3-01' AND cast('2002-3-01' AS DATE) LIMIT 1;
{code}
Query failed: SqlValidatorException: Cannot apply 'BETWEEN' to arguments of type 'ANY BETWEEN CHAR(9) AND DATE'. Supported form(s): 'COMPARABLE_TYPE BETWEEN COMPARABLE_TYPE AND COMPARABLE_TYPE'

*The following query executes fine:*
{code:sql}
SELECT * FROM date_dim WHERE d_date BETWEEN '2002-3-01' AND '2002-3-01' LIMIT 1;
{code}
Both queries execute fine on Postgres.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
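The validator's complaint has a simple analogue in any statically checked comparison: BETWEEN requires mutually comparable operand types, and a CHAR literal is not comparable with a DATE until it is explicitly converted. A sketch in plain Python (illustrative only, not Drill code):

```python
from datetime import date

# A string and a date are not mutually comparable, much like CHAR(9) vs DATE
# in the failing query above.
d = date(2002, 3, 1)
try:
    ok = "2002-3-01" <= d
except TypeError:
    ok = None  # not comparable without an explicit conversion

# Converting the literal on both ends of the range restores comparability,
# which is what casting both BETWEEN operands to DATE does in SQL.
lo = date.fromisoformat("2002-03-01")
hi = date(2002, 3, 1)
in_range = lo <= d <= hi
print(ok, in_range)  # None True
```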
[jira] [Assigned] (DRILL-2317) UNION ALL on sub-queries with single column projection fails on GROUP BY
[ https://issues.apache.org/jira/browse/DRILL-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2317: - Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) UNION ALL on sub-queries with single column projection fails on GROUP BY Key: DRILL-2317 URL: https://issues.apache.org/jira/browse/DRILL-2317 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu A UNION ALL on two sub-queries with only a single column projected, and GROUP BY on the whole query fails with an IndexOutOfBoundsException *The following query fails to execute:* {code:sql} SELECT c1 FROM ((SELECT ss_quantity AS c1 FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c2 FROM web_sales limit 1) ) foo GROUP BY c1; {code} *The following query also fails to execute:* {code:sql} SELECT distinct(c) FROM ((SELECT ss_quantity AS c FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c FROM web_sales limit 1) ) ; {code} Error: Query failed: RemoteRpcException: Failure while running fragment., index: 0, length: 1 (expected: range(0, 0)) LOG message indicates java.lang.IndexOutOfBoundsException *The following query executes fine:* {code:sql} SELECT c1 FROM ((SELECT ss_quantity AS c1, ss_item_sk c3 FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c2, ws_item_sk c4 FROM web_sales limit 1) ) foo GROUP BY c1; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
Jinfeng Ni created DRILL-2400: -- Summary: Drill's query planner did not merge multiple filters into one. Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni

Let's say we have a view defined with a WHERE clause, as follows:

{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}

The following query refers to the above view in two places, once in the outer query and once in the subquery, and each reference adds an additional filter:

{code}
select dat.store_id, sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id from MyViewWithFilter where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}

The current plan will be like the following:

{code}
00-00 Screen
00-01   Project(store_id=[$0], total_cost=[$1])
00-02     SelectionVectorRemover
00-03       Sort(sort0=[$0], dir0=[ASC])
00-04         HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05           Project(store_id=[$0], store_cost=[$2])
00-06             HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07               Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                 HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                   Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3], $f11=[$2])
00-12                     HashJoin(condition=[=($4, $5)], joinType=[inner])
00-15                       SelectionVectorRemover
00-18                         Filter(condition=[=($3, 'GRADUATE DEGREE')])
00-20                           Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f8=[CAST($4):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL])
00-22                             SelectionVectorRemover
00-23                               Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-24                                 Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3])
00-25                                   Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]])
00-14                       HashAgg(group=[{0}])
00-17                         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
00-09                   HashAgg(group=[{0}])
00-11                     Project(position_id=[CAST($0):ANY NOT NULL])
00-13                       SelectionVectorRemover
00-16                         Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19                           Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-21                             Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]])
{code}

In this plan, the view's own filter and the additional filter were not merged; in both the outer query and the subquery there are two stacked filters, like:

{code}
00-16 Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19   Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
{code}

It is desirable to have the filters merged, because:

1) it could improve the possibility of partition pruning, since partition pruning matches a filter directly on top of the SCAN:

Filter (dir0 = ' ')
  Filter (colA = '...')
    SCAN

If the filter with (dir0) is not merged with the second filter, partition pruning will not kick in.

2) a merged filter allows short-circuit evaluation, which could mean better performance, compared with the case where the filters are evaluated separately.
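Both motivations can be seen in a tiny sketch (illustrative Python over in-memory rows with made-up sample data, not planner code): the merged conjunction evaluates in one pass with short-circuiting, and it sits directly above the "scan" where a pruning rule that only inspects that position could see all conditions.

```python
# Two stacked filters vs. one merged AND filter over the same rows.
rows = [
    {"position_id": 1, "position_title": "VP Sales"},
    {"position_id": 7, "position_title": "VP Marketing"},
    {"position_id": 2, "position_title": "Store Manager"},
]

view_filter = lambda r: r["position_id"] in (1, 2, 3)    # from the view
query_filter = lambda r: "VP" in r["position_title"]     # from the query

# Unmerged: two passes, with an intermediate row set in between.
two_pass = [r for r in [r for r in rows if view_filter(r)] if query_filter(r)]

# Merged: one pass; `and` short-circuits, so query_filter is never
# evaluated for rows the view filter already rejected.
merged = [r for r in rows if view_filter(r) and query_filter(r)]

assert merged == two_pass  # same result, cheaper evaluation
print(merged)
```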
[jira] [Commented] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350970#comment-14350970 ] Jinfeng Ni commented on DRILL-2400: --- [~amansinha100], could you please review the patch for DRILL-2240, in particular the first part, where the estimation of the filter's CPU cost is modified? Thanks.
[jira] [Updated] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2400: -- Attachment: 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch
[jira] [Commented] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351813#comment-14351813 ] Jinfeng Ni commented on DRILL-2400: --- With the patch, here is the new plan, where the two filters are merged together. {code} 00-00Screen 00-01 Project(store_id=[$0], total_cost=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], dir0=[ASC]) 00-04HashAgg(group=[{0}], total_cost=[SUM($1)]) 00-05 Project(store_id=[$0], store_cost=[$2]) 00-06HashAgg(group=[{0, 1}], store_cost=[SUM($2)]) 00-07 Project(store_id=[$1], position_id=[$2], $f3=[$0]) 00-08HashJoin(condition=[=($3, $4)], joinType=[inner]) 00-10 Project($f3=[$0], $f5=[$1], $f6=[$2], $f11=[$2]) 00-12HashJoin(condition=[=($3, $4)], joinType=[inner]) 00-15 Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL]) 00-18SelectionVectorRemover 00-20 Filter(condition=[AND(OR(=($0, 1), =($0, 2), =($0, 3)), =(CAST($4):ANY NOT NULL, 'GRADUATE DEGREE'))]) 00-21Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3]) 00-22 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]]) 00-14 HashAgg(group=[{0}]) 00-17Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]]) 00-09 HashAgg(group=[{0}]) 00-11Project(position_id=[CAST($0):ANY NOT NULL]) 00-13 SelectionVectorRemover 00-16Filter(condition=[AND(OR(=($0, 1), =($0, 2), =($0, 3)), LIKE(CAST($1):ANY NOT NULL, '%VP%'))]) 00-19 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]]) {code} Drill's query planner did not merge multiple filters into one. 
--- Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch, 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch Let's say we have a view defined with a WHERE clause as follows:
{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}
For the following query, which refers to the above view in two places (once in the outer query and once in the subquery), each reference adds an additional filter:
{code}
select dat.store_id,
       sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id
                        from MyViewWithFilter
                        where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}
The current plan is as follows:
{code}
00-00  Screen
00-01    Project(store_id=[$0], total_cost=[$1])
00-02      SelectionVectorRemover
00-03        Sort(sort0=[$0], dir0=[ASC])
00-04          HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05            Project(store_id=[$0], store_cost=[$2])
00-06              HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07                Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                  HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                    Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3],
[jira] [Resolved] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-2400. --- Resolution: Fixed Fixed in commits 6be9e3b and 6e6f789 Drill's query planner did not merge multiple filters into one. --- Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch, 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch Let's say we have a view defined with a WHERE clause as follows:
{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}
For the following query, which refers to the above view in two places (once in the outer query and once in the subquery), each reference adds an additional filter:
{code}
select dat.store_id,
       sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id
                        from MyViewWithFilter
                        where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}
The current plan is as follows:
{code}
00-00  Screen
00-01    Project(store_id=[$0], total_cost=[$1])
00-02      SelectionVectorRemover
00-03        Sort(sort0=[$0], dir0=[ASC])
00-04          HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05            Project(store_id=[$0], store_cost=[$2])
00-06              HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07                Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                  HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                    Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3], $f11=[$2])
00-12                      HashJoin(condition=[=($4, $5)], joinType=[inner])
00-15                        SelectionVectorRemover
00-18                          Filter(condition=[=($3, 'GRADUATE DEGREE')])
00-20                            Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f8=[CAST($4):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL])
00-22                              SelectionVectorRemover
00-23                                Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-24                                  Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3])
00-25                                    Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]])
00-14                        HashAgg(group=[{0}])
00-17                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
00-09                    HashAgg(group=[{0}])
00-11                      Project(position_id=[CAST($0):ANY NOT NULL])
00-13                        SelectionVectorRemover
00-16                          Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19                            Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-21                              Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]])
{code}
In this plan, the view's own filter and the additional filter were not merged; in both the outer query and the subquery there are two stacked filters, for example:
{code}
00-16  Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19    Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
{code}
It is desirable to have the filters merged, because 1)
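As background on what such a merge rule does (this is a hypothetical sketch in plain Python, analogous in spirit to Calcite's FilterMergeRule, not Drill's actual planner code): two stacked Filter nodes can be collapsed into one Filter whose condition is the conjunction of both.

```python
# Minimal sketch: collapse adjacent Filter nodes into one with an AND
# of their conditions. Node classes and condition strings are illustrative.

class Scan:
    def __init__(self, table):
        self.table = table

class Filter:
    def __init__(self, condition, child):
        self.condition = condition  # condition kept as a string for illustration
        self.child = child

def merge_filters(node):
    """Recursively rewrite the plan, merging Filter-over-Filter pairs."""
    if isinstance(node, Filter):
        child = merge_filters(node.child)
        if isinstance(child, Filter):
            # Merge: one Filter carrying AND(outer, inner), skipping the inner node.
            return Filter(f"AND({node.condition}, {child.condition})", child.child)
        return Filter(node.condition, child)
    return node

# The two stacked filters from the subquery side of the plan above:
plan = Filter("LIKE(position_title, '%VP%')",
              Filter("OR(=(position_id, 1), =(position_id, 2), =(position_id, 3))",
                     Scan("employee.json")))
merged = merge_filters(plan)
print(merged.condition)
```

After the rewrite, a single Filter sits directly on the Scan, which matches the shape of the fixed plan shown in the comment above.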
[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2311: -- Assignee: Sean Hsuan-Yi Chu (was: Aman Sinha) Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Creating a table with the same column name in different cases results in a runtime exception. This query should fail at planning or parsing time:
{code}
CREATE TABLE drill_parquet_mulCaseColumns3 as
select cast(BIGINT_col as bigint) BIGINT_col,
       cast(DECIMAL9_col as decimal) bigint_col
FROM dfs.`/user/root/alltypes.json`;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
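Rejecting such a statement at validation time amounts to a case-insensitive duplicate check over the select list's output names. A minimal sketch (the function name and shape are illustrative, not Drill's actual validator API):

```python
# Hypothetical sketch: detect output column names that collide when
# compared case-insensitively, as a validator for CTAS column lists could.

def find_case_insensitive_duplicates(column_names):
    """Return (first_seen, duplicate) pairs of colliding names."""
    seen = {}
    duplicates = []
    for name in column_names:
        key = name.lower()
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates

# The two output names from the failing CTAS above collide:
print(find_case_insensitive_duplicates(["BIGINT_col", "bigint_col"]))
# A non-empty result would let the planner raise a clean validation error
# instead of the java.lang.IllegalStateException reported here.
```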
[jira] [Updated] (DRILL-2329) TPCDS query 95 and simplified variant fail to execute
[ https://issues.apache.org/jira/browse/DRILL-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2329: -- Fix Version/s: 1.1.0 TPCDS query 95 and simplified variant fail to execute - Key: DRILL-2329 URL: https://issues.apache.org/jira/browse/DRILL-2329 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Jinfeng Ni Fix For: 1.1.0 Attachments: drillbit.log, query95_stream0_sf1.sql TPCDS query 95 (attached) fails to validate. *A simplified variant of the query (which may not preserve its semantics) that also fails:*
{code:sql}
WITH abc AS (SELECT * FROM web_sales)
SELECT Count(DISTINCT ws_order_number) AS a1,
       Sum(ws_ext_ship_cost) AS a2
FROM web_sales ws
WHERE ws.ws_ship_addr_sk = 1
  AND ws.ws_web_site_sk = web_site_sk
ORDER BY count(DISTINCT ws_order_number)
LIMIT 100;
{code}
*Error:* Query failed: SqlValidatorException: Aggregate expression is illegal in ORDER BY clause of non-aggregating SELECT -- This message was sent by Atlassian JIRA (v6.3.4#6332)
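For what it's worth, the usual way to sidestep this class of validator error is to order by the select-list alias rather than repeating the aggregate expression in ORDER BY. A minimal sketch against SQLite (not Drill) with made-up data, just to illustrate the rewritten shape of the query:

```python
# Sketch: ORDER BY the alias of an aggregate instead of the aggregate
# expression itself. Runs against SQLite with fabricated sample rows;
# table and column names mirror the TPCDS variant above.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE web_sales (ws_order_number INT, ws_ext_ship_cost REAL)")
con.executemany("INSERT INTO web_sales VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

row = con.execute("""
    SELECT COUNT(DISTINCT ws_order_number) AS a1,
           SUM(ws_ext_ship_cost)           AS a2
    FROM web_sales
    ORDER BY a1
    LIMIT 100
""").fetchone()
print(row)
```

Whether Drill's validator should accept the original form (ordering by a repeated aggregate expression) is exactly what this issue is about; the alias form is only a workaround.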
[jira] [Updated] (DRILL-2346) Star is not expanded correctly in create view if view fields are specified
[ https://issues.apache.org/jira/browse/DRILL-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2346: -- Fix Version/s: 1.1.0 (was: 0.9.0) Star is not expanded correctly in create view if view fields are specified -- Key: DRILL-2346 URL: https://issues.apache.org/jira/browse/DRILL-2346 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Minor Fix For: 1.1.0
{code}
0: jdbc:drill:schema=dfs> select * from t2 limit 1;
+------+------+------------+
|  a2  |  b2  |     c2     |
+------+------+------------+
| 0    | zzz  | 2014-12-31 |
+------+------+------------+
1 row selected (0.065 seconds)
{code}
We get an error, even though table t2 has 3 columns:
{code}
0: jdbc:drill:schema=dfs> create view v2(a2, b2, c2) as select * from t2;
+--------+-----------------------------------------------------------------------+
|   ok   |                                summary                                |
+--------+-----------------------------------------------------------------------+
| false  | View's field list and View's query field list have different counts.  |
+--------+-----------------------------------------------------------------------+
1 row selected (0.038 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2346) Star is not expanded correctly in create view if view fields are specified
[ https://issues.apache.org/jira/browse/DRILL-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2346: -- Fix Version/s: 0.9.0 Star is not expanded correctly in create view if view fields are specified -- Key: DRILL-2346 URL: https://issues.apache.org/jira/browse/DRILL-2346 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Minor Fix For: 0.9.0
{code}
0: jdbc:drill:schema=dfs> select * from t2 limit 1;
+------+------+------------+
|  a2  |  b2  |     c2     |
+------+------+------------+
| 0    | zzz  | 2014-12-31 |
+------+------+------------+
1 row selected (0.065 seconds)
{code}
We get an error, even though table t2 has 3 columns:
{code}
0: jdbc:drill:schema=dfs> create view v2(a2, b2, c2) as select * from t2;
+--------+-----------------------------------------------------------------------+
|   ok   |                                summary                                |
+--------+-----------------------------------------------------------------------+
| false  | View's field list and View's query field list have different counts.  |
+--------+-----------------------------------------------------------------------+
1 row selected (0.038 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2398) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
[ https://issues.apache.org/jira/browse/DRILL-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2398: -- Assignee: Aman Sinha (was: Jinfeng Ni) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter -- Key: DRILL-2398 URL: https://issues.apache.org/jira/browse/DRILL-2398 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Fix For: 0.9.0 Attachments: j1.parquet, j2.parquet count(*) should return 0 and not NULL:
{code}
0: jdbc:drill:schema=dfs> select count(*)
from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double)
where j1.c_bigint IS NOT DISTINCT FROM j2.c_bigint;
+------------+
|   EXPR$0   |
+------------+
+------------+
{code}
These are the values in the tables:
{code}
0: jdbc:drill:schema=dfs> select j1.c_bigint, j2.c_bigint, count(*) from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double) group by j1.c_bigint, j2.c_bigint;
+------------+-------------+---------+
|  c_bigint  |  c_bigint1  | EXPR$1  |
+------------+-------------+---------+
| 460194667  | -498749284  | 1       |
| 464547172  | -498828740  | 1       |
| 467451850  | -498966611  | 2       |
| 471050029  | -499154096  | 3       |
| 472873799  | -499233550  | 3       |
| 475698977  | -499395929  | 2       |
| 478986584  | -499564607  | 1       |
| 488139464  | -499763274  | 3       |
| 498214699  | -499871720  | 2       |
+------------+-------------+---------+
9 rows selected (0.339 seconds)
{code}
The IS DISTINCT FROM predicate returns the correct result:
{code}
select count(*)
from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double)
where j1.c_bigint IS DISTINCT FROM j2.c_bigint
{code}
Explain plan for the query that returns the incorrect result:
{code}
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02    Project($f0=[0])
00-03      SelectionVectorRemover
00-04        Filter(condition=[CAST(CASE(IS NULL($1), IS NULL($3), IS NULL($3), IS NULL($1), =($1, $3))):BOOLEAN NOT NULL])
00-05          HashJoin(condition=[=($0, $2)], joinType=[inner])
00-07            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j1]], selectionRoot=/joins/j1, numFiles=1, columns=[`c_double`, `c_bigint`]]])
00-06            Project(c_double0=[$0], c_bigint0=[$1])
00-08              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j2]], selectionRoot=/joins/j2, numFiles=1, columns=[`c_double`, `c_bigint`]]])
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
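For reference, IS NOT DISTINCT FROM is null-safe equality: two NULLs compare as equal, a NULL and a non-NULL compare as unequal, and two non-NULLs compare with ordinary equality. A plain-Python sketch of that semantics (illustrative only, with a couple of made-up NULL cases added to the reported data):

```python
# Sketch of IS NOT DISTINCT FROM semantics: null-safe equality,
# modeling SQL NULL as Python None.

def is_not_distinct_from(a, b):
    if a is None and b is None:
        return True          # NULL IS NOT DISTINCT FROM NULL -> true
    if a is None or b is None:
        return False         # NULL vs non-NULL -> false
    return a == b            # ordinary equality otherwise

# First two pairs come from the joined rows listed in the issue; the
# None cases are fabricated to show the NULL behavior.
pairs = [(460194667, -498749284), (464547172, -498828740), (None, None), (None, 5)]
results = [is_not_distinct_from(a, b) for a, b in pairs]
print(results)

# In the reported data no j1.c_bigint matches its j2.c_bigint, so the
# filter keeps zero rows and count(*) over an empty input should be 0,
# not an empty/NULL result as the bug shows.
print(sum(results[:2]))
```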
[jira] [Updated] (DRILL-2318) Query fails when an ORDER BY clause is used with CTEs
[ https://issues.apache.org/jira/browse/DRILL-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2318: -- Fix Version/s: 0.9.0 Query fails when an ORDER BY clause is used with CTEs -- Key: DRILL-2318 URL: https://issues.apache.org/jira/browse/DRILL-2318 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Jinfeng Ni Fix For: 0.9.0 Attachments: drillbit.log Adding a WITH clause with a simple CTE causes a query with an ORDER BY to fail. This happens even when the CTE is unrelated to the main query. *The following query fails to execute:* {code:sql} WITH x AS (SELECT ss_sold_date_sk a1 FROM store_sales) SELECT x.a1 FROM x ORDER BY x.a1; {code} Error: Query failed: SqlValidatorException: Table 'x' not found Log attached. *The following query executes fine:* {code:sql} WITH x AS (SELECT ss_sold_date_sk a1 FROM store_sales) SELECT x.a1 FROM x {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)