[jira] [Assigned] (DRILL-1867) Case sensitivity of query identifiers is inconsistent
[ https://issues.apache.org/jira/browse/DRILL-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni reassigned DRILL-1867:
---------------------------------
Assignee: Jinfeng Ni

Case sensitivity of query identifiers is inconsistent
-----------------------------------------------------
Key: DRILL-1867
URL: https://issues.apache.org/jira/browse/DRILL-1867
Project: Apache Drill
Issue Type: Bug
Components: SQL Parser
Affects Versions: 0.7.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni

git.commit.id.abbrev=c65928f

In this query it looks like identifiers are case insensitive:

{code}
0: jdbc:drill:schema=dfs> select Eventdate from `test.json` where eventdate is not null;
+------------+
| Eventdate  |
+------------+
| 2014-01-01 |
| 2014-01-01 |
| 2014-02-01 |
+------------+
3 rows selected (0.107 seconds)
{code}

However, in a subquery they suddenly change behavior and become case sensitive:

{code}
0: jdbc:drill:schema=dfs> select EVENTDATE from ( select Eventdate from `test.json` where eventdate is not null );
Query failed: Query failed: Failure validating SQL. org.eigenbase.util.EigenbaseContextException: From line 1, column 8 to line 1, column 16: Column 'EVENTDATE' not found in any table
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

This happens not only when querying unstructured data, but Drill tables as well:

{code}
0: jdbc:drill:schema=dfs> select Customerid from test where customerid = 100;
+------------+
| Customerid |
+------------+
| 100        |
+------------+
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
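The expected behavior described above can be sketched in a few lines (illustrative Python, not Drill's actual code): column names should be matched case-insensitively at every level of nesting, so an outer block can re-reference an inner block's column in any case.

```python
# Illustrative sketch of case-insensitive identifier resolution.
# `resolve` and the column lists are hypothetical, for demonstration only.

def resolve(requested, available):
    """Return the stored column matching `requested`, ignoring case, or None."""
    by_lower = {name.lower(): name for name in available}
    return by_lower.get(requested.lower())

stored_columns = ["eventdate", "customerid"]

# Inner query projects `Eventdate`; the file stores `eventdate` -- it matches.
inner_output = resolve("Eventdate", stored_columns)
assert inner_output == "eventdate"

# The outer block's `EVENTDATE` should resolve against the inner output the
# same way, case-insensitively, instead of demanding an exact-case match.
assert resolve("EVENTDATE", [inner_output]) == "eventdate"
```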
[jira] [Created] (DRILL-1913) Drill did not correctly preserve output field's case sensitivity
Jinfeng Ni created DRILL-1913:
------------------------------
Summary: Drill did not correctly preserve output field's case sensitivity
Key: DRILL-1913
URL: https://issues.apache.org/jira/browse/DRILL-1913
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Minor

For the following query,

{code}
select EMPLOYEE_ID from cp.`employee.json` limit 2;
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |
+-------------+
{code}

Even though the upper-case EMPLOYEE_ID matches the field 'employee_id' in the input JSON file, Drill did not preserve the upper-case in the output result, which does not seem right. The plan coming out of optiq/calcite seems right, as it has a project which will do the renaming. But Drill's logical planning removes that project, and hence cannot preserve the upper-case in the output.
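The intended behavior, keeping the rename projection so output columns use the query's spelling, can be sketched as follows (illustrative Python; `project_output` is a hypothetical helper, not Drill's planner):

```python
# Sketch: match stored fields case-insensitively, but key the output row by
# the names exactly as written in the SELECT list.

def project_output(requested_names, record):
    by_lower = {k.lower(): v for k, v in record.items()}
    return {name: by_lower[name.lower()] for name in requested_names}

row = {"employee_id": 1}
out = project_output(["EMPLOYEE_ID"], row)
assert out == {"EMPLOYEE_ID": 1}   # output preserves the query's upper-case
```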
[jira] [Commented] (DRILL-1867) Case sensitivity of query identifiers is inconsistent
[ https://issues.apache.org/jira/browse/DRILL-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261787#comment-14261787 ]

Jinfeng Ni commented on DRILL-1867:
-----------------------------------
Added test cases for Parquet/CSV data sources. Opened two new JIRAs to track existing issues related to case-sensitivity handling: DRILL-1911 and DRILL-1913. In particular, DRILL-1911 covers the memory leak and correctness issue that occurs when projecting the same column with different cases.

Attachments: 0001-DRILL-1867-Fix-case-insensitive-matching-for-subquer.patch
[jira] [Commented] (DRILL-1905) UNION ALL of two identical json files fails with exception
[ https://issues.apache.org/jira/browse/DRILL-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262460#comment-14262460 ]

Jinfeng Ni commented on DRILL-1905:
-----------------------------------
+1. Looks good to me. For the change in Calcite, do we need to make sure 'ANY' is nullable? There seems to be a recent issue where the 'ANY' type is not treated as nullable.

UNION ALL of two identical json files fails with exception
----------------------------------------------------------
Key: DRILL-1905
URL: https://issues.apache.org/jira/browse/DRILL-1905
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.7.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Fix For: 0.8.0
Attachments: 0001-Create-result-type-for-ANY-in-leastRestrictiveSqlTyp.patch, 0001-DRILL-1905-There-are-3-parts-of-the-fix-1-Check-comp.patch

{code}
git.commit.id.abbrev=e3ab2c1
{code}

{code}
0: jdbc:drill:schema=dfs> select * from `logs/2014/02/t.json`;
+-----+-----+--------+
| a1  | b1  | c1     |
+-----+-----+--------+
| 0   | 0   | true   |
| 0   | 0   | false  |
| 0   | 0   | false  |
| 1   | 1   | true   |
| 1   | 1   | true   |
+-----+-----+--------+
5 rows selected (0.099 seconds)

0: jdbc:drill:schema=dfs> select * from `logs/2014/03/t.json`;
+-----+-----+--------+
| a1  | b1  | c1     |
+-----+-----+--------+
| 0   | 0   | true   |
| 0   | 0   | false  |
| 0   | 0   | false  |
| 1   | 1   | true   |
| 1   | 1   | true   |
+-----+-----+--------+
5 rows selected (0.065 seconds)

0: jdbc:drill:schema=dfs> select * from `logs/2014/02/t.json` union all select * from `logs/2014/03/t.json`;
Query failed: Query failed: Unexpected exception during fragment initialization: DrillUnionRel#1680
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

{code}
2014-12-20 01:27:01,939 [2b6b2f9a-7e4e-1670-5210-c2a8dabea33b:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 4505ba6b-a2b6-4355-aac1-1edd74bc347d: Query failed: Unexpected exception during fragment initialization: DrillUnionRel#1680
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: DrillUnionRel#1680
        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:194) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: DrillUnionRel#1680
        at org.eigenbase.rel.AbstractRelNode.getRowType(AbstractRelNode.java:211) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.apache.drill.exec.planner.common.DrillUnionRelBase.isCompatible(DrillUnionRelBase.java:44) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.common.DrillUnionRelBase.init(DrillUnionRelBase.java:38) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.init(DrillUnionRel.java:42) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.copy(DrillUnionRel.java:49) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.apache.drill.exec.planner.logical.DrillUnionRel.copy(DrillUnionRel.java:38) ~[drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT]
        at org.eigenbase.rel.SetOpRel.copy(SetOpRel.java:73) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.rel.SetOpRel.copy(SetOpRel.java:33) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:472) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) ~[optiq-core-0.9-drill-r12.jar:na]
        at org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:677) ~[optiq-core-0.9-drill-r12.jar:na]
{code}
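The compatibility check the review comment is about can be sketched as follows (illustrative Python, not Drill's or Calcite's validator): a UNION ALL type check in which the pseudo-type ANY is treated as a nullable wildcard that is compatible with any column type.

```python
# Sketch: UNION-compatibility of two row types, treating ANY as a wildcard.
# `compatible` and the type-name strings are hypothetical, for illustration.

def compatible(left_types, right_types):
    """Same arity, and each column pair equal or either side ANY."""
    if len(left_types) != len(right_types):
        return False
    return all(l == r or "ANY" in (l, r) for l, r in zip(left_types, right_types))

assert compatible(["BIGINT", "BIGINT", "BOOLEAN"], ["BIGINT", "BIGINT", "BOOLEAN"])
assert compatible(["ANY", "BIGINT"], ["VARCHAR", "BIGINT"])   # ANY matches anything
assert not compatible(["BIGINT"], ["BIGINT", "BOOLEAN"])      # arity mismatch
```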
[jira] [Commented] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292184#comment-14292184 ]

Jinfeng Ni commented on DRILL-2069:
-----------------------------------
Upgraded to Critical, since it produces an incorrect result and represents a quite typical query pattern.

Star is not expanded correctly in the query with IN clause containing subquery
------------------------------------------------------------------------------
Key: DRILL-2069
URL: https://issues.apache.org/jira/browse/DRILL-2069
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical

t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

Star is expanded incorrectly; we should get only columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+-----+-----+-----+------+
| a2  | a1  | b1  | a10  |
+-----+-----+-----+------+
| aa  | aa  | 1   | aa   |
| bb  | bb  | 2   | bb   |
+-----+-----+-----+------+
2 rows selected (0.172 seconds)
{code}

explain plan:
{code}
00-01      Project(*=[$0])
00-02        Project(*=[$0])
00-03          HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05            Project(*=[$0], a1=[$1])
00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04            HashAgg(group=[{0}])
00-06              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}

Workaround - specify columns explicitly:
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+-----+------+
| a1  | a10  |
+-----+------+
| aa  | aa   |
| bb  | bb   |
+-----+------+
2 rows selected (0.24 seconds)
{code}

Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+-----+-----+-----+-----+------+------+
| a2  | b2  | a1  | b1  | a10  | b10  |
+-----+-----+-----+-----+------+------+
| aa  | 1   | aa  | 1   | aa   | 1    |
| bb  | 2   | bb  | 2   | bb   | 2    |
+-----+-----+-----+-----+------+------+
2 rows selected (0.323 seconds)

0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
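The fix the bug calls for can be sketched as follows (illustrative Python, not Drill's planner rule): when `*` comes from one specific table of the join that the IN-subquery rewriting produced, expansion must keep only the columns that originate from that table, excluding the subquery's correlation columns.

```python
# Sketch: prefix-aware star expansion over a join's output columns.
# `expand_star` and the (table, column) pairs are hypothetical, for illustration.

def expand_star(star_source, join_columns):
    """join_columns: list of (source_table, column_name) pairs in join output."""
    return [col for src, col in join_columns if src == star_source]

# Join output after rewriting `WHERE a1 IN (SELECT a2 FROM t2)`:
join_out = [("t2", "a2"), ("t1", "a1"), ("t1", "b1")]
assert expand_star("t1", join_out) == ["a1", "b1"]   # only t1's columns survive
```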
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Priority: Critical (was: Major)
[jira] [Assigned] (DRILL-1911) Querying same field multiple times with different case would hit memory leak and return incorrect result.
[ https://issues.apache.org/jira/browse/DRILL-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni reassigned DRILL-1911:
---------------------------------
Assignee: Jinfeng Ni (was: Deneche A. Hakim)

Querying same field multiple times with different case would hit memory leak and return incorrect result.
---------------------------------------------------------------------------------------------------------
Key: DRILL-1911
URL: https://issues.apache.org/jira/browse/DRILL-1911
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
Fix For: 0.8.0

git.commit.id.abbrev=309e1be

If you query the same field twice with different case, Drill throws a memory assertion error:

{code}
select employee_id, Employee_id from cp.`employee.json` limit 2;
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |

Query failed: Query failed: Failure while running fragment., Attempted to close accountor with 2 buffer(s) still allocated for QueryId: 2b5cc8eb-2817-aadb-e0fa-49272796592a, MajorFragmentId: 0, MinorFragmentId: 0.
Total 1 allocation(s) of byte size(s): 4096, at stack location:
        org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:212)
        org.apache.drill.exec.vector.UInt1Vector.allocateNewSafe(UInt1Vector.java:137)
        org.apache.drill.exec.vector.NullableBigIntVector.allocateNewSafe(NullableBigIntVector.java:173)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doAlloc(ProjectRecordBatch.java:229)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:167)
        org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
        org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
        org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
        org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
        org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67)
        org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97)
        org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57)
        org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:114)
        org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
{code}

Also, notice that the query result contains only one field; the second field is missing.

The plan looks fine:

{code}
Drill Physical :
00-00    Screen: rowcount = 463.0, cumulative cost = {1900.3 rows, 996.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 103
00-01      Project(employee_id=[$0], Employee_id=[$1]): rowcount = 463.0, cumulative cost = {1854.0 rows, 950.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 102
00-02        SelectionVectorRemover: rowcount = 463.0, cumulative cost = {1391.0 rows, 942.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 101
00-03          Limit(fetch=[2]): rowcount = 463.0, cumulative cost = {928.0 rows, 479.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 100
00-04            Project(employee_id=[$0], Employee_id=[$0]): rowcount = 463.0, cumulative cost = {926.0 rows, 471.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 99
00-05              Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`employee_id`], files=[/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 98
{code}
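The missing second column can be explained by a de-duplication sketch (illustrative Python, not Drill's operator code): if projected field names are de-duplicated case-insensitively, `Employee_id` is silently dropped, which mirrors the single-column result above; de-duplicating by exact name keeps both output columns.

```python
# Sketch: de-duplicating a projection list with and without case sensitivity.
# `dedup` is a hypothetical helper, for illustration only.

def dedup(names, case_sensitive):
    seen, out = set(), []
    for n in names:
        key = n if case_sensitive else n.lower()
        if key not in seen:
            seen.add(key)
            out.append(n)
    return out

requested = ["employee_id", "Employee_id"]
assert dedup(requested, case_sensitive=False) == ["employee_id"]  # column lost
assert dedup(requested, case_sensitive=True) == requested         # both kept
```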
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306347#comment-14306347 ]

Jinfeng Ni commented on DRILL-2168:
-----------------------------------
The plan looks invalid. The $1 in the following Project operator is referring to the sum() aggregation result, which is not right.

{code}
00-02    Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1))
00-03      HashAgg(group=[{0}], EXPR$1=[SUM($1)])
{code}

Wrong result on grouping by expression involving CONCAT function
----------------------------------------------------------------
Key: DRILL-2168
URL: https://issues.apache.org/jira/browse/DRILL-2168
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical

{code}
0: jdbc:drill:schema=dfs> select * from test;
+--------------+------------------------+
| a1           | b1                     |
+--------------+------------------------+
| 51237400     | 2014-03-16 03:55:21.0  |
| -691523338   | 2014-02-13 15:47:22.0  |
| -1843395360  | 2014-01-14 03:31:27.0  |
| 1095015454   | 2014-09-27 17:35:21.0  |
| -48817354    | 2014-01-17 16:14:25.0  |
| 59892266     | 2014-04-23 05:08:34.0  |
| -827965492   | 2014-06-21 07:29:12.0  |
| -1281245980  | 2014-10-25 15:49:46.0  |
| -1778510302  | 2014-07-31 04:11:53.0  |
| 1346460386   | 2014-07-31 04:11:53.0  |
+--------------+------------------------+
10 rows selected (0.072 seconds)

0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . .> concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)), '-', cast(extract(year from b1) as varchar(10))),
. . . . . . . . . . . .> sum(a1)
. . . . . . . . . . . .> from
. . . . . . . . . . . .> test
. . . . . . . . . . . .> group by
. . . . . . . . . . . .> concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)), '-', cast(extract(year from b1) as varchar(10)));
+-------------+--------------+
| EXPR$0      | EXPR$1       |
+-------------+--------------+
| 1-1-1970    | 51237400     |
| 23-12-1969  | -691523338   |
| 10-12-1969  | -1843395360  |
| 13-1-1970   | 1095015454   |
| 31-12-1969  | -48817354    |
| 1-1-1970    | 59892266     |
| 22-12-1969  | -827965492   |
| 17-12-1969  | -1281245980  |
| 26-12-1969  | -432049916   |
+-------------+--------------+
9 rows selected (0.103 seconds)
{code}

Query plan:
{code}
00-01    Project(EXPR$0=[$0], EXPR$1=[$1])
00-02      Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1])
00-03        HashAgg(group=[{0}], EXPR$1=[SUM($1)])
00-04          Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1])
00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]])
{code}
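The intended semantics can be shown in plain Python (an illustrative sketch, not Drill's execution): the group key is the date-part concat computed from b1, and the final projection must re-emit that key rather than, as in the invalid plan, re-derive it from the SUM column ($1).

```python
# Sketch: GROUP BY concat(day, '-', month, '-', year) with SUM(a1),
# using hypothetical in-memory rows rather than the Parquet table.

from collections import defaultdict
from datetime import datetime

rows = [
    (51237400, datetime(2014, 3, 16)),
    (59892266, datetime(2014, 3, 16)),    # same day -> same group
    (-691523338, datetime(2014, 2, 13)),
]

groups = defaultdict(int)
for a1, b1 in rows:
    key = f"{b1.day}-{b1.month}-{b1.year}"  # group-by expression on b1
    groups[key] += a1                        # SUM(a1) per group

assert groups["16-3-2014"] == 51237400 + 59892266
assert groups["13-2-2014"] == -691523338
```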
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2107:
------------------------------
Priority: Blocker (was: Critical)

Hash Join throws IOBE for a query with exists subquery.
-------------------------------------------------------
Key: DRILL-2107
URL: https://issues.apache.org/jira/browse/DRILL-2107
Project: Apache Drill
Issue Type: New Feature
Components: Execution - Operators
Reporter: Jinfeng Ni
Assignee: Mehant Baid
Priority: Blocker
Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt

I hit an IOBE for TestTpchDistributed Q4 when I tried to enable an optimizer rule. I then simplified Q4 to the following, which still reproduces the same IOBE:

{code}
select o.o_orderpriority
from cp.`tpch/orders.parquet` o
where exists (
  select *
  from cp.`tpch/lineitem.parquet` l
  where l.l_orderkey = o.o_orderkey
);
{code}

Stack trace of the exception:

{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45]
        at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45]
        at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:232) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:132) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at org.apache.drill.exec.test.generated.HashTableGen307.doSetup(HashTableTemplate.java:71) ~[na:na]
        at org.apache.drill.exec.test.generated.HashTableGen307.updateBatches(HashTableTemplate.java:473) ~[na:na]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen313.executeProbePhase(HashJoinProbeTemplate.java:139) ~[na:na]
        at org.apache.drill.exec.test.generated.HashJoinProbeGen313.probeAndProject(HashJoinProbeTemplate.java:223) ~[na:na]
        at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:227) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}

The physical plan seems to be correct after enabling the new rule. In fact, if I disable HashJoin and use merge join for the query, it works fine. So it seems the IOBE exposes a bug in HashJoin.

To reproduce this issue, there are two options:
1) Modify DrillRuleSets.java to remove the comment before SwapJoinRule, run `alter session set `planner.slice_target` = 10;`, then run the query.
2) Use the attached physical plan json file and submit it with submitplan.

For comparison, I also attached the physical plan when disabling hashjoin (using merge join), and the explain plan at the physical operator level.
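The EXISTS query above behaves like a hash semi-join, which can be sketched as follows (illustrative Python with hypothetical in-memory data, not Drill's generated code): build a hash table on the subquery side, then emit each orders row whose key finds at least one match.

```python
# Sketch: EXISTS (correlated on orderkey) as a build/probe hash semi-join.

orders = [(1, "HIGH"), (2, "LOW"), (3, "MEDIUM")]   # (o_orderkey, o_orderpriority)
lineitem_keys = [1, 3, 3]                            # l_orderkey values

hash_table = set(lineitem_keys)                      # build side (dedup is fine for EXISTS)
result = [prio for key, prio in orders if key in hash_table]   # probe side
assert result == ["HIGH", "MEDIUM"]                  # order 2 has no lineitem
```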
[jira] [Commented] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301871#comment-14301871 ]

Jinfeng Ni commented on DRILL-2069:
-----------------------------------
Revised the patch based on review comments, and added two new unit test cases.

Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Attachment: (was: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch)
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2069:
------------------------------
Attachment: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
[jira] [Updated] (DRILL-2095) Order by on a repeated index inside a sub query results in an NPE
[ https://issues.apache.org/jira/browse/DRILL-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2095:
------------------------------
Fix Version/s: 1.0.0

Order by on a repeated index inside a sub query results in an NPE
-----------------------------------------------------------------
Key: DRILL-2095
URL: https://issues.apache.org/jira/browse/DRILL-2095
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: Rahul Challapalli
Assignee: Jinfeng Ni
Priority: Critical
Fix For: 1.0.0

git.commit.id.abbrev=3e33880

Data Set:
{code}
{ id : 1, list : [1,2] }
{code}

The below query succeeds:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select d.id id from `data.json` d order by d.list[0];
+------------+
| id         |
+------------+
| 1          |
+------------+
1 row selected (0.12 seconds)
{code}

However, when we use the same query inside a sub-query, we get an NPE:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select s.id from (select d.id id from `data.json` d order by d.list[0]) s;
Query failed: NullPointerException:
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

Explain plan also results in the same NPE for the above query.
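The semantics the failing query should have can be sketched as (illustrative Python with hypothetical rows, not Drill's planner): sort the inner rows by the first element of the repeated field, then project id through the subquery.

```python
# Sketch: ORDER BY list[0] inside a subquery, with the outer block projecting id.

data = [{"id": 2, "list": [5, 6]}, {"id": 1, "list": [1, 2]}]

inner = sorted(data, key=lambda d: d["list"][0])  # ORDER BY d.list[0]
outer = [row["id"] for row in inner]              # SELECT s.id FROM (...) s
assert outer == [1, 2]
```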
[jira] [Updated] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2087:
    Fix Version/s: 0.9.0

Some planner option setting might produce a wrong plan and wrong query result for join query.
---------------------------------------------------------------------------------------------
    Key: DRILL-2087
    URL: https://issues.apache.org/jira/browse/DRILL-2087
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Jinfeng Ni
    Assignee: Jacques Nadeau
    Fix For: 0.9.0
    Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch

I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet

ls -l region
total 24
-rw-r--r--@ 1 jni staff  564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff  491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff  506 Jan 27 16:54 03.parquet
{code}

With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+--------------+
| n_nationkey  |
+--------------+
| 18           |
| 12           |
| 9            |
| 8            |
| 21           |
| 19           |
| 7            |
| 6            |
| 23           |
| 22           |
| 20           |
| 13           |
| 11           |
| 10           |
| 4            |
| 0            |
| 16           |
| 15           |
| 14           |
| 5            |
| 24           |
| 17           |
| 3            |
| 2            |
| 1            |
+--------------+
25 rows selected (2.057 seconds)
{code}

However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.093 seconds)

0: jdbc:drill:zk=local> alter session set `planner.join.row_count_estimate_factor` = 0.1;
+-------+--------------------------------------------------+
|  ok   |                     summary                      |
+-------+--------------------------------------------------+
| true  | planner.join.row_count_estimate_factor updated.  |
+-------+--------------------------------------------------+
1 row selected (0.041 seconds)

0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+--------------+
| n_nationkey  |
+--------------+
+--------------+
No rows selected (0.71 seconds)
{code}

If we look at the explain plan after changing the planner options, the left side of the JOIN operator does not have any EXCHANGE operator, while the right side has a HashToRandomExchange. That does not seem to be a valid plan.
{code}
00-00    Screen
00-01      UnionExchange
01-01        Project(n_nationkey=[$2])
01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
02-02                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}

The cause of this problem seems to be that Drill will remove an EXCHANGE operator under certain conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE on both sides, or on neither.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2081) Filter not being pushed down into the sub query when we use flatten
[ https://issues.apache.org/jira/browse/DRILL-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2081:
    Fix Version/s: 1.0.0

Filter not being pushed down into the sub query when we use flatten
-------------------------------------------------------------------
    Key: DRILL-2081
    URL: https://issues.apache.org/jira/browse/DRILL-2081
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Priority: Minor
    Fix For: 1.0.0

git.commit.id.abbrev=3c6d0ef

The plan indicates that the filter is not being pushed down into the subquery:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select s.evnts.evnt_id from (select d.type type, flatten(d.events) evnts from `data.json` d) s where s.evnts.type = 'cmpgn4' and s.type='web';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[ITEM($1, 'evnt_id')])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=(ITEM($1, 'type'), 'cmpgn4'), =($0, 'web'))])
00-04            Project(type=[$1], evnts=[$2])
00-05              Flatten(flattenField=[$2])
00-06                Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/flatten_operators/data.json, numFiles=1, columns=[`type`, `events`], files=[maprfs:/drill/testdata/flatten_operators/data.json]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2155) Subquery in projection list that returns scalar result throws an exception
[ https://issues.apache.org/jira/browse/DRILL-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2155:
    Fix Version/s: 0.9.0
    Assignee: Sean Hsuan-Yi Chu  (was: Jinfeng Ni)

Subquery in projection list that returns scalar result throws an exception
--------------------------------------------------------------------------
    Key: DRILL-2155
    URL: https://issues.apache.org/jira/browse/DRILL-2155
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Sean Hsuan-Yi Chu
    Fix For: 0.9.0

Since there is no correlation, it shouldn't even plan ... I think we should throw the same error as in DRILL-1921 (cross join).
{code}
0: jdbc:drill:schema=dfs> select c_integer, (select current_timestamp from t1 limit 1) from t1;
Query failed: AssertionError: must call validate first
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select c_integer, (select current_timestamp from t1 limit 1) b from t1;
Query failed: AssertionError: must call validate first
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

drillbit.log
{code}
2015-02-03 22:33:16,008 [2b2eb354-2084-7da7-4e33-0debc7a90921:foreman] INFO o.a.drill.exec.work.foreman.Foreman - State change requested.
PENDING --> FAILED
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: must call validate first
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:197) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: must call validate first
    at org.eigenbase.sql.validate.IdentifierNamespace.resolve(IdentifierNamespace.java:139) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:1686) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:494) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryOrInList(SqlToRelConverter.java:1267) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertExists(SqlToRelConverter.java:1240) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.substituteSubquery(SqlToRelConverter.java:1003) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.replaceSubqueries(SqlToRelConverter.java:859) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3279) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:519) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:432) ~[optiq-core-0.9-drill-r18.jar:na]
    at net.hydromatic.optiq.prepare.PlannerImpl.convert(PlannerImpl.java:186) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:126) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at
{code}
[jira] [Updated] (DRILL-2094) Drill has problems with reading json fields when used in a subquery
[ https://issues.apache.org/jira/browse/DRILL-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2094:
    Fix Version/s: 1.0.0

Drill has problems with reading json fields when used in a subquery
-------------------------------------------------------------------
    Key: DRILL-2094
    URL: https://issues.apache.org/jira/browse/DRILL-2094
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

git.commit.id.abbrev=3e33880

Data Set:
{code}
{ "id": 1, "list": [1,2] }
{code}

The below query works:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select id from `temp1.json` order by list[0];
+------+
|  id  |
+------+
| 1    |
+------+
1 row selected (0.146 seconds)
{code}

However, when I use the exact same query as part of a sub-query, I get an error from Drill:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select s.id from (select id from `temp1.json` order by list[0]) s;
Query failed: SqlValidatorException: Table 'list' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

Explain plan also fails with the same error.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2173) Enable querying partition information without reading all data
[ https://issues.apache.org/jira/browse/DRILL-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2173:
    Fix Version/s: 0.8.0

Enable querying partition information without reading all data
--------------------------------------------------------------
    Key: DRILL-2173
    URL: https://issues.apache.org/jira/browse/DRILL-2173
    Project: Apache Drill
    Issue Type: New Feature
    Components: Query Planning & Optimization
    Affects Versions: 0.7.0
    Reporter: Jason Altekruse
    Assignee: Jason Altekruse
    Fix For: 0.8.0
    Attachments: Drill-2173-partition-queries-with-pruning.patch

When reading a series of files in nested directories, Drill currently adds columns representing the directory structure that was traversed to reach the file currently being read. These columns are stored as varchar under the names dir0, dir1, ... As these are just regular columns, Drill allows arbitrary queries against this data: aggregates, filters, sorts, etc.

To allow optimizing reads, basic partition pruning has already been added to prune in the case of an expression like dir0 = 2015, or a simple IN list, which is converted during planning to a series of ORs of equality expressions. If users want to query the directory information dynamically, without including specific directory names in the query, this will prompt a full table scan and a filter operation on the dir columns.

This enhancement is to allow more complex queries to be run against directory metadata, scanning only the matching directories.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
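[Editorial note] As an illustration of the kind of dynamic directory-metadata query this enhancement targets (hypothetical example — the `/logs` path is an assumption, the dirN columns are from the description above), the following currently forces a full scan of all data files even though it only needs partition values:

{code}
-- Wants only the distinct partition values, but today this reads every file
-- under /logs and filters afterwards.
select distinct dir0, dir1 from dfs.`/logs`;
{code}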
[jira] [Updated] (DRILL-2073) Filter on a field in a nested repeated type throws an exception
[ https://issues.apache.org/jira/browse/DRILL-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2073:
    Component/s: Execution - Data Types  (was: Query Planning & Optimization)

Filter on a field in a nested repeated type throws an exception
---------------------------------------------------------------
    Key: DRILL-2073
    URL: https://issues.apache.org/jira/browse/DRILL-2073
    Project: Apache Drill
    Issue Type: Bug
    Components: Execution - Data Types
    Reporter: Rahul Challapalli
    Assignee: Mehant Baid
    Attachments: error.log

git.commit.id.abbrev=3c6d0ef

Data Set:
{code}
{ "rm": [ {"rptd": [{"a": "foo"},{"b": "boo"}]} ], "rm1": [{"a": "foo"},{"b": "boo"}] }
{code}

The below query tries to apply a filter on a field which does not exist. However, the field is present in a different element of the same array.
{code}
select rm[0].rptd[0] from `temp.json` where rm[0].rptd[0].b = 'boo';
Query failed: Query failed: Failure while running fragment., index: -4, length: 4 (expected: range(0, 16384)) [ 01887113-c758-41bf-96d1-5eede9b1e411 on qa-node191.qa.lab:31010 ]
[ 01887113-c758-41bf-96d1-5eede9b1e411 on qa-node191.qa.lab:31010 ]
{code}

The above query should return an empty result. The error only happens when we apply a filter on a nested array element. The below query works fine:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> select rm1[0].a from `nested.json` where rm1[0].b = 'boo';
+---------+
| EXPR$0  |
+---------+
+---------+
{code}

Attached the log file.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2146) Flatten Function on two layer deep field fails to plan
[ https://issues.apache.org/jira/browse/DRILL-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2146:
    Fix Version/s: 0.9.0

Flatten Function on two layer deep field fails to plan
------------------------------------------------------
    Key: DRILL-2146
    URL: https://issues.apache.org/jira/browse/DRILL-2146
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Sean Hsuan-Yi Chu
    Assignee: Sean Hsuan-Yi Chu
    Fix For: 0.9.0

Using the same data as in DRILL-2012, this query fails to plan:
{code}
select flatten(j.batters.batter) bb from dfs_test.`%s` j where j.type = 'donut'
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
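[Editorial note] One rewrite worth checking against DRILL-2146 — an untested sketch that assumes the planner failure is specific to flattening a two-level field reference, so hoisting the inner reference into a subquery first might sidestep it:

{code}
-- Hypothetical rewrite (untested): project batters.batter in a subquery,
-- then flatten the now one-level-deep column.
select flatten(t.b) bb
from (select j.batters.batter b, j.type ttype from dfs_test.`%s` j) t
where t.ttype = 'donut';
{code}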
[jira] [Updated] (DRILL-2053) Column names are case sensitive if column is coming from WITH clause
[ https://issues.apache.org/jira/browse/DRILL-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2053:
    Fix Version/s: 1.0.0

Column names are case sensitive if column is coming from WITH clause
--------------------------------------------------------------------
    Key: DRILL-2053
    URL: https://issues.apache.org/jira/browse/DRILL-2053
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

test.json
{code}
{"customerid":100,"customername":"AAA"}
{"customerid":101,"customername":"BBB"}
{"customerid":102,"customername":"CCC"}
{code}

Wrong result:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.CUSTOMERID = b.CUSTOMERID;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
+-------------+---------------+--------------+----------------+
No rows selected (0.202 seconds)
{code}

Correct result, when the column name matches the case of the column name in the json file:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.customerid = b.customerid;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
| 100         | AAA           | 100          | AAA            |
| 101         | BBB           | 101          | BBB            |
| 102         | CCC           | 102          | CCC            |
+-------------+---------------+--------------+----------------+
3 rows selected (0.204 seconds)
{code}

Correct result when the column does not match case, but is coming directly from the table:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.customerid = b.customerID;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
| 100         | AAA           | 100          | AAA            |
| 101         | BBB           | 101          | BBB            |
| 102         | CCC           | 102          | CCC            |
+-------------+---------------+--------------+----------------+
3 rows selected (0.197 seconds)
{code}

If you change the case of a column name that comes from the subquery (WITH clause), this is where it goes all wrong:
{code}
0: jdbc:drill:schema=dfs> with a as ( select * from `test.json` ) select * from a, `test.json` b where a.Customerid = b.customerid;
+-------------+---------------+--------------+----------------+
| customerid  | customername  | customerid0  | customername0  |
+-------------+---------------+--------------+----------------+
+-------------+---------------+--------------+----------------+
No rows selected (0.186 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2157) Directory pruning on subdirectories only and data type conversions for directory filters
[ https://issues.apache.org/jira/browse/DRILL-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2157:
    Fix Version/s: 1.0.0
    Assignee: Aman Sinha  (was: Jinfeng Ni)

Directory pruning on subdirectories only and data type conversions for directory filters
----------------------------------------------------------------------------------------
    Key: DRILL-2157
    URL: https://issues.apache.org/jira/browse/DRILL-2157
    Project: Apache Drill
    Issue Type: Improvement
    Components: Query Planning & Optimization
    Affects Versions: Future
    Reporter: Andries Engelbrecht
    Assignee: Aman Sinha
    Priority: Minor
    Fix For: 1.0.0

Drill will scan all files and directories when using only a subdirectory as a predicate. Additionally, if the data type for the directory filter is not a string and has to be converted, Drill will also first scan all the subdirectories and files before applying the filter.

My current observation is that for a directory structure as listed below, pruning only works if the full tree is provided. If only a lower-level directory is supplied in the filter condition, Drill only uses it as a filter.

With a directory structure as below:
{code}
/2015
   /01
      /10
      /11
      /12
      /13
      /14
{code}

Query:
{code}
select count(id) from `/foo` t where dir0='2015' and dir1='01' and dir2='10'
{code}
produces the correct pruning and query plan:
{code}
01-02    Project(id=[$3]): rowcount = 3670316.0, cumulative cost = {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28434
01-03      Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28433
01-04        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
{code}

However:
{code}
select count(id) from `/foo` t where dir2='10'
{code}
produces a full scan of all subdirectories and only applies a filter condition after the fact. Notice the difference in numFiles between the two plans, even though the dir columns are listed in the base scan:
{code}
01-04    Filter(condition=[=($0, '10')]): rowcount = 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27470
01-05      Project(dir2=[$1], id=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 27469
01-06        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
{code}

Also, using the wrong data type for the filter produces a full scan:
{code}
select count(id) from `/foo` where dir_year=2015 and dir_month=01 and dir_day=14
{code}
produces:
{code}
01-04    Filter(condition=[AND(=(CAST($1):ANY NOT NULL, 2015), =(CAST($2):ANY NOT NULL, 1), =(CAST($3):ANY NOT NULL, 10))]): rowcount = 212034.63825, cumulative cost = {1.88475234E8 rows, 1.005201264E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34910
01-05      Project(id=[$2], dir0=[$3], dir1=[$1], dir2=[$0]): rowcount = 6.2825078E7, cumulative cost = {1.25650156E8 rows, 2.51300328E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 34909
01-06        Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, numFiles=405, columns=[`id`, `dir0`, `dir1`, `dir2`],
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
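[Editorial note] For comparison, a filter written with string literals matching the varchar dirN columns is the form the existing pruning recognizes — an illustrative sketch based on the plans above (the specific values are assumptions):

{code}
-- dir columns are varchar, so quoting the values avoids the CAT/CAST
-- conversion that defeats pruning in the last plan above.
select count(id) from `/foo` where dir0 = '2015' and dir1 = '01' and dir2 = '14';
{code}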
[jira] [Commented] (DRILL-2081) Filter not being pushed down into the sub query when we use flatten
[ https://issues.apache.org/jira/browse/DRILL-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309710#comment-14309710 ] Jinfeng Ni commented on DRILL-2081:

s.evnts.type = 'cmpgn4' cannot be pushed into the subquery, since the column comes from the FLATTEN operator. Are you expecting that the filter s.type = 'web' is pushed down?

Filter not being pushed down into the sub query when we use flatten
-------------------------------------------------------------------
    Key: DRILL-2081
    URL: https://issues.apache.org/jira/browse/DRILL-2081
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Reporter: Rahul Challapalli
    Assignee: Jinfeng Ni
    Priority: Minor

git.commit.id.abbrev=3c6d0ef

The plan indicates that the filter is not being pushed down into the subquery:
{code}
0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select s.evnts.evnt_id from (select d.type type, flatten(d.events) evnts from `data.json` d) s where s.evnts.type = 'cmpgn4' and s.type='web';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[ITEM($1, 'evnt_id')])
00-02        SelectionVectorRemover
00-03          Filter(condition=[AND(=(ITEM($1, 'type'), 'cmpgn4'), =($0, 'web'))])
00-04            Project(type=[$1], evnts=[$2])
00-05              Flatten(flattenField=[$2])
00-06                Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$0])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/flatten_operators/data.json, numFiles=1, columns=[`type`, `events`], files=[maprfs:/drill/testdata/flatten_operators/data.json]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT
[ https://issues.apache.org/jira/browse/DRILL-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2170:
    Fix Version/s: 0.8.0

Wrong result when joining to a subquery that has group-by, order-by and LIMIT
-----------------------------------------------------------------------------
    Key: DRILL-2170
    URL: https://issues.apache.org/jira/browse/DRILL-2170
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.7.0
    Reporter: Aman Sinha
    Assignee: Aman Sinha
    Priority: Critical
    Fix For: 0.8.0

On TPCH SF1: the first query gives the right result, the second produces a wrong result. The main difference is that the order-by in the second query is on the aggregation function, not the grouping key.
{code}
// Correct result
0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 1 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+---------+
| EXPR$0  |
+---------+
| 100     |
+---------+
1 row selected (2.67 seconds)

// Wrong result
0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+---------+
| EXPR$0  |
+---------+
| 400     |
+---------+
1 row selected (3.163 seconds)
{code}

The plan for the second query shows that there's a missing SingleMergeExchange after the TopN operation. This is needed because before the TopN we did a HashToRandomExchange, and since we are doing a LIMIT, we need to merge the output of TopN into a single stream.
{code}
0: jdbc:drill:zk=local> explain plan for select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-02        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
01-05                HashToRandomExchange(dist0=[[$0]])
02-01                  Project(l_orderkey=[$0])
02-02                    SelectionVectorRemover
02-03                      Limit(fetch=[100])
02-04                        SelectionVectorRemover
02-05                          TopN(limit=[100])
02-06                            HashToRandomExchange(dist0=[[$2]])
04-01                              HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
04-02                                HashToRandomExchange(dist0=[[$0]])
05-01                                  HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
05-02                                    Project(l_orderkey=[$1], l_quantity=[$2], l_extendedprice=[$0])
05-03                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/lineitem]], selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, columns=[`l_orderkey`, `l_quantity`, `l_extendedprice`]]])
01-04                HashToRandomExchange(dist0=[[$0]])
03-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/orders]], selectionRoot=/Users/asinha/data/tpch-sf1/orders, numFiles=1, columns=[`o_orderkey`]]])
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2069:
    Fix Version/s: 0.8.0

Star is not expanded correctly in the query with IN clause containing subquery
------------------------------------------------------------------------------
    Key: DRILL-2069
    URL: https://issues.apache.org/jira/browse/DRILL-2069
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Priority: Critical
    Fix For: 0.8.0
    Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch

t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

Star is expanded incorrectly; we should get only the columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------+------+------+------+
|  a2  |  a1  |  b1  | a10  |
+------+------+------+------+
| aa   | aa   | 1    | aa   |
| bb   | bb   | 2    | bb   |
+------+------+------+------+
2 rows selected (0.172 seconds)
{code}

explain plan:
{code}
00-01    Project(*=[$0])
00-02      Project(*=[$0])
00-03        HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05          Project(*=[$0], a1=[$1])
00-07            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04          HashAgg(group=[{0}])
00-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}

Workaround: specify columns explicitly.
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------+------+
|  a1  | a10  |
+------+------+
| aa   | aa   |
| bb   | bb   |
+------+------+
2 rows selected (0.24 seconds)
{code}

Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+------+------+------+------+------+------+
|  a2  |  b2  |  a1  |  b1  | a10  | b10  |
+------+------+------+------+------+------+
| aa   | 1    | aa   | 1    | aa   | 1    |
| bb   | 2    | bb   | 2    | bb   | 2    |
+------+------+------+------+------+------+
2 rows selected (0.323 seconds)

0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310094#comment-14310094 ] Jinfeng Ni commented on DRILL-1921:

+1

Throw unsupported error message for some set operators that are not currently supported
---------------------------------------------------------------------------------------
    Key: DRILL-1921
    URL: https://issues.apache.org/jira/browse/DRILL-1921
    Project: Apache Drill
    Issue Type: Bug
    Components: SQL Parser
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 0.8.0
    Attachments: DRILL-1921.1.patch, DRILL-1921.2.patch, DRILL-1921.3.patch, DRILL-1921.4.patch, DRILL-1921.5.patch, DRILL-1921.6.patch, DRILL-1921.7.patch

Throw an unsupported error message for these operators, instead of the "could not be implemented" error below:

INTERSECT
EXCEPT
UNION
CROSS JOIN
FULL OUTER JOIN

Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state:

Enhancement requests in Jira for the above-mentioned operators:
{code}
INTERSECT    DRILL-1308
MINUS        No enhancement request (not recognized in grammar)
EXCEPT       No enhancement request
UNION        DRILL-1169
CROSS JOIN   DRILL-786
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2139) Star is not expanded correctly in select distinct query
[ https://issues.apache.org/jira/browse/DRILL-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309675#comment-14309675 ] Jinfeng Ni commented on DRILL-2139:

We probably want to disable select distinct * for schema-less queries.

Star is not expanded correctly in select distinct query
-------------------------------------------------------
    Key: DRILL-2139
    URL: https://issues.apache.org/jira/browse/DRILL-2139
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

{code}
0: jdbc:drill:schema=dfs> select distinct * from t1;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.14 seconds)

0: jdbc:drill:schema=dfs> select distinct * from `test.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.163 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2139) Star is not expanded correctly in select distinct query
[ https://issues.apache.org/jira/browse/DRILL-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2139:
    Fix Version/s: 1.0.0

Star is not expanded correctly in select distinct query
-------------------------------------------------------
    Key: DRILL-2139
    URL: https://issues.apache.org/jira/browse/DRILL-2139
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Jinfeng Ni
    Fix For: 1.0.0

{code}
0: jdbc:drill:schema=dfs> select distinct * from t1;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.14 seconds)

0: jdbc:drill:schema=dfs> select distinct * from `test.json`;
+-------+
|   *   |
+-------+
| null  |
+-------+
1 row selected (0.163 seconds)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2115) Disable function CARDINALITY in grammar
[ https://issues.apache.org/jira/browse/DRILL-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2115:
    Fix Version/s: 0.9.0

Disable function CARDINALITY in grammar
---------------------------------------
    Key: DRILL-2115
    URL: https://issues.apache.org/jira/browse/DRILL-2115
    Project: Apache Drill
    Issue Type: Bug
    Components: Query Planning & Optimization
    Affects Versions: 0.8.0
    Reporter: Victoria Markman
    Assignee: Sean Hsuan-Yi Chu
    Priority: Minor
    Fix For: 0.9.0

Since we don't support the multiset type, we should disable the CARDINALITY function as well:
{code}
0: jdbc:drill:schema=dfs> select cardinality(list) from `test.json`;
Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. [ db86cf79-6083-4ad7-afa1-fac534b942bc on atsqa4-134.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality(id) from `test.json`;
Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. [ 1f209cb2-4d60-4c1d-b1d0-d921f8e8a913 on atsqa4-134.qa.lab:31010 ]
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality() from `test.json`;
Query failed: SqlValidatorException: Invalid number of arguments to function 'CARDINALITY'. Was expecting 1 arguments
Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:schema=dfs> select cardinality('aa') from `test.json`;
Query failed: SqlValidatorException: Cannot apply 'CARDINALITY' to arguments of type 'CARDINALITY(CHAR(2))'. Supported form(s): 'CARDINALITY(MULTISET)' 'CARDINALITY(ARRAY)' 'CARDINALITY(MAP)'
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2116) Add non-reserved keywords to non-reserved keyword list in parser
[ https://issues.apache.org/jira/browse/DRILL-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2116: -- Fix Version/s: 0.9.0 Assignee: Sean Hsuan-Yi Chu (was: Aman Sinha) Add non-reserved keywords to non-reserved keyword list in parser Key: DRILL-2116 URL: https://issues.apache.org/jira/browse/DRILL-2116 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Jacques Nadeau Assignee: Sean Hsuan-Yi Chu Fix For: 0.9.0 There are a number of keywords in Drill that shouldn't be considered reserved when parsing. Calcite allows us to customize the list of un-reserved keywords and we should update the list to allow more words. Things that I've run across include value, user, left, etc. This is a very common usability problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2055) Drill should error out for Invalid json file if it has the same map key names.
[ https://issues.apache.org/jira/browse/DRILL-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2055: -- Component/s: (was: Query Planning Optimization) Storage - JSON Assignee: Parth Chandra (was: Jinfeng Ni) Drill should error out for Invalid json file if it has the same map key names. -- Key: DRILL-2055 URL: https://issues.apache.org/jira/browse/DRILL-2055 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 0.7.0 Reporter: Hao Zhu Assignee: Parth Chandra Priority: Minor For a JSON file with duplicate map key names: { a : x, a : y } should we consider it invalid JSON and error out? Ref: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object#answer-23195243 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
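[Editor's note] The duplicate-key question above can be made concrete. The following is a minimal, hypothetical sketch (not Drill's actual JSON reader): given the sequence of field names a streaming parser reports for one object, it finds the first duplicate so a reader could raise an error instead of silently keeping only one of the values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hedged sketch (not Drill's implementation): scan the key names reported
// for one JSON object and return the first duplicate, if any.
class DuplicateKeyCheck {
    static Optional<String> firstDuplicate(List<String> keysInOneObject) {
        Set<String> seen = new HashSet<>();
        for (String k : keysInOneObject) {
            if (!seen.add(k)) {   // add() returns false when k was already present
                return Optional.of(k);
            }
        }
        return Optional.empty();
    }
}
```

For what it's worth, Jackson (the parser commonly used for JSON reading in Drill) appears to expose a similar switch, `JsonParser.Feature.STRICT_DUPLICATE_DETECTION`, which makes the parser itself fail on duplicate field names.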
[jira] [Assigned] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-1921: - Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Throw unsupported error message for some set operators that are not currently supported --- Key: DRILL-1921 URL: https://issues.apache.org/jira/browse/DRILL-1921 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Fix For: 0.8.0 Attachments: DRILL-1921.8.patch Throw an unsupported error message for these operators (instead of the "could not be implemented" error below): INTERSECT EXCEPT UNION CROSS JOIN Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state: Enhancement requests in Jira for the above-mentioned operators: {code} INTERSECT DRILL-1308 MINUS No enhancement request (not recognized in grammar) EXCEPT No enhancement request UNION DRILL-1169 CROSS JOIN DRILL-786 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2054) Support `||` as concat operator for varchar inputs
[ https://issues.apache.org/jira/browse/DRILL-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304262#comment-14304262 ] Jinfeng Ni commented on DRILL-2054: --- Minor comments: 1. For the JUnit test, add the combinations c1 || c2 and cast(c1 as varchar(30)) || cast(c2 as varchar(30)), where c1/c2 are typed as ANY when they go through Calcite/Optiq but are actually varchar columns in a schema-less table. These two cases cover any || any and varchar || varchar; the test you have only covers any || varchar. Support `||` as concat operator for varchar inputs Key: DRILL-2054 URL: https://issues.apache.org/jira/browse/DRILL-2054 Project: Apache Drill Issue Type: New Feature Reporter: Sean Hsuan-Yi Chu Assignee: Jinfeng Ni Fix For: Future Attachments: DRILL-2054.1.patch, DRILL-2054.2.patch, DRILL-2054ForOptiq.1.patch For example, the query: select n_nationkey || ' + ' || n_name || ' = ' as CONCAT, n_nationkey, ' + ' as PLUS, n_name from cp.`tpch/nation.parquet` should print out: CONCAT n_nationkey PLUS n_name 0 + ALGERIA = 0 + ALGERIA 1 + ARGENTINA = 1 + ARGENTINA 2 + BRAZIL = 2 + BRAZIL 3 + CANADA = 3 + CANADA ... ... ... (rows/records are omitted due to space limitation) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1921) Throw unsupported error message for some set operators that are not currently supported
[ https://issues.apache.org/jira/browse/DRILL-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304360#comment-14304360 ] Jinfeng Ni commented on DRILL-1921: --- A couple of additional comments: 1. VisitingUnsupportedOperators:64. You had better use sqlCall.getKind() == SqlKind.UNION SqlSetOperator op = (SqlSetOperator) sqlCall.getOperator(); if (! op.isAll()) { throw exception; } We should avoid using a function/operator's name as the comparison condition when we have another way. 2. Why call UNION "NOT-ALL" in the error message? Using UNION makes more sense to me. 3. Use SqlTypeName.TINYINT.name() etc. in disabledTypes. 4. In DefaultSqlHandler.validateNode(), instead of creating a new VisitingUnsupportedOperators each time, create a static method in that visitor and call that, if the visitor does not require any customization per run. Throw unsupported error message for some set operators that are not currently supported --- Key: DRILL-1921 URL: https://issues.apache.org/jira/browse/DRILL-1921 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Victoria Markman Assignee: Jinfeng Ni Fix For: 0.8.0 Attachments: DRILL-1921.1.patch, DRILL-1921.2.patch, DRILL-1921.3.patch, DRILL-1921.4.patch Throw an unsupported error message for these operators (instead of the "could not be implemented" error below): INTERSECT EXCEPT UNION CROSS JOIN FULL OUTER JOIN Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#517133:Subset#3.LOGICAL.ANY([]).[]] could not be implemented; planner state: Enhancement requests in Jira for the above-mentioned operators: {code} INTERSECT DRILL-1308 MINUS No enhancement request (not recognized in grammar) EXCEPT No enhancement request UNION DRILL-1169 CROSS JOIN DRILL-786 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
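[Editor's note] The first review point above — compare on the operator's kind enum rather than its name string — can be sketched as follows. The types here are simplified stand-ins for illustration only; the real SqlKind and SqlSetOperator are Calcite classes with a much larger surface.

```java
// Stand-in types approximating Calcite's SqlKind / SqlSetOperator,
// for illustration only.
enum SqlKind { UNION, INTERSECT, EXCEPT, OTHER }

class SqlSetOperator {
    private final SqlKind kind;
    private final boolean all;

    SqlSetOperator(SqlKind kind, boolean all) {
        this.kind = kind;
        this.all = all;
    }

    SqlKind getKind() { return kind; }
    boolean isAll() { return all; }
}

class UnsupportedOperatorCheck {
    // Compare on the kind enum, not on the operator's name string:
    // UNION (without ALL) is the unsupported form being flagged.
    static boolean isUnsupportedUnion(SqlSetOperator op) {
        return op.getKind() == SqlKind.UNION && !op.isAll();
    }
}
```

The design point of the comment is that name-based comparison is brittle (names can change or collide), whereas the kind enum is part of the operator's identity.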
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: (was: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.3 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279086#comment-14279086 ] Jinfeng Ni commented on DRILL-1889: --- Revised the code based on the comments and added comments to the code. Also renamed method isRegularColumn() to isRegularColumnOrExp() to make it cleaner. Also, there is an incorrect-result issue for a query like select *, col, *, col from t: the last col will be missing in the output. The plan for this query looks OK and the bug seems to be in the execution operator (ProjectRecordBatch). Hsuan will open a different JIRA and submit a patch, as he has prototyped a fix for this issue. when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.3 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277756#comment-14277756 ] Jinfeng Ni commented on DRILL-1889: --- The cause of this wrong result for a select * query is that Drill's plan currently does not distinguish between the columns expanded from the * column and the expressions referenced in the query. DRILL-931 introduced the concept of prefixed table columns for a join query that references the * column. It turns out that for a single table with a * column, we also have to add the prefix to the columns expanded from *, so that execution time can differentiate them. Per discussion with Aman/Hsuan, the main idea of the fix is as follows: 1. Add a prefix to table columns when the query references at least one * column plus other regular columns / expressions. The prefix is removed right before the results are output to the SCREEN operator. 2. A star column will not subsume a regular column. 3. A regular column reference will not be prefixed, so that it can be distinguished from the prefixed columns of a star-column expansion. We have to keep those regular columns, since the parent operator might reference them. 4. A star column will subsume another star column if it comes from the same table with the same prefix. [~amansinha100], could you please review the patch? 
when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
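[Editor's note] The subsume rules described in the comment above can be illustrated with a small, hypothetical helper. The `¦¦` prefix separator is taken from example output elsewhere in these issues (e.g. T2¦¦r_name); the helper itself is a sketch, not Drill's implementation.

```java
// Hedged sketch of the star-column subsume rules: columns expanded from `*`
// carry a table prefix like "T2¦¦", while regular column references stay
// unprefixed so the two kinds can be told apart at execution time.
class StarColumnRules {
    static final String SEP = "\u00A6\u00A6"; // the "¦¦" separator seen in output

    static boolean isPrefixed(String col) {
        return col.contains(SEP);
    }

    // A star-expanded (prefixed) column subsumes another prefixed column only
    // when both carry the same table prefix (rule 4); it never subsumes an
    // unprefixed regular column reference (rules 2 and 3).
    static boolean subsumes(String starCol, String other) {
        if (!isPrefixed(starCol) || !isPrefixed(other)) {
            return false;
        }
        String p1 = starCol.substring(0, starCol.indexOf(SEP));
        String p2 = other.substring(0, other.indexOf(SEP));
        return p1.equals(p2);
    }
}
```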
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: (was: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 Made small change to the patch. when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch.2 git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-1889) when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns
[ https://issues.apache.org/jira/browse/DRILL-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-1889: -- Attachment: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch when 'select *' is used along with an order by on length of a column, Drill is adding the computed length to the list of columns Key: DRILL-1889 URL: https://issues.apache.org/jira/browse/DRILL-1889 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-DRILL-1889-Fix-star-column-prefix-and-subsume-logic-.patch git.commit.id.abbrev=9dfa4a1 Dataset : {code} { col1:1, col2:a } { col1:2, col2:b } { col1:2, col2:abc } {code} Query : {code} select * from `b.json` order by length(col2); ++++ |col1|col2| EXPR$1 | ++++ | 1 | a | 1 | | 2 | b | 1 | | 2 | abc| 3 | ++++ {code} Drill adds the length column. (EXPR$1) Not sure if this is intended behavior since postgres does not do this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.
[ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279390#comment-14279390 ] Jinfeng Ni commented on DRILL-1500: --- With the patch for DRILL-1889, the incorrect-result problem is fixed. {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1; ++++++-++-+---++--+ |dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | ++++++-++-+---++--+ | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| ++++++-++-+---++--+ 1 row selected (2.066 seconds) {code} Partition filtering might lead to an unnecessary column in the result set. --- Key: DRILL-1500 URL: https://issues.apache.org/jira/browse/DRILL-1500 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Aman Sinha Priority: Critical Fix For: 0.8.0 Attachments: 0001-DRILL-1500-Partial-fix-Don-t-overwrite-top-level-Pro.patch When partition filtering is used together with a select * query, Drill might return the partitioning column twice. Q1 : {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1; +++++++-++-+---++--+ | dir00|dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | +++++++-++-+---++--+ | 1994 | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| +++++++-++-+---++--+ 1 row selected (2.097 seconds) {code} We can see that column dir0 appeared twice in the result set. 
In comparison, here is the query without partition filtering and the query result: Q2: {code} select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1; ++++++-++-+---++--+ |dir0|dir1| o_clerk | o_comment | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice | ++++++-++-+---++--+ | 1994 | Q1 | Clerk#00743 | y pending requests integrate | 1292 | 1994-01-20 | 66 | 5-LOW | F | 0 | 104190.66| ++++++-++-+---++--+ 1 row selected (0.761 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2030) CTAS with SELECT * and expression creates column names with prefix
[ https://issues.apache.org/jira/browse/DRILL-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2030: -- Attachment: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch CTAS with SELECT * and expression creates column names with prefix -- Key: DRILL-2030 URL: https://issues.apache.org/jira/browse/DRILL-2030 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Attachments: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch Doing a CTAS with the star column and an expression creates columns that contain the table prefix 'T||' . Looks like the top project did not strip out the prefix. {code} 0: jdbc:drill:zk=local create table region4 as select *, r_regionkey + 1 from cp.`tpch/region.parquet`; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 5 | ++---+ 0: jdbc:drill:zk=local select * from region4; +-++---++ | T2¦¦r_regionkey | T2¦¦r_name | T2¦¦r_comment | EXPR$1 | +-++---++ {code} The column names are correct if there is a regular column with the star column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1934) Failure to plan queries containing scalar subqueries without correlation
[ https://issues.apache.org/jira/browse/DRILL-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287938#comment-14287938 ] Jinfeng Ni commented on DRILL-1934: --- The cause of the failing cases is in the logic of Calcite's decorrelator. We will probably revisit this once Drill rebases its Optiq/Calcite. Failure to plan queries containing scalar subqueries without correlation Key: DRILL-1934 URL: https://issues.apache.org/jira/browse/DRILL-1934 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni {code} #Fri Jan 02 21:20:47 EST 2015 git.commit.id.abbrev=b491cdb {code} -- Failing queries {code} select * from cp.`tpch/nation.parquet` where n_regionkey ( select min(r_regionkey) from cp.`tpch/region.parquet`); select * from cp.`tpch/nation.parquet` where n_regionkey = ( select avg(r_regionkey) from cp.`tpch/region.parquet`); {code} -- Works with correlation {code} select * from cp.`tpch/nation.parquet` n where n.n_regionkey = ( select min(r.r_regionkey) from cp.`tpch/region.parquet` r where n.n_regionkey = r.r_regionkey); {code} All three queries above have very similar logical plans produced by Calcite: {code} Query failed: Query failed: Unexpected exception during fragment initialization: Node [rel#14825:Subset#6.LOGICAL.ANY([]).[]] could not be implemented; planner state: Root: rel#14825:Subset#6.LOGICAL.ANY([]).[] Original rel: AbstractConverter(subset=[rel#14825:Subset#6.LOGICAL.ANY([]).[]], convention=[LOGICAL], DrillDistributionTraitDef=[ANY([])], sort=[[]]): rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 14826 ProjectRel(subset=[rel#14824:Subset#6.NONE.ANY([]).[]], *=[$0]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14823 FilterRel(subset=[rel#14822:Subset#5.NONE.ANY([]).[]], 
condition=[=($1, $2)]): rowcount = 2.6965397022934733E307, cumulative cost = {2.6965397022934733E307 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14821 JoinRel(subset=[rel#14820:Subset#4.NONE.ANY([]).[]], condition=[true], joinType=[left]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14819 EnumerableTableAccessRel(subset=[rel#14813:Subset#0.ENUMERABLE.ANY([]).[]], table=[[cp, tpch/nation.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14794 AggregateRel(subset=[rel#14818:Subset#3.NONE.ANY([]).[]], group=[{}], EXPR$0=[AVG($0)]): rowcount = 1.7976931348623158E307, cumulative cost = {1.7976931348623158E307 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14817 ProjectRel(subset=[rel#14816:Subset#2.NONE.ANY([]).[]], r_regionkey=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14815 EnumerableTableAccessRel(subset=[rel#14814:Subset#1.ENUMERABLE.ANY([]).[]], table=[[cp, tpch/region.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14795 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1655) Cannot plan simple join + group by + order + limit query
[ https://issues.apache.org/jira/browse/DRILL-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288580#comment-14288580 ] Jinfeng Ni commented on DRILL-1655: --- I could reproduce this same issue with a slightly simplified query: {code} select u.name, count(*) as number, count(u.average_stars) stars from mongo.yelp.users u, dfs.tmp.namegender n where u.name = n.name group by u.name order by stars desc limit 1; {code} However, a similar query joining a parquet file with the view defined on a CSV file works fine (the query itself is not meaningful). {code} select n.n_name, count(*) as num, avg(n.n_nationkey) as star from cp.`tpch/nation.parquet` n, dfs.tmp.namegender n2 where n.n_name = n2.name group by n.n_name order by star desc limit 1; ++++ | n_name |num |star| ++++ ++++ No rows selected (0.67 seconds) {code} From the error trace, the cause of this issue might be related to mongodb. Cannot plan simple join + group by + order + limit query - Key: DRILL-1655 URL: https://issues.apache.org/jira/browse/DRILL-1655 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Tomer Shiran Assignee: Jinfeng Ni Fix For: 0.8.0 The first query here works fine. As soon as I add another field (avg(u.average_stars) as stars) and order by on that field, it no longer works. {code} 0: jdbc:drill:zk=local select u.name, n.gender, count(*) as number from mongo.yelp.users u, dfs.mydata.namegender n where u.name = n.name group by u.name, n.gender order by number desc limit 1; ++++ |name| gender | number | ++++ | David | Male | 2453 | ++++ 1 row selected (6.008 seconds) 0: jdbc:drill:zk=local select u.name, n.gender, count(*) as number, avg(u.average_stars) stars from mongo.yelp.users u, dfs.mydata.namegender n where u.name = n.name group by u.name, n.gender order by stars desc limit 1; Query failed: Failure while parsing sql. 
Node [rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC]] could not be implemented; planner state: Root: rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC] Original rel: AbstractConverter(subset=[rel#38935:Subset#7.LOGICAL.ANY([]).[3 DESC]], convention=[LOGICAL], DrillDistributionTraitDef=[ANY([])], sort=[[3 DESC]]): rowcount = 1.7976931348623157E308, cumulative cost = {inf}, id = 38936 SortRel(subset=[rel#38934:Subset#7.NONE.ANY([]).[3 DESC]], sort0=[$3], dir0=[DESC], fetch=[1]): rowcount = 1.7976931348623157E308, cumulative cost = {Infinity rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38933 AggregateRel(subset=[rel#38932:Subset#6.NONE.ANY([]).[]], group=[{0, 1}], number=[COUNT()], stars=[AVG($2)]): rowcount = 1.7976931348623158E307, cumulative cost = {1.7976931348623158E307 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38931 ProjectRel(subset=[rel#38930:Subset#5.NONE.ANY([]).[]], name=[$1], gender=[$3], name0=[$2]): rowcount Error: exception while executing query: Failure while trying to get next result batch. 
(state=,code=0) = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, Infinity cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38929 FilterRel(subset=[rel#38928:Subset#4.NONE.ANY([]).[]], condition=[=($1, $2)]): rowcount = 2.6965397022934733E307, cumulative cost = {2.6965397022934733E307 rows, 1.7976931348623157E308 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38927 JoinRel(subset=[rel#38926:Subset#3.NONE.ANY([]).[]], condition=[true], joinType=[inner]): rowcount = 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38925 EnumerableTableAccessRel(subset=[rel#38921:Subset#0.ENUMERABLE.ANY([]).[]], table=[[mongo, yelp, users]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38897 ProjectRel(subset=[rel#38924:Subset#2.NONE.ANY([]).[]], name=[ITEM($1, 0)], gender=[ITEM($1, 4)]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38923 EnumerableTableAccessRel(subset=[rel#38922:Subset#1.ENUMERABLE.ANY([]).[]], table=[[dfs, mydata, names.csv]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 38900 Sets: Set#0, type: (DrillRecordRow[*, name, average_stars]) rel#38921:Subset#0.ENUMERABLE.ANY([]).[], best=rel#38897,
[jira] [Commented] (DRILL-2041) Throw unsupported error message on GROUP BY/ORDER BY array/map types
[ https://issues.apache.org/jira/browse/DRILL-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284965#comment-14284965 ] Jinfeng Ni commented on DRILL-2041: --- For this particular issue, I'm going to make the error message easier to understand for the end user, so that they know it's not allowed to compare a map/array or repeated scalar type when they put such an expression or column in GROUP BY, ORDER BY, or a JOIN condition. I think there is agreement that an effort will be made to improve error/exception reporting in general. Whether this should be thrown as RpcException or as UnsupportedOperationException will be addressed in that effort (probably in the operators component). Throw unsupported error message on GROUP BY/ORDER BY array/map types Key: DRILL-2041 URL: https://issues.apache.org/jira/browse/DRILL-2041 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni test.json {code} { CustomerId: 100, cityId: 10 , phoneNumber: { areaCode: 622, number: 1567845}, years: [1990, 1993, 1998, 2008]}{code} {code} 0: jdbc:drill:schema=dfs select years from `test.json` order by years; ++ | years| ++ Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( VARCHAR:REPEATEDVARCHAR:REPEATED, ) returns INT:REQUIRED [ 5e4a9574-1f9b-4753-ab1d-ef056639281f on atsqa4-133.qa.lab:31010 ] [ 5e4a9574-1f9b-4753-ab1d-ef056639281f on atsqa4-133.qa.lab:31010 ] java.lang.RuntimeException: java.sql.SQLException: Failure while executing query. 
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460) 0: jdbc:drill:schema=dfs select years from `test.json` group by years; Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( VARCHAR:REPEATEDVARCHAR:REPEATED, ) returns INT:REQUIRED [ be1d64d5-bb03-4d5d-87b0-5cd8aa32f007 on atsqa4-133.qa.lab:31010 ] [ be1d64d5-bb03-4d5d-87b0-5cd8aa32f007 on atsqa4-133.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} {code} 0: jdbc:drill:schema=dfs select * from `test.json` order by phoneNumber; +++-++ | CustomerId | cityId | phoneNumber | years| +++-++ Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( MAP:REQUIREDMAP:REQUIRED, ) returns INT:REQUIRED [ cbef0629-b393-4fbd-9c16-941ec8835d36 on atsqa4-133.qa.lab:31010 ] [ cbef0629-b393-4fbd-9c16-941ec8835d36 on atsqa4-133.qa.lab:31010 ] java.lang.RuntimeException: java.sql.SQLException: Failure while executing query. 
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460) 0: jdbc:drill:schema=dfs select phoneNumber from `test.json` group by phoneNumber; Query failed: RemoteRpcException: Failure while running fragment., Failure finding function that runtime code generation expected. Signature: compare_to( MAP:REQUIREDMAP:REQUIRED, ) returns INT:REQUIRED [ fbd3ca20-5d5a-4831-a066-a2d780547e66 on atsqa4-133.qa.lab:31010 ] [ fbd3ca20-5d5a-4831-a066-a2d780547e66 on atsqa4-133.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2030) CTAS with SELECT * and expression creates column names with prefix
[ https://issues.apache.org/jira/browse/DRILL-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285583#comment-14285583 ] Jinfeng Ni commented on DRILL-2030: --- Committed as [a8036d2] CTAS with SELECT * and expression creates column names with prefix -- Key: DRILL-2030 URL: https://issues.apache.org/jira/browse/DRILL-2030 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 Attachments: 0001-DRILL-2030-fix-column-prefix-for-CTAS-when-query-has.patch Doing a CTAS with the star column and an expression creates columns that contain the table prefix 'T||' . Looks like the top project did not strip out the prefix. {code} 0: jdbc:drill:zk=local create table region4 as select *, r_regionkey + 1 from cp.`tpch/region.parquet`; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 5 | ++---+ 0: jdbc:drill:zk=local select * from region4; +-++---++ | T2¦¦r_regionkey | T2¦¦r_name | T2¦¦r_comment | EXPR$1 | +-++---++ {code} The column names are correct if there is a regular column with the star column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1563) IndexOutOfBoundException during change traits when aggregating over subquery with order-by
[ https://issues.apache.org/jira/browse/DRILL-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288108#comment-14288108 ] Jinfeng Ni commented on DRILL-1563: --- This seems to have been fixed. I ran a similar query on today's master branch and got correct results. {code} 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from cp.`tpch/orders.parquet` order by o_orderkey, o_custkey); ++ | EXPR$0 | ++ | 15000 | ++ 1 row selected (0.491 seconds) 0: jdbc:drill:zk=local select * from sys.version; +++-+-++ | commit_id | commit_message | commit_time | build_email | build_time | +++-+-++ | 8d1e1affe86a5adca3bc17eeaf7520f0d379a393 | DRILL-1932: Query fails if the first record batch does not contain any record | 20.01.2015 @ 23:02:03 PST | j...@maprtech.com | 22.01.2015 @ 11:19:02 PST | +++-+-++ 1 row selected (0.108 seconds) {code} Can you re-run the original query? If the issue is still there, please re-open it. IndexOutOfBoundException during change traits when aggregating over subquery with order-by --- Key: DRILL-1563 URL: https://issues.apache.org/jira/browse/DRILL-1563 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.6.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 On TPCH SF1, doing a plain aggregation on top of a subquery that does order-by of 2 or more columns gives IOBE. This happens during planning, when processing traits. See below. 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from orders order by o_orderkey, o_custkey); Query failed: Failure while setting up Foreman. index (1) must be less than size (1) Query failed: Failure while setting up Foreman. 
index (1) must be less than size (1) [4f2cc753-655e-47a1-a8f0-38e62eb79a2b] Here's the stack trace: java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:na] com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:na] com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45) ~[guava-14.0.1.jar:na] org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:94) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:59) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:77) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:36) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1010) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:107) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:273) ~[optiq-core-0.9-drill-r4.jar:na] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:165) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:134) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:132) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:425) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:219)
[jira] [Resolved] (DRILL-1563) IndexOutOfBoundException during change traits when aggregating over subquery with order-by
[ https://issues.apache.org/jira/browse/DRILL-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-1563. --- Resolution: Fixed IndexOutOfBoundException during change traits when aggregating over subquery with order-by --- Key: DRILL-1563 URL: https://issues.apache.org/jira/browse/DRILL-1563 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.6.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Fix For: 0.8.0 On TPCH SF1, doing a plain aggregation on top of a subquery that does order-by of 2 or more columns gives IOBE. This happens during planning, when processing traits. See below. 0: jdbc:drill:zk=local select count(*) from (select o_orderkey, o_custkey from orders order by o_orderkey, o_custkey); Query failed: Failure while setting up Foreman. index (1) must be less than size (1) Query failed: Failure while setting up Foreman. index (1) must be less than size (1) [4f2cc753-655e-47a1-a8f0-38e62eb79a2b] Here's the stack trace: java.lang.IndexOutOfBoundsException: index (1) must be less than size (1) com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:na] com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:na] com.google.common.collect.SingletonImmutableList.get(SingletonImmutableList.java:45) ~[guava-14.0.1.jar:na] org.eigenbase.rex.RexBuilder.makeInputRef(RexBuilder.java:764) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:94) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.SortRel.init(SortRel.java:59) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:77) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.rel.RelCollationTraitDef.convert(RelCollationTraitDef.java:36) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1010) 
~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.changeTraitsUsingConverters(VolcanoPlanner.java:1102) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.AbstractConverter$ExpandConversionRule.onMatch(AbstractConverter.java:107) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) ~[optiq-core-0.9-drill-r4.jar:na] org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r4.jar:na] net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:273) ~[optiq-core-0.9-drill-r4.jar:na] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:165) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:134) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:132) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:425) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:219) ~[drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:250) [drill-java-exec-0.6.0-incubating-rebuffed.jar:0.6.0-incubating] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2199) Wrong results while using case expression with decimal data type
[ https://issues.apache.org/jira/browse/DRILL-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2199: - Assignee: Mehant Baid (was: Jinfeng Ni) Wrong results while using case expression with decimal data type Key: DRILL-2199 URL: https://issues.apache.org/jira/browse/DRILL-2199 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Attachments: DRILL-2199.patch Consider the below SQL statement: select case when true then cast(employee_id as decimal(15, 5)) else cast('0.0' as decimal(5, 2)) end from cp.`employee.json` where employee_id = 1 This should return 1.0; instead Drill returns 10. In the case statement the first expression (condition expression) is nullable and the else expression is non-nullable. For our codegen to work correctly we make the non-nullable expression also nullable using the convertToNullableXHolder() functions. These functions however do not preserve the scale and precision of the input, hence the output we get is the unscaled value of the decimal. The fix is to make the convertToNullableDecimalHolder() functions preserve the scale and precision by returning the correct major type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
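The scale loss described above can be reproduced outside Drill with plain `java.math.BigDecimal`, which likewise stores a decimal as an unscaled integer plus scale metadata. This sketch (method names are illustrative, not Drill's holder API) shows why dropping the scale turns 1.0 into 10:

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    // Mimics a holder conversion that copies the unscaled value but loses
    // the scale metadata -- the symptom reported in this issue.
    static BigDecimal withScaleDropped(BigDecimal d) {
        return new BigDecimal(d.unscaledValue());
    }

    // Mimics the fix: carry the scale along with the unscaled value.
    static BigDecimal withScalePreserved(BigDecimal d) {
        return new BigDecimal(d.unscaledValue(), d.scale());
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1.0"); // unscaled value 10, scale 1
        System.out.println(withScaleDropped(one));   // prints 10
        System.out.println(withScalePreserved(one)); // prints 1.0
    }
}
```

Returning "the correct major type" in Drill's terms means exactly the second behavior: the output type must carry the input's scale and precision, not just its unscaled digits.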
[jira] [Commented] (DRILL-2054) Support `||` as concat operator for varchar inputs
[ https://issues.apache.org/jira/browse/DRILL-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326400#comment-14326400 ] Jinfeng Ni commented on DRILL-2054: --- LGTM. +1 Support `||` as concat operator for varchar inputs Key: DRILL-2054 URL: https://issues.apache.org/jira/browse/DRILL-2054 Project: Apache Drill Issue Type: New Feature Reporter: Sean Hsuan-Yi Chu Assignee: Jinfeng Ni Fix For: Future Attachments: DRILL-2054.1.patch For example, the query: select n_nationkey || ' + ' || n_name || ' = ' as CONCAT, n_nationkey, ' + ' as PLUS, n_name from cp.`tpch/nation.parquet` should print out CONCATn_nationkey PLUSn_name 0 + ALGERIA = 0+ ALGERIA 1 + ARGENTINA = 1+ ARGENTINA 2 + BRAZIL = 2+ BRAZIL 3 + CANADA = 3+ CANADA ... ... ... (rows/records are omitted due to space limitation) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2268) Applying flatten after a join fails with IOBE
[ https://issues.apache.org/jira/browse/DRILL-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2268: - Assignee: Jinfeng Ni (was: Jason Altekruse) Applying flatten after a join fails with IOBE -- Key: DRILL-2268 URL: https://issues.apache.org/jira/browse/DRILL-2268 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Reporter: Rahul Challapalli Assignee: Jinfeng Ni Priority: Critical git.commit.id.abbrev=6676f2d Data Set : {code} { uid:1, lst_lst : [[1,2],[3,4]] } { uid:2, lst_lst : [[1,2],[3,4]] } {code} The below query fails : {code} select flatten(t1.lst_lst) from `temp.json` t1 inner join `temp.json` t2 on t1.uid=t2.uid; Query failed: RemoteRpcException: Failure while running fragment., index: -4, length: 4 (expected: range(0, 16384)) [ e32cd0c6-a84a-4bbb-9812-bc0f7de17a68 on qa-node190.qa.lab:31010 ] [ e32cd0c6-a84a-4bbb-9812-bc0f7de17a68 on qa-node190.qa.lab:31010 ] {code} Let me know if you have any questions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception
Jinfeng Ni created DRILL-2269: - Summary: Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override those two methods. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit the Exception, complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override those two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for those two methods in the base class; subclasses can override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception
[ https://issues.apache.org/jira/browse/DRILL-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2269: - Assignee: Jinfeng Ni Provide default implementation for getting cost of evaluating an expression in base class of LogicalExpression, in stead of throw Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Assignee: Jinfeng Ni Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override those two methods. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit the Exception, complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override those two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for those two methods in the base class; subclasses can override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
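The proposal can be sketched in plain Java. The class and method names here are illustrative, not Drill's actual LogicalExpression API: the base class supplies a safe default cost, and only subclasses with a genuinely different cost override it, so a forgotten override degrades to a default estimate instead of a runtime Exception.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Base class with a default cost implementation instead of a throwing stub.
abstract class ExpressionBase {
    // Default: trivially cheap. Subclasses with real costs override this.
    public int getSelfCost() { return 0; }

    // Cumulative cost = own cost plus the cost of all children.
    public int getCumulativeCost() {
        int cost = getSelfCost();
        for (ExpressionBase child : children()) {
            cost += child.getCumulativeCost();
        }
        return cost;
    }

    public abstract Iterable<ExpressionBase> children();
}

class FunctionCallExpr extends ExpressionBase {
    private final List<ExpressionBase> args;
    FunctionCallExpr(List<ExpressionBase> args) { this.args = args; }
    @Override public int getSelfCost() { return 1; } // a call has a real cost
    @Override public Iterable<ExpressionBase> children() { return args; }
}

class LiteralExpr extends ExpressionBase {
    // Inherits the zero-cost default: no override needed, no runtime Exception.
    @Override public Iterable<ExpressionBase> children() {
        return Collections.emptyList();
    }
}

public class CostDefaultDemo {
    public static void main(String[] args) {
        ExpressionBase e = new FunctionCallExpr(
                Arrays.asList(new LiteralExpr(), new LiteralExpr()));
        System.out.println(e.getCumulativeCost()); // 1: the call itself; literals are free
    }
}
```

The trade-off is exactly the one the issue describes: a throwing base class turns a missing override into a runtime failure, while a default implementation turns it into a (possibly imprecise) estimate that the build cannot miss.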
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318783#comment-14318783 ] Jinfeng Ni commented on DRILL-2168: --- Uploaded the patches. The patches include a unit test case in Drill and a fix in Optiq/Calcite. Part of the fix in Optiq/Calcite has been submitted to Calcite master as a pull request. [~amansinha100], can you please review the patches? Thanks. Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2168: -- Attachment: 0003-Bump-to-version-r18.1.patch 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2168) Wrong result on grouping by expression involving CONCAT function
[ https://issues.apache.org/jira/browse/DRILL-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318789#comment-14318789 ] Jinfeng Ni commented on DRILL-2168: --- Some additional comments: 1. The incorrect plan / query results only happen when the query uses Drill built-in functions or Drill UDFs. They do not happen with Optiq/Calcite built-in functions. Part of the reason is that function resolution for Drill functions and for Optiq/Calcite built-in functions happens at different stages. 2. If the query uses upper-case in both the group-by expressions and the select expression, then the query gets the correct plan / query results, indicating the issue was caused by the function name resolution / validation / conversion logic. Wrong result on grouping by expression involving CONCAT function Key: DRILL-2168 URL: https://issues.apache.org/jira/browse/DRILL-2168 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Fix For: 0.9.0 Attachments: 0001-CALCITE-593-SqlValidator-in-Frameworks-should-by-def.patch, 0001-DRILL-2168-Fix-incorrect-query-result-issue-when-gro.patch, 0002-Make-sure-expression-in-the-select-list-is-expanded-.patch, 0003-Bump-to-version-r18.1.patch, 0_0_0.parquet {code} 0: jdbc:drill:schema=dfs select * from test; +++ | a1 | b1 | +++ | 51237400 | 2014-03-16 03:55:21.0 | | -691523338 | 2014-02-13 15:47:22.0 | | -1843395360 | 2014-01-14 03:31:27.0 | | 1095015454 | 2014-09-27 17:35:21.0 | | -48817354 | 2014-01-17 16:14:25.0 | | 59892266 | 2014-04-23 05:08:34.0 | | -827965492 | 2014-06-21 07:29:12.0 | | -1281245980 | 2014-10-25 15:49:46.0 | | -1778510302 | 2014-07-31 04:11:53.0 | | 1346460386 | 2014-07-31 04:11:53.0 | +++ 10 rows selected (0.072 seconds) 0: jdbc:drill:schema=dfs select . . . . . . . . . . . . 
concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))), . . . . . . . . . . . . sum(a1) . . . . . . . . . . . . from . . . . . . . . . . . . test . . . . . . . . . . . . group by . . . . . . . . . . . . concat(cast(extract(day from b1) as varchar(10)), '-', cast(extract(month from b1) as varchar(10)),'-',cast(extract(year from b1) as varchar(10))); +++ | EXPR$0 | EXPR$1 | +++ | 1-1-1970 | 51237400 | | 23-12-1969 | -691523338 | | 10-12-1969 | -1843395360 | | 13-1-1970 | 1095015454 | | 31-12-1969 | -48817354 | | 1-1-1970 | 59892266 | | 22-12-1969 | -827965492 | | 17-12-1969 | -1281245980 | | 26-12-1969 | -432049916 | +++ 9 rows selected (0.103 seconds) {code} Query plan: {code} 00-01 Project(EXPR$0=[$0], EXPR$1=[$1]) 00-02Project(EXPR$0=[concat(CAST(EXTRACT(FLAG(DAY), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $1)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], EXPR$1=[$1]) 00-03 HashAgg(group=[{0}], EXPR$1=[SUM($1)]) 00-04Project($f0=[CONCAT(CAST(EXTRACT(FLAG(DAY), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(MONTH), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary, '-', CAST(EXTRACT(FLAG(YEAR), $0)):VARCHAR(10) CHARACTER SET ISO-8859-1 COLLATE ISO-8859-1$en_US$primary)], a1=[$1]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/test]], selectionRoot=/aggregation/test, numFiles=1, columns=[`b1`, `a1`]]]) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
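Comment 2 above suggests the group-by and select copies of the expression stop matching because function names are handled case-sensitively somewhere in the resolution pipeline. As a rough illustration (not Drill's actual function registry), normalizing names at both registration and lookup is one way to make mixed-case references resolve to a single implementation:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of case-insensitive function resolution: lower-case the name on
// both sides of the map so "concat" and "CONCAT" hit the same entry.
public class FunctionRegistry {
    private final Map<String, BinaryOperator<String>> fns = new HashMap<>();

    void register(String name, BinaryOperator<String> fn) {
        fns.put(name.toLowerCase(Locale.ROOT), fn);
    }

    BinaryOperator<String> lookup(String name) {
        return fns.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry();
        registry.register("concat", (a, b) -> a + b);
        // A mixed-case reference resolves to the same implementation.
        System.out.println(registry.lookup("CONCAT").apply("1-", "1-1970"));
    }
}
```

If instead the lower-case `concat` in the query and an upper-case `CONCAT` produced by validation resolve through different paths, the planner can fail to recognize the select expression as identical to the group-by key, which is consistent with the duplicate Project operators in the plan above.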
[jira] [Commented] (DRILL-2239) Confusing error message Column 't1' is ambiguous for _table_ name t1
[ https://issues.apache.org/jira/browse/DRILL-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319334#comment-14319334 ] Jinfeng Ni commented on DRILL-2239: --- The error message seems to be fine. In the ON clause expression ( t1 IS NULL AND t2 IS NULL ), t1 is interpreted as a column name. An unqualified identifier in an expression is interpreted as a column name. For a qualified identifier like t1.key, the qualifier is interpreted as a table name or alias, and the second part is interpreted as a column name. Confusing error message Column 't1' is ambiguous for _table_ name t1 -- Key: DRILL-2239 URL: https://issues.apache.org/jira/browse/DRILL-2239 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill/MapR) Assignee: Aman Sinha Priority: Minor The error message about the erroneous use of t1 in t1 IS NULL in this query: select * from dfs_test.`xxx/jsoninput/nullable1U.json` t1 left outer join dfs_test.`xxx/jsoninput/nullable2U.json` t2 on ( t1.key = t2.key OR ( t1 IS NULL AND t2 IS NULL) ) says Column 't1' is ambiguous even though t1 is not a column name and isn't ambiguous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2236) Optimize hash inner join by swapping inputs based on row count comparison.
Jinfeng Ni created DRILL-2236: - Summary: Optimize hash inner join by swapping inputs based on row count comparison. Key: DRILL-2236 URL: https://issues.apache.org/jira/browse/DRILL-2236 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Currently, Drill's planner does not consider all possible join order sequences during the planning phase, because one particular optimizer rule (SwapJoinRule) is not enabled. The reason for not enabling this rule is that it would increase planning time significantly. This means that the join sequence for some queries might not be optimal; the sequence in the FROM clause impacts the final join sequence the planner produces. For example, {code} select c.c_custkey, c.c_name, n.n_name from nation n, customer c where n.n_nationkey = c.c_nationkey; {code} The nation table contains 25 rows, while the customer table contains 1.5 million rows. The optimal plan should put customer on the left side of the hash inner join and nation on the right side, since the hash table is built on the right side, and we would like the hash table built on the smaller dataset. {code} select count(*) from customer; ++ | EXPR$0 | ++ | 150 | ++ select count(*) from nation; ++ | EXPR$0 | ++ | 25 | ++ {code} However, Drill's planner currently produces the following join sequence: NATION -- CUSTOMER. 
{code} 00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2]) 00-02Project(c_custkey=[$3], c_name=[$4], n_name=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner]) 00-05Project(n_nationkey=[$1], n_name=[$0]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]]) 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) {code} Notice that in the above plan, LEFT is the nation table, while RIGHT is the customer table. Until we resolve the increased planning time related to SwapJoinRule, as a workaround I would like to propose that we swap the inputs for hash inner join after the planner finishes planning. That is, when we build the physical plan to be run on Drill's execution engine, we swap the inputs of hash inner join physical operators based on a row count comparison. The proposed workaround could cause performance regressions for some queries, in particular because the estimated row count is not accurate (especially after Filter / Join / Aggregation) due to the lack of complete statistics. To mitigate that regression risk, we will add a new planner option so that users can turn this feature on/off if they see a performance regression. 
With this feature enabled, the above query will get a plan like: {code} 00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2]) 00-02Project(c_custkey=[$3], c_name=[$4], n_name=[$1]) 00-03 HashJoin(condition=[=($0, $2)], joinType=[inner]) 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]]) 00-05Project(n_nationkey=[$1], n_name=[$0]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]]) {code} Please note that once we resolve the issue of SwapJoinRule, we should remove this workaround from Drill's code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
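The proposed post-planning swap boils down to a row-count comparison guarded by the planner option. A minimal sketch, with illustrative names rather than Drill's actual physical-operator classes:

```java
// One join input with its planner-estimated cardinality.
class JoinInput {
    final String name;
    final long estimatedRows;
    JoinInput(String name, long estimatedRows) {
        this.name = name;
        this.estimatedRows = estimatedRows;
    }
}

public class HashJoinSwap {
    // Returns {probe, build}. The hash table is built from the right (build)
    // side, so when the option is enabled and the left input is smaller,
    // swap so the larger input becomes the probe side.
    static JoinInput[] orient(JoinInput left, JoinInput right, boolean swapEnabled) {
        if (swapEnabled && left.estimatedRows < right.estimatedRows) {
            return new JoinInput[] { right, left }; // swap inputs
        }
        return new JoinInput[] { left, right };     // keep planned order
    }

    public static void main(String[] args) {
        JoinInput nation = new JoinInput("nation", 25);
        JoinInput customer = new JoinInput("customer", 1_500_000);
        JoinInput[] oriented = orient(nation, customer, true);
        System.out.println(oriented[0].name + " probes, " + oriented[1].name + " builds");
    }
}
```

With the NATION -- CUSTOMER ordering from the plan above, this check swaps the inputs so customer probes and the 25-row nation table builds the hash table; with the option off, the planned order is kept unchanged, which is the escape hatch for queries where the row-count estimates are wrong.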
[jira] [Commented] (DRILL-2239) Confusing error message Column 't1' is ambiguous for _table_ name t1
[ https://issues.apache.org/jira/browse/DRILL-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319338#comment-14319338 ] Jinfeng Ni commented on DRILL-2239: --- Why ambiguous? The reason is that Drill is schema-less. In a join, any column reference should be qualified; otherwise Drill does not know which table contains the column. In this query, t1 in the expression t1 IS NULL AND t2 IS NULL is interpreted as a column name, and the planner does not know which table has a column t1, hence the "ambiguous" complaint. Confusing error message Column 't1' is ambiguous for _table_ name t1 -- Key: DRILL-2239 URL: https://issues.apache.org/jira/browse/DRILL-2239 Project: Apache Drill Issue Type: Bug Components: SQL Parser Reporter: Daniel Barclay (Drill/MapR) Assignee: Aman Sinha Priority: Minor The error message about the erroneous use of t1 in t1 IS NULL in this query: select * from dfs_test.`xxx/jsoninput/nullable1U.json` t1 left outer join dfs_test.`xxx/jsoninput/nullable2U.json` t2 on ( t1.key = t2.key OR ( t1 IS NULL AND t2 IS NULL) ) says Column 't1' is ambiguous even though t1 is not a column name and isn't ambiguous. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2236) Optimize hash inner join by swapping inputs based on row count comparison.
[ https://issues.apache.org/jira/browse/DRILL-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2236: -- Attachment: 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch Optimize hash inner join by swapping inputs based on row count comparison. --- Key: DRILL-2236 URL: https://issues.apache.org/jira/browse/DRILL-2236 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch, 0001-DRILL-2236-Optimize-hash-inner-join-by-swapping-inpu.patch Currently, Drill's planner does not consider all possible join sequences during the planning phase, because one particular optimizer rule (SwapJoinRule) is not enabled. This rule is disabled because it would otherwise increase the planning time significantly. This means that the join sequence for some queries might not be optimal; the order of tables in the FROM clause impacts the final join sequence the planner produces. For example,
{code}
select c.c_custkey, c.c_name, n.n_name
from nation n, customer c
where n.n_nationkey = c.c_nationkey;
{code}
The nation table contains 25 rows, while the customer table contains 1.5 million rows. The optimal plan should put customer on the left side of the hash inner join and nation on the right side, since the hash table is built on the right side and we would like the hash table built on the smaller dataset.
{code}
select count(*) from customer;
++
| EXPR$0 |
++
| 1500000 |
++
select count(*) from nation;
++
| EXPR$0 |
++
| 25 |
++
{code}
However, currently Drill's planner produces the following join sequence: NATION -- CUSTOMER.
{code}
00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2])
00-02   Project(c_custkey=[$3], c_name=[$4], n_name=[$1])
00-03     HashJoin(condition=[=($0, $2)], joinType=[inner])
00-05       Project(n_nationkey=[$1], n_name=[$0])
00-06         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]])
00-04       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
{code}
Notice that in the above plan, the LEFT input is the nation table, while the RIGHT input is the customer table. Until we resolve the increased planning time related to SwapJoinRule, as a workaround I would like to propose that we swap the inputs of a hash inner join after the planner finishes planning. That is, when we build the physical plan to be run on Drill's execution engine, we swap the inputs of hash inner join physical operators based on a row count comparison. The proposed workaround could cause a performance regression for some queries, in particular because the estimated row count is not accurate (especially after Filter / Join / Aggregation) due to the lack of complete statistics. To remedy that regression risk, we will add a new planner option for this feature, so that users can turn it on or off if they see a performance regression.
With this feature enabled, the above query will get a plan like the following:
{code}
00-01 Project(c_custkey=[$0], c_name=[$1], n_name=[$2])
00-02   Project(c_custkey=[$3], c_name=[$4], n_name=[$1])
00-03     HashJoin(condition=[=($0, $2)], joinType=[inner])
00-04       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/customer]], selectionRoot=/Users/jni/work/data/tpch-sf10/customer, numFiles=1, columns=[`c_nationkey`, `c_custkey`, `c_name`]]])
00-05       Project(n_nationkey=[$1], n_name=[$0])
00-06         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/tpch-sf10/nation]], selectionRoot=/Users/jni/work/data/tpch-sf10/nation, numFiles=1, columns=[`n_nationkey`, `n_name`]]])
{code}
Please note that once we resolve the SwapJoinRule issue, we should remove this workaround from Drill's code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
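The row-count comparison driving the proposed swap can be sketched as follows. This is a simplified illustration with hypothetical names, not Drill's actual planner code:

```java
// Hypothetical sketch of the swap decision for a hash inner join.
// The hash table is built on the RIGHT input, so after planning we want
// the smaller input on the right (build) side and the larger on the
// left (probe) side.
public class HashJoinSwap {
    // Returns true when the inputs should be swapped, i.e. when the
    // current right (build) side is estimated to be larger than the
    // left (probe) side.
    public static boolean shouldSwap(double leftRowCount, double rightRowCount) {
        return rightRowCount > leftRowCount;
    }

    public static void main(String[] args) {
        // nation (25 rows) on the left, customer (1.5M rows) on the right:
        // the build side is the larger input, so swap.
        System.out.println(shouldSwap(25, 1_500_000));
        // customer on the left, nation on the right: already optimal.
        System.out.println(shouldSwap(1_500_000, 25));
    }
}
```

Because the decision relies on estimated row counts, the on/off planner option mentioned above is the escape hatch when the estimates are badly wrong.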
[jira] [Resolved] (DRILL-2271) count(distinct ) against SystemTablePlugin will hit plan serialization/deserialization issue when executed in distributed mode.
[ https://issues.apache.org/jira/browse/DRILL-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-2271. --- Resolution: Duplicate count(distinct ) against SystemTablePlugin will hit plan serialization/deserialization issue when executed in distributed mode. --- Key: DRILL-2271 URL: https://issues.apache.org/jira/browse/DRILL-2271 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Priority: Minor The following query runs fine on a single node.
{code}
select count(distinct name) as cnt from `sys`.`options`;
{code}
However, if we set planner.slice_target = 1 so that the query runs in distributed mode, it hits an error:
{code}
alter session set `planner.slice_target` = 1;
+++
| ok | summary |
+++
| true | planner.slice_target updated. |
+++
select count(distinct name) as cnt from `sys`.`options`;
Query failed: RemoteRpcException: Failure while trying to start remote fragment, No suitable constructor found for type [simple type, class org.apache.drill.exec.store.sys.SystemTablePlugin]: can not instantiate from JSON object (need to add/enable type information?)
{code}
It seems the serialization/deserialization of the plan for SystemTablePlugin does not work correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
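The "No suitable constructor found" failure is characteristic of deserializers that need a no-arg constructor (or explicit creator/type information) to instantiate the target class. A minimal illustration of the same constraint using plain reflection; the Plugin class here is hypothetical, not Drill's SystemTablePlugin:

```java
// A class whose only constructor takes arguments cannot be instantiated
// generically: reflection (and, similarly, a JSON deserializer that is
// not given creator/type information) finds no no-arg constructor.
public class NoDefaultCtor {
    static class Plugin {
        final String name;
        Plugin(String name) { this.name = name; } // only a 1-arg constructor
    }

    // Reports whether the class exposes a no-arg constructor.
    public static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(Plugin.class)); // no default ctor
        System.out.println(hasNoArgConstructor(String.class)); // has one
    }
}
```

The usual fixes are to supply a default constructor or to register creator/type information with the serialization framework, which matches the "need to add/enable type information?" hint in the error message.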
[jira] [Updated] (DRILL-2242) Wrong result (more rows) when outer query groups by subset of columns that inner query groups by
[ https://issues.apache.org/jira/browse/DRILL-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2242: -- Attachment: 0002-DRILL-2242-Propagate-distribution-trait-when-Project.patch [~amansinha100], could you please review the patch? Thanks! Wrong result (more rows) when outer query groups by subset of columns that inner query groups by Key: DRILL-2242 URL: https://issues.apache.org/jira/browse/DRILL-2242 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.7.0 Reporter: Aman Sinha Assignee: Jinfeng Ni Priority: Critical Attachments: 0002-DRILL-2242-Propagate-distribution-trait-when-Project.patch The following query has a subquery that groups on two columns and an outer query that groups on one of those columns. With slice_target = 1 to force exchanges, it produces an incorrect result:
{code}
alter session set `planner.slice_target` = 1;
select count(*) from
  (select l_partkey from
    (select l_partkey, l_suppkey from cp.`tpch/lineitem.parquet`
     group by l_partkey, l_suppkey)
   group by l_partkey );
++
| EXPR$0 |
++
| 6227 |
++
1 row selected (1.522 seconds)
{code}
Correct result (from Postgres):
{code}
count
---
2000
(1 row)
{code}
The cause appears to be related to distribution trait propagation.
Here's the EXPLAIN plan:
{code}
00-00 Screen
00-01   StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-02     UnionExchange
01-01       StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02         Project($f0=[0])
01-03           HashAgg(group=[{0}])
01-04             Project(l_partkey=[$0])
01-05               HashAgg(group=[{0, 1}])
01-06                 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
02-01                   HashAgg(group=[{0, 1}])
02-02                     Project(l_partkey=[$1], l_suppkey=[$0])
02-03                       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/lineitem.parquet]], selectionRoot=/tpch/lineitem.parquet, numFiles=1, columns=[`l_partkey`, `l_suppkey`]]])
{code}
Note that the HashToRandomExchange operator (01-06) distributes on the two columns l_partkey and l_suppkey in order to perform the two-phase aggregation. These are the group-by columns. However, in the outer query's HashAgg, no re-distribution is done: it assumes that the data is already hash-distributed on l_partkey, which is not correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
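The underlying problem can be demonstrated without Drill: hash-distributing rows on (l_partkey, l_suppkey) does not co-locate rows that share only l_partkey, so a group-by on l_partkey alone still needs its own redistribution. A small illustration, where the hash function and node count are arbitrary stand-ins for Drill's partitioning:

```java
import java.util.Objects;

// Two rows with the same partkey but different suppkey can hash to
// different nodes when the distribution key is (partkey, suppkey).
public class DistributionDemo {
    // Stand-in for "which node receives this row" under hash distribution
    // on the composite key (partkey, suppkey).
    static int nodeFor(int partkey, int suppkey, int numNodes) {
        return Math.floorMod(Objects.hash(partkey, suppkey), numNodes);
    }

    public static void main(String[] args) {
        int a = nodeFor(100, 1, 4);
        int b = nodeFor(100, 2, 4);
        // Same partkey, potentially different nodes: a downstream group-by
        // on partkey alone cannot assume the data is already partitioned
        // on partkey, so the trait must not be propagated as if it were.
        System.out.println(a + " vs " + b);
    }
}
```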
[jira] [Updated] (DRILL-2269) Provide default implementation for getting cost of evaluating an expression in the base class of LogicalExpression, instead of throwing an Exception
[ https://issues.apache.org/jira/browse/DRILL-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2269: -- Attachment: 0001-DRILL-2269-Add-default-implementation-for-estimating.patch [~amansinha100], could you please review the patch? Thanks. Provide default implementation for getting cost of evaluating an expression in the base class of LogicalExpression, instead of throwing an Exception Key: DRILL-2269 URL: https://issues.apache.org/jira/browse/DRILL-2269 Project: Apache Drill Issue Type: Bug Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2269-Add-default-implementation-for-estimating.patch Today, the abstract class LogicalExpression throws an Exception from the two methods related to estimating the cost of evaluating an expression, in the hope that subclasses will override them. However, if someone forgets to override them, the compiler and the mvn build will still succeed, yet execution will hit an Exception complaining that the subclass does not implement these methods. As Drill adds more subclasses of LogicalExpression to the code base, it becomes more likely that someone will forget to override these two methods, making it error-prone to just throw an Exception in the base class. Instead, we should provide a default implementation for these two methods in the base class; certain subclasses will override it to change the behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
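The proposed pattern can be sketched like this; the class and method names are hypothetical, not the actual LogicalExpression API:

```java
// The base class supplies a safe default cost estimate instead of
// throwing, so a subclass that forgets to override it still works at
// execution time instead of failing with a runtime exception.
public class DefaultCostDemo {
    static abstract class Expr {
        // Default: assume a cheap expression; subclasses override as needed.
        double cyclesCost() { return 1.0; }
    }

    static class RegexExpr extends Expr {
        @Override double cyclesCost() { return 40.0; } // known to be expensive
    }

    static class NewlyAddedExpr extends Expr {
        // Author forgot to override cyclesCost(): with a throwing base class
        // this would fail at run time; with a default it degrades gracefully
        // to a (possibly inaccurate) cheap estimate.
    }

    public static void main(String[] args) {
        System.out.println(new RegexExpr().cyclesCost());
        System.out.println(new NewlyAddedExpr().cyclesCost());
    }
}
```

The trade-off is that a missing override now silently yields a default estimate rather than a loud failure, which is acceptable for cost estimation because a rough cost only degrades plan quality, not correctness.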
[jira] [Created] (DRILL-2087) Some planner option settings might produce a wrong plan and a wrong query result for a join query.
Jinfeng Ni created DRILL-2087: - Summary: Some planner option settings might produce a wrong plan and a wrong query result for a join query. Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet
ls -l region
total 24
-rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet
{code}
With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+-+
| n_nationkey |
+-+
| 18 |
| 12 |
| 9 |
| 8 |
| 21 |
| 19 |
| 7 |
| 6 |
| 23 |
| 22 |
| 20 |
| 13 |
| 11 |
| 10 |
| 4 |
| 0 |
| 16 |
| 15 |
| 14 |
| 5 |
| 24 |
| 17 |
| 3 |
| 2 |
| 1 |
+-+
25 rows selected (2.057 seconds)
{code}
However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, then the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local alter session set `planner.slice_target` = 10;
+++
| ok | summary |
+++
| true | planner.slice_target updated. |
+++
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local alter session set `planner.join.row_count_estimate_factor` = 0.1;
+++
| ok | summary |
+++
| true | planner.join.row_count_estimate_factor updated. |
+++
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-+
| n_nationkey |
+-+
+-+
No rows selected (0.71 seconds)
{code}
If we look at the explain plan after changing the planner options, we see that for the JOIN operator the LEFT input does not have any EXCHANGE operator, while the RIGHT input has a HashToRandomExchange operator. That does not seem to be a valid plan.
{code}
00-00 Screen
00-01   UnionExchange
01-01     Project(n_nationkey=[$2])
01-02       HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03         HashToRandomExchange(dist0=[[$0]])
02-01           Project(n_regionkey=[$1], n_nationkey=[$0])
02-02             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}
The cause of this problem seems to be that Drill removes the EXCHANGE operator under certain conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE either on both sides or on neither. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
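The invariant the report points at can be stated simply: after exchange removal, a distributed join should have an exchange on both inputs or on neither. A minimal check, hypothetical and not Drill's actual validation code:

```java
// Sanity check for a join node in a distributed plan: removing the
// exchange from only one side leaves the two inputs partitioned
// differently, which can silently produce wrong (here: empty) results.
public class JoinExchangeCheck {
    public static boolean isValid(boolean leftHasExchange, boolean rightHasExchange) {
        // Valid only when both sides agree: exchanges on both, or on neither.
        return leftHasExchange == rightHasExchange;
    }

    public static void main(String[] args) {
        // The invalid plan from the report: no exchange on the region side,
        // a HashToRandomExchange on the nation side.
        System.out.println(isValid(false, true));
        // Exchanges on both sides: valid.
        System.out.println(isValid(true, true));
    }
}
```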
[jira] [Created] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
Jinfeng Ni created DRILL-2107: - Summary: Hash Join throws IOBE for a query with exists subquery. Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Chris Westin Priority: Critical I hit an IOBE for TestTpchDistributed Q4 when I tried to enable an optimizer rule. Then I simplified Q4 to the following, and still reproduced the same IOBE.
{code}
select o.o_orderpriority
from cp.`tpch/orders.parquet` o
where exists (
  select *
  from cp.`tpch/lineitem.parquet` l
  where l.l_orderkey = o.o_orderkey
);
{code}
Stack trace of the exception:
{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
  at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_45]
  at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_45]
  at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:232) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:132) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.test.generated.HashTableGen307.doSetup(HashTableTemplate.java:71) ~[na:na]
  at org.apache.drill.exec.test.generated.HashTableGen307.updateBatches(HashTableTemplate.java:473) ~[na:na]
  at org.apache.drill.exec.test.generated.HashJoinProbeGen313.executeProbePhase(HashJoinProbeTemplate.java:139) ~[na:na]
  at org.apache.drill.exec.test.generated.HashJoinProbeGen313.probeAndProject(HashJoinProbeTemplate.java:223) ~[na:na]
  at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:227) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
{code}
The physical plan seems to be correct after enabling the new rule. Actually, if I disable HashJoin and use merge join for the query, it works fine. So it seems the IOBE exposes a bug in HashJoin. To reproduce this issue, there are two options: 1) Modify DrillRuleSets.java to remove the comment before SwapJoinRule, run alter session set `planner.slice_target` = 10;, and run the query. 2) Use the attached physical plan in a JSON file, and use submit_plan to submit the physical plan. For comparison, I also attached the physical plan when HashJoin is disabled (merge join is used), and the explain plan at the physical operator level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2107: -- Attachment: q4_1_mj.json q4_1_mj_phy.txt q4_1_hj.json q4_1_hj_phy.txt
q4_1_hj_phy.txt: the explain plan output when HashJoin is enabled.
q4_1_hj.json: the physical plan in JSON format when HashJoin is enabled.
q4_1_mj_phy.txt: the explain plan output when MergeJoin is used.
q4_1_mj.json: the physical plan in JSON format when MergeJoin is used.
To submit the physical plan in JSON format directly, use the following command: distribution/target/apache-drill-0.8.0-SNAPSHOT/apache-drill-0.8.0-SNAPSHOT/bin/submit_plan -f q4_1_hj.json -t physical -l
Hash Join throws IOBE for a query with exists subquery. --- Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Chris Westin Priority: Critical Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2107) Hash Join throws IOBE for a query with exists subquery.
[ https://issues.apache.org/jira/browse/DRILL-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2107: -- Assignee: Mehant Baid (was: Chris Westin) Hash Join throw IOBE for a query with exists subquery. --- Key: DRILL-2107 URL: https://issues.apache.org/jira/browse/DRILL-2107 Project: Apache Drill Issue Type: New Feature Components: Execution - Operators Reporter: Jinfeng Ni Assignee: Mehant Baid Priority: Critical Attachments: q4_1_hj.json, q4_1_hj_phy.txt, q4_1_mj.json, q4_1_mj_phy.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2074) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off
[ https://issues.apache.org/jira/browse/DRILL-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2074: -- Component/s: (was: Query Planning Optimization) Execution - Operators Assignee: Chris Westin (was: Jinfeng Ni) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off --- Key: DRILL-2074 URL: https://issues.apache.org/jira/browse/DRILL-2074 Project: Apache Drill Issue Type: Bug Components: Execution - Operators Reporter: Abhishek Girish Assignee: Chris Westin Attachments: 05_par1000.q, 05_par1000_d6e54ab.logical.plan, 05_par1000_d6e54ab.verbose.plan, drill-env.sh Query attached. Hash Join and Hash Agg were turned off. And the following property was added to drill-override.conf:
sort: {
  purge.threshold : 100,
  external: {
    batch.size : 4000,
    spill: {
      batch.size : 4000,
      group.size : 100,
      threshold : 200,
      directories : [ /drill_spill ],
      fs : maprfs:///
    }
  }
}
Query failed with the below error message:
Query failed: RemoteRpcException: Failure while running fragment., Unable to allocate sv2 buffer after repeated attempts [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ]
Log Snippets:
2015-01-26 20:07:33,239 atsqa8c42.qa.lab [2b396307-2c1e-3486-90bc-fbaf09fbeb3e:frag:15:51] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error faf3044a-e14a-427b-b66d-7bcd7522ead5: Failure while running fragment.
java.lang.RuntimeException: org.apache.drill.exec.memory.OutOfMemoryException: Unable to allocate sv2 buffer after repeated attempts
  at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:309) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.JoinStatus.nextRight(JoinStatus.java:80) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.JoinStatus.ensureInitial(JoinStatus.java:95) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:147) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
  at
[jira] [Commented] (DRILL-2074) Queries fail with OutOfMemory Exception when Hash Join Agg are turned off
[ https://issues.apache.org/jira/browse/DRILL-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297340#comment-14297340 ] Jinfeng Ni commented on DRILL-2074: --- This exception is thrown in physical operator during execution time. Change the component to Execution-operators. Queries fail with OutOfMemory Exception when Hash Join Agg are turned off --- Key: DRILL-2074 URL: https://issues.apache.org/jira/browse/DRILL-2074 Project: Apache Drill Issue Type: Bug Components: Execution - Operators Reporter: Abhishek Girish Assignee: Chris Westin Attachments: 05_par1000.q, 05_par1000_d6e54ab.logical.plan, 05_par1000_d6e54ab.verbose.plan, drill-env.sh Query attached. Hash Join and Hash Agg were turned off. And the following property was added to drill-override.conf: sort: { purge.threshold : 100, external: { batch.size : 4000, spill: { batch.size : 4000, group.size : 100, threshold : 200, directories : [ /drill_spill ], fs : maprfs:/// } } } Query failed with the below error message: Query failed: RemoteRpcException: Failure while running fragment., Unable to allocate sv2 buffer after repeated attempts [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] [ faf3044a-e14a-427b-b66d-7bcd7522ead5 on drone-42:31010 ] Log Snippets: 2015-01-26 20:07:33,239 atsqa8c42.qa.lab [2b396307-2c1e-3486-90bc-fbaf09fbeb3e:frag:15:51] ERROR o.a.d.e.w.f.AbstractStatusReporter - Error faf3044a-e14a-427b-b66d-7bcd7522ead5: Failure while running fragment. 
java.lang.RuntimeException: org.apache.drill.exec.memory.OutOfMemoryException: Unable to allocate sv2 buffer after repeated attempts at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:309) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.JoinStatus.nextRight(JoinStatus.java:80) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.JoinStatus.ensureInitial(JoinStatus.java:95) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:147) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:96) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at
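The failure mode in the trace above, an sv2 (selection vector) buffer allocation that is retried several times and finally surfaces as an OutOfMemoryException failing the whole fragment, can be sketched roughly as follows. This is an illustrative Python sketch under assumed names (`allocate_with_retries`, the retry count), not Drill's actual allocator code:

```python
# Illustrative sketch (assumed names and retry policy, not Drill's
# allocator) of the failure reported above: a buffer allocation is
# retried a fixed number of times and then surfaces as an
# OutOfMemoryException that fails the fragment.
class OutOfMemoryException(Exception):
    pass

def allocate_with_retries(try_alloc, attempts=3):
    """Call try_alloc until it returns a buffer, or give up."""
    for _ in range(attempts):
        buf = try_alloc()
        if buf is not None:
            return buf
    raise OutOfMemoryException(
        "Unable to allocate sv2 buffer after repeated attempts")

# A successful allocation returns the buffer unchanged.
print(len(allocate_with_retries(lambda: bytearray(8))))  # 8
```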
[jira] [Commented] (DRILL-2012) Flatten fails when we filter by a non-projected column
[ https://issues.apache.org/jira/browse/DRILL-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297342#comment-14297342 ] Jinfeng Ni commented on DRILL-2012: --- Sean, Can you please take a look at this issue? thx. Flatten fails when we filter by a non-projected column --- Key: DRILL-2012 URL: https://issues.apache.org/jira/browse/DRILL-2012 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Rahul Challapalli Assignee: Sean Hsuan-Yi Chu Priority: Critical Attachments: error.log git.commit.id.abbrev=b491cdb {code} { id: 0001, type: donut, name: Cake, ppu: 0.55, batters: { batter: [ { id: 1001, type: Regular }, { id: 1002, type: Chocolate }, { id: 1003, type: Blueberry }, { id: 1004, type: Devil's Food } ] }, topping: [ { id: 5001, type: None }, { id: 5002, type: Glazed }, { id: 5005, type: Sugar }, { id: 5007, type: Powdered Sugar }, { id: 5006, type: Chocolate with Sprinkles }, { id: 5003, type: Chocolate }, { id: 5004, type: Maple } ] } {code} The below query fails : {code} select j.id id, j.name name, flatten(j.topping) tt, flatten(j.batters.batter) bb from `sample.json` j where j.type = 'donut'; Query failed: Query failed: Failure while running fragment., Trying to flatten a non-repeated filed. [ 98ff5db0-4aa7-4f65-9a7a-93ae8bc41f6b on qa-node190.qa.lab:31010 ] [ 98ff5db0-4aa7-4f65-9a7a-93ae8bc41f6b on qa-node190.qa.lab:31010 ] {code} However if I include the filter column in the list of selected columns, then the query succeeds. 
{code}
select j.type, j.id id, j.name name, flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j where j.type = 'donut';
+-------+------+------+------+------+
| type  | id   | name | tt   | bb   |
+-------+------+------+------+------+
| donut | 0001 | Cake | {id:5001,type:None} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5001,type:None} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5002,type:Glazed} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5005,type:Sugar} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5007,type:Powdered Sugar} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5006,type:Chocolate with Sprinkles} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1002,type:Chocolate} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1003,type:Blueberry} |
| donut | 0001 | Cake | {id:5003,type:Chocolate} | {id:1004,type:Devil's Food} |
| donut | 0001 | Cake | {id:5004,type:Maple} | {id:1001,type:Regular} |
| donut | 0001 | Cake | {id:5004,type:Maple} | {id:1002,type:Chocolate} |
| donut
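The output above also shows why the row count multiplies: two FLATTEN calls in one SELECT pair each element of `topping` with each element of `batters.batter`, a cross product. A minimal Python sketch of that behavior (illustrative only, not Drill's implementation):

```python
# Illustrative sketch (not Drill code) of how two FLATTEN calls in one
# SELECT produce a cross product of the two repeated fields.
donut = {
    "id": "0001", "name": "Cake", "type": "donut",
    "topping": [{"id": 5001, "type": "None"}, {"id": 5002, "type": "Glazed"}],
    "batter": [{"id": 1001, "type": "Regular"}, {"id": 1002, "type": "Chocolate"}],
}

def flatten_two(row, col_a, col_b):
    """Emit one output row per (element-of-a, element-of-b) pair."""
    return [
        {"id": row["id"], "name": row["name"], "tt": a, "bb": b}
        for a in row[col_a]
        for b in row[col_b]
    ]

rows = flatten_two(donut, "topping", "batter")
print(len(rows))  # 4 rows: 2 toppings x 2 batters
```

With the full sample document (7 toppings, 4 batters) the same pairing yields the 7 x 4 rows seen in the query result.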
[jira] [Updated] (DRILL-2115) Disable function CARDINALITY in grammar
[ https://issues.apache.org/jira/browse/DRILL-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2115: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Disable function CARDINALITY in grammar --- Key: DRILL-2115 URL: https://issues.apache.org/jira/browse/DRILL-2115 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Priority: Minor Since we don't support multiset type, we should disable CARDINALITY function as well: {code} 0: jdbc:drill:schema=dfs select cardinality(list) from `test.json`; Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. [ db86cf79-6083-4ad7-afa1-fac534b942bc on atsqa4-134.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality(id) from `test.json`; Query failed: Query stopped., Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [cardinality(BIGINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. [ 1f209cb2-4d60-4c1d-b1d0-d921f8e8a913 on atsqa4-134.qa.lab:31010 ] Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality() from `test.json`; Query failed: SqlValidatorException: Invalid number of arguments to function 'CARDINALITY'. Was expecting 1 arguments Error: exception while executing query: Failure while executing query. (state=,code=0) 0: jdbc:drill:schema=dfs select cardinality('aa') from `test.json`; Query failed: SqlValidatorException: Cannot apply 'CARDINALITY' to arguments of type 'CARDINALITY(CHAR(2))'. 
Supported form(s): 'CARDINALITY(MULTISET)' 'CARDINALITY(ARRAY)' 'CARDINALITY(MAP)' Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
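For context on what the disabled function would compute: per the SQL standard, CARDINALITY returns the element count of a MULTISET, ARRAY, or MAP, which matches the "Supported form(s)" list in the validator message. A toy Python sketch of those semantics (not Drill code):

```python
# Toy illustration (not Drill code) of SQL CARDINALITY semantics:
# the element count of a collection-typed value, and a type error
# for scalars such as CHAR, mirroring the validator message above.
def cardinality(value):
    if isinstance(value, (list, dict)):
        return len(value)
    raise TypeError("CARDINALITY applies to MULTISET/ARRAY/MAP")

print(cardinality([10, 20, 30]))  # 3
```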
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery
[ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2069: -- Attachment: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch Star is not expanded correctly in the query with IN clause containing subquery -- Key: DRILL-2069 URL: https://issues.apache.org/jira/browse/DRILL-2069 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Critical Attachments: 0001-DRILL-2069-Fix-star-query-prefix-logic-it-s-used-wit.patch
t1.json
{code}
{ a1: aa, b1: 1 }
{ a1: bb, b1: 2 }
{ a1: cc, b1: 3 }
{code}
t2.json
{code}
{ a2: aa, b2: 1 }
{ a2: bb, b2: 2 }
{ a2: xx, b2: 10 }
{code}
Star is expanded incorrectly; we should get only columns from `t1.json`:
{code}
0: jdbc:drill:schema=dfs select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------+------+------+------+
| a2   | a1   | b1   | a10  |
+------+------+------+------+
| aa   | aa   | 1    | aa   |
| bb   | bb   | 2    | bb   |
+------+------+------+------+
2 rows selected (0.172 seconds)
{code}
explain plan
{code}
00-01    Project(*=[$0])
00-02      Project(*=[$0])
00-03        HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05          Project(*=[$0], a1=[$1])
00-07            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04          HashAgg(group=[{0}])
00-06            Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}
Workaround - specify columns explicitly:
{code}
0: jdbc:drill:schema=dfs select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------+------+
| a1   | a10  |
+------+------+
| aa   | aa   |
| bb   | bb   |
+------+------+
2 rows selected (0.24 seconds)
{code}
Note to myself: include cases like below during verification:
{code}
0: jdbc:drill:schema=dfs select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+------+------+------+------+------+------+
| a2   | b2   | a1   | b1   | a10  | b10  |
+------+------+------+------+------+------+
| aa   | 1    | aa   | 1    | aa   | 1    |
| bb   | 2    | bb   | 2    | bb   | 2    |
+------+------+------+------+------+------+
2 rows selected (0.323 seconds)
0: jdbc:drill:schema=dfs select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
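The symptom above is consistent with `*` being expanded after the IN subquery has been rewritten into a join, so the subquery's columns leak into the output. A hedged Python sketch of the difference (names like `semi_join_star` are illustrative; this is not Drill's planner code):

```python
# Hedged sketch (not Drill code) of the star-expansion bug: an IN
# subquery is rewritten into a join with a dedup (the HashAgg in the
# plan). If '*' is expanded over the joined row instead of over the
# outer table only, the subquery's columns leak into the result.
t1 = [{"a1": "aa", "b1": 1}, {"a1": "bb", "b1": 2}, {"a1": "cc", "b1": 3}]
t2 = [{"a2": "aa"}, {"a2": "bb"}, {"a2": "xx"}]

def semi_join_star(left, right, lkey, rkey, expand_before_rewrite):
    keys = {r[rkey] for r in right}              # dedup, like the HashAgg
    matched = [row for row in left if row[lkey] in keys]
    if expand_before_rewrite:
        return matched                            # '*' = left columns only
    # buggy: '*' expanded over the joined row, leaking the right column
    return [{**{rkey: row[lkey]}, **row} for row in matched]

print(semi_join_star(t1, t2, "a1", "a2", expand_before_rewrite=True))
# [{'a1': 'aa', 'b1': 1}, {'a1': 'bb', 'b1': 2}] -- only t1's columns
```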
[jira] [Updated] (DRILL-2016) Remove multiset data type from supported grammar
[ https://issues.apache.org/jira/browse/DRILL-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2016: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) Remove multiset data type from supported grammar -- Key: DRILL-2016 URL: https://issues.apache.org/jira/browse/DRILL-2016 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Should throw unsupported error {code} 0: jdbc:drill:schema=dfs select cast(a1 as multiset) from `t1.json`; Query failed: Query failed: Unexpected exception during fragment initialization: use createMultisetType() instead Error: exception while executing query: Failure while executing query. (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2008) select a || b from ... fails while select cast(a as varchar) || cast(b as varchar) from ...
[ https://issues.apache.org/jira/browse/DRILL-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2008: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) select a || b from ... fails while select cast(a as varchar) || cast(b as varchar) from ... --- Key: DRILL-2008 URL: https://issues.apache.org/jira/browse/DRILL-2008 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Sean Hsuan-Yi Chu Priority: Critical The first query would fail at planning by optiq/calcite. For example, select n_name || n_name from cp.`tpch/nation.parquet`; org.apache.drill.exec.rpc.RpcException: Query failed: Unexpected exception during fragment initialization: null at org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:79) at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:93) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:52) at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:34) at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:58) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194) at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1969) Error message thrown while querying Hive Views from Drill must be improved
[ https://issues.apache.org/jira/browse/DRILL-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271870#comment-14271870 ] Jinfeng Ni commented on DRILL-1969: --- +1. Fix looks good to me for this issue. Seems this is not the only place where we did not display the root cause of the exception. We may want to consider fixing the rest of them as well later. Error message thrown while querying Hive Views from Drill must be improved -- Key: DRILL-1969 URL: https://issues.apache.org/jira/browse/DRILL-1969 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive Affects Versions: 0.7.0, 0.8.0 Reporter: Abhishek Girish Assignee: Venki Korukanti Fix For: 0.8.0 Attachments: 0001-DRILL-1969-Improve-the-error-message-displayed-when-.patch Currently querying Hive views is not supported via Drill. This may however not be clear to an end user as both hive tables and hive views can be listed via Drill. use hive; +++ | ok | summary | +++ | true | Default schema changed to 'hive' | +++ show tables; +--++ | TABLE_SCHEMA | TABLE_NAME | +--++ | hive.default | tab1_view | | hive.default | tab1 | +--++ Querying a hive table is successful. However querying a hive view fails with an error which isn't informative. select * from tab1; +++ | a1 | a2 | +++ | 1 | ABC| +++ select * from tab1_view; Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] Error: exception while executing query: Failure while executing query. (state=,code=0) The error message thrown at the console must be improved such as to indicate that at present querying Hive Views is not supported. 
LOG: 2015-01-08 17:38:41,009 [2b50cede-b757-f6c3-6c5f-56f524796772:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 3efe02b8-19cd-4e42-a89c-190488762993: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:194) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT] at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.7.0-r2-SNAPSHOT-rebuffed.jar:0.7.0-r2-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65] Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#5828:ProjectRel.NONE.ANY([]).[](child=rel#5827:Subset#0.ENUMERABLE.ANY([]).[],a1=$0,a2=$1), rel#5822:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[hive, tab1_view])] at org.eigenbase.util.Util.newInternal(Util.java:750) ~[optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:246) ~[optiq-core-0.9-drill-r12.jar:na] at org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) ~[optiq-core-0.9-drill-r12.jar:na] at 
net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) ~[optiq-core-0.9-drill-r12.jar:na] at net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:276) ~[optiq-core-0.9-drill-r12.jar:na] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:155)
[jira] [Commented] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318512#comment-14318512 ] Jinfeng Ni commented on DRILL-2087: --- Talked to [~vkorukanti] about the conflict with Local Exchange. Did not find issue with both changes applied. Some planner option setting might produce a wrong plan and wrong query result for join query. - Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jacques Nadeau Fix For: 0.9.0 Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch
I have the following parquet data (essentially tpch's nation/region tables, but broken into multiple parts).
{code}
ls -l nation
total 24
-rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
-rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
-rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet
ls -l region
total 24
-rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet
-rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet
-rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet
{code}
With the default planner settings, the following query produces the correct result:
{code}
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+-------------+
| n_nationkey |
+-------------+
| 18 |
| 12 |
| 9 |
| 8 |
| 21 |
| 19 |
| 7 |
| 6 |
| 23 |
| 22 |
| 20 |
| 13 |
| 11 |
| 10 |
| 4 |
| 0 |
| 16 |
| 15 |
| 14 |
| 5 |
| 24 |
| 17 |
| 3 |
| 2 |
| 1 |
+-------------+
25 rows selected (2.057 seconds)
{code}
However, if I modify the planner options (`planner.slice_target` and `planner.join.row_count_estimate_factor`) as follows, then the same query returns 0 rows, which is not correct.
{code}
0: jdbc:drill:zk=local alter session set `planner.slice_target` = 10;
+------+-------------------------------+
| ok   | summary                       |
+------+-------------------------------+
| true | planner.slice_target updated. |
+------+-------------------------------+
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local alter session set `planner.join.row_count_estimate_factor` = 0.1;
+------+-------------------------------------------------+
| ok   | summary                                         |
+------+-------------------------------------------------+
| true | planner.join.row_count_estimate_factor updated. |
+------+-------------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-------------+
| n_nationkey |
+-------------+
+-------------+
No rows selected (0.71 seconds)
{code}
If we look at the explain plan result after changing the planner options, we will see that for the JOIN operator the LEFT side does not have any EXCHANGE operator, while the RIGHT side has a HashToRandomExchange operator. That does not seem to be a valid plan.
{code}
00-00    Screen
00-01      UnionExchange
01-01        Project(n_nationkey=[$2])
01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03            HashToRandomExchange(dist0=[[$0]])
02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
02-02                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}
The cause of this problem seems to be that Drill removes the EXCHANGE operator under some conditions, and that condition does not guarantee that a JOIN operator always has an EXCHANGE on both sides, or on none.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
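The invariant the report describes, that a distributed hash join should see an exchange below both inputs or below neither (otherwise the two sides are partitioned differently and matching keys never meet), can be checked mechanically over a plan tree. A hedged sketch on a toy plan representation (node names are illustrative; this is not Drill's planner code):

```python
# Hedged sketch (toy plan representation, not Drill code) of a check
# for the invariant described above: a HashJoin whose inputs disagree
# about having an exchange below them is flagged as invalid.
def has_exchange(node):
    if node["op"].endswith("Exchange"):
        return True
    return any(has_exchange(c) for c in node.get("inputs", []))

def invalid_joins(node):
    bad = []
    if node["op"] == "HashJoin":
        left, right = node["inputs"]
        if has_exchange(left) != has_exchange(right):
            bad.append(node)
    for c in node.get("inputs", []):
        bad.extend(invalid_joins(c))
    return bad

# Shape of the reported plan: exchange on the right input only.
plan = {"op": "HashJoin", "inputs": [
    {"op": "Scan", "inputs": []},
    {"op": "HashToRandomExchange", "inputs": [{"op": "Scan", "inputs": []}]},
]}
print(len(invalid_joins(plan)))  # 1: exchange on one side only
```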
[jira] [Created] (DRILL-2250) Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files
Jinfeng Ni created DRILL-2250: - Summary: Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files Key: DRILL-2250 URL: https://issues.apache.org/jira/browse/DRILL-2250 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet, Storage - Writer Reporter: Jinfeng Ni Assignee: Steven Phillips Not sure if this is related to DRILL-2249, since the error seems to be different. On today's master commit: select commit_id from sys.version; ++ | commit_id | ++ | 4ed0a8d68ec5ef344fb54ff7c9d857f7f3f153aa | ++ If I create a parquet file containing decimal types, using the following CTAS {code} create table my_dec_table as select cast(l_quantity as decimal(10,2)) dec1, cast(l_discount as decimal(10,2)) dec2, cast(l_tax as decimal(10,2)) as dec3 from cp.`tpch/lineitem.parquet` ; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 60175 | ++---+ 1 row selected (1.71 seconds) {code} Then, if I want to query this new parquet file with : {code} select * from my_dec_table; RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] select dec1, dec2, dec3 from my_dec_table; Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] {code} parquet-tools shows the following schema information for the generated parquet file: {code} message root { required int64 dec1 (DECIMAL(10,2)); required int64 dec2 (DECIMAL(10,2)); required int64 dec3 (DECIMAL(10,2)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
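As the parquet-tools output shows, DECIMAL(10,2) is stored as an int64 of unscaled units, so a reader has to apply the scale rather than treat the column as a plain BIGINT, which is what the Decimal18Vector-to-BigIntVector cast failure suggests is going wrong. A small illustrative decode in Python (not Drill's reader code):

```python
# Illustration (not Drill code) of the storage shown by parquet-tools:
# DECIMAL(10,2) kept as an int64 of unscaled units, so reading it back
# means applying the scale, not interpreting the raw int64 as a BIGINT.
from decimal import Decimal

def decode_decimal(unscaled: int, scale: int) -> Decimal:
    """Turn an unscaled int64 plus a scale into the decimal value."""
    return Decimal(unscaled).scaleb(-scale)

print(decode_decimal(1750, 2))  # 17.50 -- unscaled 1750 with scale 2
```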
[jira] [Commented] (DRILL-2250) Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files
[ https://issues.apache.org/jira/browse/DRILL-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321181#comment-14321181 ] Jinfeng Ni commented on DRILL-2250: --- Here is part of stack trace for the error: {code} org.apache.drill.common.exceptions.ExecutionSetupException: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:325) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScanBatch.init(ScanBatch.java:99) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:156) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:125) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:107) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] .. 
{code} Parquet reader hit ExecutionSetupException when reading decimal columns created using CTAS from another parquet files - Key: DRILL-2250 URL: https://issues.apache.org/jira/browse/DRILL-2250 Project: Apache Drill Issue Type: Bug Components: Storage - Parquet, Storage - Writer Reporter: Jinfeng Ni Assignee: Steven Phillips Not sure if this is related to DRILL-2249, since the error seems to be different. On today's master commit: select commit_id from sys.version; ++ | commit_id | ++ | 4ed0a8d68ec5ef344fb54ff7c9d857f7f3f153aa | ++ If I create a parquet file containing decimal types, using the following CTAS {code} create table my_dec_table as select cast(l_quantity as decimal(10,2)) dec1, cast(l_discount as decimal(10,2)) dec2, cast(l_tax as decimal(10,2)) as dec3 from cp.`tpch/lineitem.parquet` ; ++---+ | Fragment | Number of records written | ++---+ | 0_0| 60175 | ++---+ 1 row selected (1.71 seconds) {code} Then, if I want to query this new parquet file with : {code} select * from my_dec_table; RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] [ de164ae4-2d2b-4257-bafb-3ca5165678f9 on 10.250.0.8:31010 ] select dec1, dec2, dec3 from my_dec_table; Query failed: RemoteRpcException: Failure while running fragment., org.apache.drill.exec.vector.Decimal18Vector cannot be cast to org.apache.drill.exec.vector.BigIntVector [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] [ 0f5b41c3-1ab3-43b4-9742-cea622d3f476 on 10.250.0.8:31010 ] {code} parquet-tools shows the following schema information for the generated parquet file: {code} message root { required int64 dec1 (DECIMAL(10,2)); required int64 dec2 (DECIMAL(10,2)); required int64 dec3 (DECIMAL(10,2)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2087) Some planner option setting might produce a wrong plan and wrong query result for join query.
[ https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2087: -- Attachment: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch Some planner option setting might produce a wrong plan and wrong query result for join query. - Key: DRILL-2087 URL: https://issues.apache.org/jira/browse/DRILL-2087 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0001-DRILL-2087-Fix-an-incorrect-join-plan-by-ensure-eith.patch I have the following parquet data ( essentially tpch's nation/region tables, but break into multiple parts). {code} ls -l nation total 24 -rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet -rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet -rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet ls -l region total 24 -rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet -rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet -rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet {code} With the default planner setting, the following query produce the correct result : {code} 0: jdbc:drill:zk=local select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey; SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. +-+ | n_nationkey | +-+ | 18 | | 12 | | 9 | | 8 | | 21 | | 19 | | 7 | | 6 | | 23 | | 22 | | 20 | | 13 | | 11 | | 10 | | 4 | | 0 | | 16 | | 15 | | 14 | | 5 | | 24 | | 17 | | 3 | | 2 | | 1 | +-+ 25 rows selected (2.057 seconds) {code} However, if I modify the planner option (`planner.slice_target` and `planner.join.row_count_estimate_factor` ) as following, then the same query would return 0 row, which is not correct. 
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
+------+-------------------------------+
|  ok  |            summary            |
+------+-------------------------------+
| true | planner.slice_target updated. |
+------+-------------------------------+
1 row selected (0.093 seconds)
0: jdbc:drill:zk=local> alter session set `planner.join.row_count_estimate_factor` = 0.1;
+------+--------------------------------------------------+
|  ok  |                     summary                      |
+------+--------------------------------------------------+
| true | planner.join.row_count_estimate_factor updated.  |
+------+--------------------------------------------------+
1 row selected (0.041 seconds)
0: jdbc:drill:zk=local> select t1.n_nationkey from dfs.`/Users/jni/work/data/parquet/region` t2 join dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = t2.r_regionkey;
+-------------+
| n_nationkey |
+-------------+
+-------------+
No rows selected (0.71 seconds)
{code}

If we look at the explain plan after changing the planner options, we see that the JOIN operator's LEFT input does not have any EXCHANGE operator, while the RIGHT input has a HashToRandomExchange operator. That does not seem to be a valid plan.

{code}
00-00 Screen
00-01   UnionExchange
01-01     Project(n_nationkey=[$2])
01-02       HashJoin(condition=[=($1, $0)], joinType=[inner])
01-04         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/region]], selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, columns=[`r_regionkey`]]])
01-03         HashToRandomExchange(dist0=[[$0]])
02-01           Project(n_regionkey=[$1], n_nationkey=[$0])
02-02             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, columns=[`n_regionkey`, `n_nationkey`]]])
{code}

The cause of this problem seems to be that Drill will remove an EXCHANGE operator under some conditions, and that condition does not guarantee that a JOIN operator either has EXCHANGE on both sides, or on neither. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
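To see why such a one-sided plan can silently lose rows, here is a small illustrative sketch (plain Python, not Drill code; the worker layout for the non-exchanged side is an assumption for the example). When only one join input is hash-partitioned on the join key, matching keys need not land on the same worker:

```python
# Sketch: a distributed hash join is only correct if both inputs agree on
# distribution. Partitioning one side while the other stays put loses matches.
def hash_partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_per_worker(left_parts, right_parts):
    # Each worker joins only the partitions it holds.
    return sum(
        1
        for w in range(len(left_parts))
        for l in left_parts[w]
        for r in right_parts[w]
        if l["n_regionkey"] == r["r_regionkey"]
    )

nation = [{"n_nationkey": i, "n_regionkey": i % 5} for i in range(25)]
region = [{"r_regionkey": i} for i in range(5)]

# Broken plan: nation goes through a HashToRandomExchange, region does not
# (assume worker 0 simply holds all of region, worker 1 holds none).
broken = join_per_worker(hash_partition(nation, "n_regionkey", 2), [region, []])

# Valid plan: both sides hash-partitioned on the join key, so matching keys
# are co-located on the same worker.
valid = join_per_worker(
    hash_partition(nation, "n_regionkey", 2),
    hash_partition(region, "r_regionkey", 2),
)

print(broken, valid)  # broken < 25, valid == 25
```

This mirrors the fix named in the patch title: ensure either both join inputs are exchanged, or neither is.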
[jira] [Assigned] (DRILL-2165) Assert when grouping on expression with extract and concat functions
[ https://issues.apache.org/jira/browse/DRILL-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2165: -- Assignee: Sean Hsuan-Yi Chu

Assert when grouping on expression with extract and concat functions

Key: DRILL-2165 URL: https://issues.apache.org/jira/browse/DRILL-2165 Project: Apache Drill Issue Type: Bug Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Sean Hsuan-Yi Chu Attachments: 0_0_0.parquet

EXTRACT with date data type:
{code}
0: jdbc:drill:schema=dfs> select concat(extract(day from c_date), extract(month from c_date)) from t1 group by concat(extract(day from c_date), extract(month from c_date));
Query failed: AssertionError:
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
EXTRACT with timestamp data type:
{code}
0: jdbc:drill:schema=dfs> select concat(extract(day from c_timestamp), extract(month from c_timestamp)) from t1 group by concat(extract(day from c_timestamp), extract(month from c_timestamp));
Query failed: AssertionError:
Error: exception while executing query: Failure while executing query.
(state=,code=0)
{code}

If you add the aggregate function SUM, the query runs:
{code}
select concat(extract(day from c_date), extract(month from c_date)), sum(c_integer) from t1 group by concat(extract(day from c_date), extract(month from c_date));
{code}

drillbit.log
{code}
2015-02-04 20:06:15,103 [2b2d8447-b333-5df4-f49a-b2111f875a67:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 8cff16d9-b34e-4ce3-a00b-18eaf3f64e66: AssertionError:
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: while converting `concat`(EXTRACT(DAY FROM `c_timestamp`), EXTRACT(MONTH FROM `c_timestamp`))
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:197) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:254) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.AssertionError: Internal error: while converting `concat`(EXTRACT(DAY FROM `c_timestamp`), EXTRACT(MONTH FROM `c_timestamp`))
    at org.eigenbase.util.Util.newInternal(Util.java:750) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.ReflectiveConvertletTable$2.convertCall(ReflectiveConvertletTable.java:149) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:52) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4099) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:1) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql.SqlCall.accept(SqlCall.java:125) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:3994) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2402) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2191) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:514) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.eigenbase.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:432) ~[optiq-core-0.9-drill-r18.jar:na]
    at net.hydromatic.optiq.prepare.PlannerImpl.convert(PlannerImpl.java:186) ~[optiq-core-0.9-drill-r18.jar:na]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:149) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:126)
[jira] [Updated] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2458: -- Fix Version/s: 0.9.0

Extra hash column added when running CTAS with order by

Key: DRILL-2458 URL: https://issues.apache.org/jira/browse/DRILL-2458 Project: Apache Drill Issue Type: Bug Reporter: Steven Phillips Assignee: Jinfeng Ni Fix For: 0.9.0 Attachments: 0001-DRILL-2458-Ensure-no-unwanted-columns-are-added-to-C.patch

I created the table via the command:
{code:sql}
create table dfs.tmp.lineitem_sort as select * from dfs.`/drill/SF10/lineitem` order by l_extendedprice desc
{code}
This resulted in an extra column when reading the data back:
{code}
0: jdbc:drill:> select * from `lineitem_sort/0_0_0.parquet` limit 1;
| E_X_P_R_H_A_S_H_F_I_E_L_D | L_COMMENT | L_COMMITDATE | L_DISCOUNT | L_EXTENDEDPRICE | L_LINENUMBER | L_LINESTATUS | L_ORDERKEY | L_PARTKEY | L_QUANTITY | L_RECEIPTDATE | L_RETURNFLAG | L_SHIPDATE | L_SHIPINSTRUCT | L_SHIPMODE | L_SUPPKEY | L_TAX | l_extendedprice0 |
| -1909175176 | [B@187a06b6 | [B@734ea347 | 0.02 | 104949.5 | 2 | [B@2fc1c575 | 16734176 | 19 | 50.0 | [B@5a8a9cd1 | [B@423d8bc7 | [B@56a3d7ca | [B@1eac3b36 | [B@3d6365f5 | 50002 | 0.05 | 104949.5 |
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370069#comment-14370069 ] Jinfeng Ni commented on DRILL-2458: --- The extra column does not seem to be related to the LocalExchange code; it could happen to any other regular column. I could reproduce a similar issue as follows.

{code}
create table dfs.tmp.region_sort as select *, r_regionkey + 1 from cp.`tpch/region.parquet` order by r_name;
+----------+---------------------------+
| Fragment | Number of records written |
+----------+---------------------------+
| 0_0      | 5                         |
+----------+---------------------------+
1 row selected (2.556 seconds)
{code}

Then, run a query against the created parquet file.

{code}
0: jdbc:drill:zk=local> select * from dfs.tmp.region_sort;
| r_regionkey | r_name | r_comment | EXPR$1 | r_name0 | EXPR$10 |
| 0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to | 1 | AFRICA | 1 |
| 1 | AMERICA | hs use ironic, even requests. s | 2 | AMERICA | 2 |
| 2 | ASIA | ges. thinly even pinto beans ca | 3 | ASIA | 3 |
| 3 | EUROPE | ly final courts cajole furiously final excuse | 4 | EUROPE | 4 |
| 4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl | 5 | MIDDLE EAST | 5 |
5 rows selected (0.108 seconds)
{code}

We can see that two additional columns, r_name0 and EXPR$10, were added to the output parquet file. The root cause of this problem is in the star column handling: we should add a prefix to the columns expanded from the star, so that the execution operators can distinguish the regular columns expanded from the star column from the columns/expressions referenced explicitly in the query.
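The prefixing idea described above can be sketched as follows. The marker string and helper names here are hypothetical, shown only to illustrate how tagging star-expanded columns avoids renaming collisions like r_name0 and EXPR$10:

```python
# Hypothetical sketch of star-column prefixing (not Drill's actual code).
STAR_PREFIX = "T0||"  # assumed marker meaning "expanded from `*`"

def expand_projection(table_columns, explicit_exprs):
    # Columns coming from `*` carry the marker; explicit expressions do not,
    # so a downstream operator can tell the two groups apart by name and no
    # deduplicating rename (r_name -> r_name0) is needed.
    return [STAR_PREFIX + c for c in table_columns] + list(explicit_exprs)

def output_name(col):
    # A final pass strips the marker when producing output field names.
    return col[len(STAR_PREFIX):] if col.startswith(STAR_PREFIX) else col

cols = expand_projection(["r_regionkey", "r_name", "r_comment"], ["EXPR$1"])
print([output_name(c) for c in cols])
```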
[jira] [Updated] (DRILL-2458) Extra hash column added when running CTAS with order by
[ https://issues.apache.org/jira/browse/DRILL-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2458: -- Attachment: 0001-DRILL-2458-Ensure-no-unwanted-columns-are-added-to-C.patch
[jira] [Updated] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2411: -- Component/s: (was: Query Planning Optimization) Execution - Relational Operators

Scalar SUM/AVG over empty result set returns no rows instead of NULL

Key: DRILL-2411 URL: https://issues.apache.org/jira/browse/DRILL-2411 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Reporter: Victoria Markman Assignee: Jinfeng Ni

Queries below should return NULL:
{code}
0: jdbc:drill:schema=dfs> select sum(a2) from t2 where 1=0;
+--------+
| EXPR$0 |
+--------+
+--------+
No rows selected (0.08 seconds)
0: jdbc:drill:schema=dfs> select avg(a2) from t2 where 1=0;
+--------+
| EXPR$0 |
+--------+
+--------+
No rows selected (0.074 seconds)
{code}
When grouped, the result is correct:
{code}
0: jdbc:drill:schema=dfs> select a2, sum(a2) from t2 where 1=0 group by a2;
+----+--------+
| a2 | EXPR$1 |
+----+--------+
+----+--------+
No rows selected (0.11 seconds)
{code}
I'm not convinced, and it is not very intuitive, that the correct result should be NULL, but this is what Postgres returns, and Aman thinks NULL is the correct behavior :)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
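For reference, the expected behavior matches standard SQL and can be checked quickly with SQLite (illustrative only; the table and column names mirror the report):

```python
import sqlite3

# Scalar SUM/AVG over an empty input yields one row containing NULL;
# the grouped form yields zero rows. SQLite and Postgres both follow this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (a2 INTEGER)")

sum_scalar = conn.execute("SELECT SUM(a2) FROM t2 WHERE 1=0").fetchall()
avg_scalar = conn.execute("SELECT AVG(a2) FROM t2 WHERE 1=0").fetchall()
grouped = conn.execute(
    "SELECT a2, SUM(a2) FROM t2 WHERE 1=0 GROUP BY a2"
).fetchall()

print(sum_scalar, avg_scalar, grouped)  # [(None,)] [(None,)] []
```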
[jira] [Updated] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2411: -- Assignee: Steven Phillips (was: Jinfeng Ni)
[jira] [Commented] (DRILL-2411) Scalar SUM/AVG over empty result set returns no rows instead of NULL
[ https://issues.apache.org/jira/browse/DRILL-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353860#comment-14353860 ] Jinfeng Ni commented on DRILL-2411: --- The plan for the query seems to be fine, and I feel the issue comes from the execution operator.

{code}
select sum(n_nationkey) from cp.`tpch/nation.parquet` where n_nationkey < 0;
{code}
{code}
00-00 Screen
00-01   StreamAgg(group=[{}], EXPR$0=[SUM($0)])
00-02     Filter(condition=[<($0, 0)])
00-03       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`]]])
{code}
[jira] [Assigned] (DRILL-2313) Query fails when one of the operands is a DATE literal without an explicit cast
[ https://issues.apache.org/jira/browse/DRILL-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2313: -- Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni)

Query fails when one of the operands is a DATE literal without an explicit cast

Key: DRILL-2313 URL: https://issues.apache.org/jira/browse/DRILL-2313 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu

For operations involving the date datatype, when one of the operands is a DATE literal without a cast, the query fails.

*The following query fails to validate:*
{code:sql}
SELECT * FROM date_dim WHERE d_date BETWEEN '2002-3-01' AND cast('2002-3-01' AS DATE) LIMIT 1;
{code}
Query failed: SqlValidatorException: Cannot apply 'BETWEEN' to arguments of type 'ANY BETWEEN CHAR(9) AND DATE'. Supported form(s): 'COMPARABLE_TYPE BETWEEN COMPARABLE_TYPE AND COMPARABLE_TYPE'

*The following query executes fine:*
{code:sql}
SELECT * FROM date_dim WHERE d_date BETWEEN '2002-3-01' AND '2002-3-01' LIMIT 1;
{code}
Both queries execute fine on Postgres.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
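The validator's complaint has a simple analogue in any statically checked comparison: BETWEEN requires mutually comparable operand types, and a CHAR literal is not comparable with a DATE until it is explicitly converted. A sketch in plain Python (illustrative only, not Drill code):

```python
from datetime import date

# A string and a date are not mutually comparable, much like CHAR(9) vs DATE
# in the failing query above.
d = date(2002, 3, 1)
try:
    ok = "2002-3-01" <= d
except TypeError:
    ok = None  # not comparable without an explicit conversion

# Converting the literal on both ends of the range restores comparability,
# which is what casting both BETWEEN operands to DATE does in SQL.
lo = date.fromisoformat("2002-03-01")
hi = date(2002, 3, 1)
in_range = lo <= d <= hi
print(ok, in_range)  # None True
```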
[jira] [Assigned] (DRILL-2317) UNION ALL on sub-queries with single column projection fails on GROUP BY
[ https://issues.apache.org/jira/browse/DRILL-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni reassigned DRILL-2317: - Assignee: Sean Hsuan-Yi Chu (was: Jinfeng Ni) UNION ALL on sub-queries with single column projection fails on GROUP BY Key: DRILL-2317 URL: https://issues.apache.org/jira/browse/DRILL-2317 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Sean Hsuan-Yi Chu A UNION ALL on two sub-queries with only a single column projected, and GROUP BY on the whole query fails with an IndexOutOfBoundsException *The following query fails to execute:* {code:sql} SELECT c1 FROM ((SELECT ss_quantity AS c1 FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c2 FROM web_sales limit 1) ) foo GROUP BY c1; {code} *The following query also fails to execute:* {code:sql} SELECT distinct(c) FROM ((SELECT ss_quantity AS c FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c FROM web_sales limit 1) ) ; {code} Error: Query failed: RemoteRpcException: Failure while running fragment., index: 0, length: 1 (expected: range(0, 0)) LOG message indicates java.lang.IndexOutOfBoundsException *The following query executes fine:* {code:sql} SELECT c1 FROM ((SELECT ss_quantity AS c1, ss_item_sk c3 FROM store_sales limit 1) UNION ALL (SELECT ws_quantity AS c2, ws_item_sk c4 FROM web_sales limit 1) ) foo GROUP BY c1; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
Jinfeng Ni created DRILL-2400: -- Summary: Drill's query planner did not merge multiple filters into one. Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni

Let's say we have a view defined with a WHERE clause, as follows:

{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}

The following query refers to the above view in two places, once in the outer query and once in the subquery, and each reference adds an additional filter:

{code}
select dat.store_id, sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id from MyViewWithFilter where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}

The current plan will be like the following:

{code}
00-00 Screen
00-01   Project(store_id=[$0], total_cost=[$1])
00-02     SelectionVectorRemover
00-03       Sort(sort0=[$0], dir0=[ASC])
00-04         HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05           Project(store_id=[$0], store_cost=[$2])
00-06             HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07               Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                 HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                   Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3], $f11=[$2])
00-12                     HashJoin(condition=[=($4, $5)], joinType=[inner])
00-15                       SelectionVectorRemover
00-18                         Filter(condition=[=($3, 'GRADUATE DEGREE')])
00-20                           Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f8=[CAST($4):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL])
00-22                             SelectionVectorRemover
00-23                               Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-24                                 Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3])
00-25                                   Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]])
00-14                       HashAgg(group=[{0}])
00-17                         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
00-09                   HashAgg(group=[{0}])
00-11                     Project(position_id=[CAST($0):ANY NOT NULL])
00-13                       SelectionVectorRemover
00-16                         Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19                           Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-21                             Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]])
{code}

In this plan, the view's own filter and the additional filter were not merged; in both the outer query and the subquery there are two stacked filters, like:

{code}
00-16 Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19   Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
{code}

It is desirable to have the filters merged, because:

1) it could improve the possibility of partition pruning, since partition pruning matches a filter directly on top of the SCAN:

Filter (dir0 = ' ')
  Filter (colA = '...')
    SCAN

If the filter with (dir0) is not merged with the second filter, partition pruning will not kick in.

2) a merged filter allows short-circuit evaluation, which could mean better performance, compared with the case where the filters are evaluated separately.
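Both motivations can be seen in a tiny sketch (illustrative Python over in-memory rows with made-up sample data, not planner code): the merged conjunction evaluates in one pass with short-circuiting, and it sits directly above the "scan" where a pruning rule that only inspects that position could see all conditions.

```python
# Two stacked filters vs. one merged AND filter over the same rows.
rows = [
    {"position_id": 1, "position_title": "VP Sales"},
    {"position_id": 7, "position_title": "VP Marketing"},
    {"position_id": 2, "position_title": "Store Manager"},
]

view_filter = lambda r: r["position_id"] in (1, 2, 3)    # from the view
query_filter = lambda r: "VP" in r["position_title"]     # from the query

# Unmerged: two passes, with an intermediate row set in between.
two_pass = [r for r in [r for r in rows if view_filter(r)] if query_filter(r)]

# Merged: one pass; `and` short-circuits, so query_filter is never
# evaluated for rows the view filter already rejected.
merged = [r for r in rows if view_filter(r) and query_filter(r)]

assert merged == two_pass  # same result, cheaper evaluation
print(merged)
```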
[jira] [Commented] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350970#comment-14350970 ] Jinfeng Ni commented on DRILL-2400: --- [~amansinha100], could you please review the patch for DRILL-2240, in particular the first part, where the estimation of the filter's CPU cost is modified? Thanks.
[jira] [Updated] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2400: -- Attachment: 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch
[jira] [Commented] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351813#comment-14351813 ] Jinfeng Ni commented on DRILL-2400: --- With the patch, here is the new plan, where the two filters are merged together. {code} 00-00Screen 00-01 Project(store_id=[$0], total_cost=[$1]) 00-02SelectionVectorRemover 00-03 Sort(sort0=[$0], dir0=[ASC]) 00-04HashAgg(group=[{0}], total_cost=[SUM($1)]) 00-05 Project(store_id=[$0], store_cost=[$2]) 00-06HashAgg(group=[{0, 1}], store_cost=[SUM($2)]) 00-07 Project(store_id=[$1], position_id=[$2], $f3=[$0]) 00-08HashJoin(condition=[=($3, $4)], joinType=[inner]) 00-10 Project($f3=[$0], $f5=[$1], $f6=[$2], $f11=[$2]) 00-12HashJoin(condition=[=($3, $4)], joinType=[inner]) 00-15 Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL]) 00-18SelectionVectorRemover 00-20 Filter(condition=[AND(OR(=($0, 1), =($0, 2), =($0, 3)), =(CAST($4):ANY NOT NULL, 'GRADUATE DEGREE'))]) 00-21Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3]) 00-22 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]]) 00-14 HashAgg(group=[{0}]) 00-17Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]]) 00-09 HashAgg(group=[{0}]) 00-11Project(position_id=[CAST($0):ANY NOT NULL]) 00-13 SelectionVectorRemover 00-16Filter(condition=[AND(OR(=($0, 1), =($0, 2), =($0, 3)), LIKE(CAST($1):ANY NOT NULL, '%VP%'))]) 00-19 Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]]) {code} Drill's query planner did not merge multiple filters into one. 
--- Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch, 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch Let's say we have a view defined with a WHERE clause as follows:
{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}
For the following query, which refers to the above view in two places (once in the outer query and once in the subquery), each reference adds an additional filter:
{code}
select dat.store_id,
       sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id
                        from MyViewWithFilter
                        where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}
The current plan is as follows:
{code}
00-00  Screen
00-01    Project(store_id=[$0], total_cost=[$1])
00-02      SelectionVectorRemover
00-03        Sort(sort0=[$0], dir0=[ASC])
00-04          HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05            Project(store_id=[$0], store_cost=[$2])
00-06              HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07                Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                  HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                    Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3],
[jira] [Resolved] (DRILL-2400) Drill's query planner did not merge multiple filters into one.
[ https://issues.apache.org/jira/browse/DRILL-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-2400. --- Resolution: Fixed Fixed in commits 6be9e3b and 6e6f789 Drill's query planner did not merge multiple filters into one. --- Key: DRILL-2400 URL: https://issues.apache.org/jira/browse/DRILL-2400 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Jinfeng Ni Attachments: 0005-DRILL-2240-Part-1-Change-cpu-cost-estimation-formula.patch, 0006-DRILL-2240-Part-2-Add-merge-filter-rule-to-logical-r.patch Let's say we have a view defined with a WHERE clause as follows:
{code}
create or replace view MyViewWithFilter as
SELECT first_name,
       last_name,
       full_name,
       salary,
       employee_id,
       store_id,
       position_id,
       position_title,
       education_level
FROM cp.`employee.json`
WHERE position_id in (1, 2, 3)
{code}
For the following query, which refers to the above view in two places (once in the outer query and once in the subquery), each reference adds an additional filter:
{code}
select dat.store_id,
       sum(dat.store_cost) as total_cost
from (
  select store_id, position_id, sum(salary) as store_cost
  from MyViewWithFilter
  where full_name in (select n_name from cp.`tpch/nation.parquet`)
    and education_level = 'GRADUATE DEGREE'
    and position_id in (select position_id
                        from MyViewWithFilter
                        where position_title like '%VP%')
  group by store_id, position_id
) dat
group by dat.store_id
order by dat.store_id
{code}
The current plan is as follows:
{code}
00-00  Screen
00-01    Project(store_id=[$0], total_cost=[$1])
00-02      SelectionVectorRemover
00-03        Sort(sort0=[$0], dir0=[ASC])
00-04          HashAgg(group=[{0}], total_cost=[SUM($1)])
00-05            Project(store_id=[$0], store_cost=[$2])
00-06              HashAgg(group=[{0, 1}], store_cost=[SUM($2)])
00-07                Project(store_id=[$1], position_id=[$2], $f3=[$0])
00-08                  HashJoin(condition=[=($4, $5)], joinType=[inner])
00-10                    Project($f3=[$0], $f5=[$1], $f6=[$2], $f8=[$3], $f11=[$2])
00-12                      HashJoin(condition=[=($4, $5)], joinType=[inner])
00-15                        SelectionVectorRemover
00-18                          Filter(condition=[=($3, 'GRADUATE DEGREE')])
00-20                            Project($f3=[CAST($2):ANY NOT NULL], $f5=[CAST($3):ANY NOT NULL], $f6=[CAST($0):ANY NOT NULL], $f8=[CAST($4):ANY NOT NULL], $f9=[CAST($1):ANY NOT NULL])
00-22                              SelectionVectorRemover
00-23                                Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-24                                  Project(position_id=[$0], full_name=[$4], salary=[$1], store_id=[$2], education_level=[$3])
00-25                                    Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `full_name`, `salary`, `store_id`, `education_level`], files=[/employee.json]]])
00-14                        HashAgg(group=[{0}])
00-17                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`n_name`]]])
00-09                    HashAgg(group=[{0}])
00-11                      Project(position_id=[CAST($0):ANY NOT NULL])
00-13                        SelectionVectorRemover
00-16                          Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19                            Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
00-21                              Scan(groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns=[`position_id`, `position_title`], files=[/employee.json]]])
{code}
In this plan, the view's own filter and the additional filter were not merged; in both the outer query and the subquery there are two stacked filters, for example:
{code}
00-16  Filter(condition=[LIKE(CAST($1):ANY NOT NULL, '%VP%')])
00-19    Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3))])
{code}
It is desirable to have the filters merged, because 1)
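As background on what such a merge rule does (this is a hypothetical sketch in plain Python, analogous in spirit to Calcite's FilterMergeRule, not Drill's actual planner code): two stacked Filter nodes can be collapsed into one Filter whose condition is the conjunction of both.

```python
# Minimal sketch: collapse adjacent Filter nodes into one with an AND
# of their conditions. Node classes and condition strings are illustrative.

class Scan:
    def __init__(self, table):
        self.table = table

class Filter:
    def __init__(self, condition, child):
        self.condition = condition  # condition kept as a string for illustration
        self.child = child

def merge_filters(node):
    """Recursively rewrite the plan, merging Filter-over-Filter pairs."""
    if isinstance(node, Filter):
        child = merge_filters(node.child)
        if isinstance(child, Filter):
            # Merge: one Filter carrying AND(outer, inner), skipping the inner node.
            return Filter(f"AND({node.condition}, {child.condition})", child.child)
        return Filter(node.condition, child)
    return node

# The two stacked filters from the subquery side of the plan above:
plan = Filter("LIKE(position_title, '%VP%')",
              Filter("OR(=(position_id, 1), =(position_id, 2), =(position_id, 3))",
                     Scan("employee.json")))
merged = merge_filters(plan)
print(merged.condition)
```

After the rewrite, a single Filter sits directly on the Scan, which matches the shape of the fixed plan shown in the comment above.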
[jira] [Updated] (DRILL-2311) Create table with same columns of different case results in a java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2311: -- Assignee: Sean Hsuan-Yi Chu (was: Aman Sinha) Create table with same columns of different case results in a java.lang.IllegalStateException - Key: DRILL-2311 URL: https://issues.apache.org/jira/browse/DRILL-2311 Project: Apache Drill Issue Type: Bug Components: SQL Parser Affects Versions: 0.8.0 Reporter: Ramana Inukonda Nagaraj Assignee: Sean Hsuan-Yi Chu Creating a table with the same column name in different cases results in a runtime exception. This query should fail at planning or parsing time:
{code}
CREATE TABLE drill_parquet_mulCaseColumns3 as
select cast(BIGINT_col as bigint) BIGINT_col,
       cast(DECIMAL9_col as decimal) bigint_col
FROM dfs.`/user/root/alltypes.json`;
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
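Rejecting such a statement at validation time amounts to a case-insensitive duplicate check over the select list's output names. A minimal sketch (the function name and shape are illustrative, not Drill's actual validator API):

```python
# Hypothetical sketch: detect output column names that collide when
# compared case-insensitively, as a validator for CTAS column lists could.

def find_case_insensitive_duplicates(column_names):
    """Return (first_seen, duplicate) pairs of colliding names."""
    seen = {}
    duplicates = []
    for name in column_names:
        key = name.lower()
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates

# The two output names from the failing CTAS above collide:
print(find_case_insensitive_duplicates(["BIGINT_col", "bigint_col"]))
# A non-empty result would let the planner raise a clean validation error
# instead of the java.lang.IllegalStateException reported here.
```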
[jira] [Updated] (DRILL-2329) TPCDS query 95 and simplified variant fail to execute
[ https://issues.apache.org/jira/browse/DRILL-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2329: -- Fix Version/s: 1.1.0 TPCDS query 95 and simplified variant fail to execute - Key: DRILL-2329 URL: https://issues.apache.org/jira/browse/DRILL-2329 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Jinfeng Ni Fix For: 1.1.0 Attachments: drillbit.log, query95_stream0_sf1.sql TPCDS query 95 (attached) fails to validate. *A simplified variant of the query (which may not preserve its semantics) that also fails:*
{code:sql}
WITH abc AS (SELECT * FROM web_sales)
SELECT Count(DISTINCT ws_order_number) AS a1,
       Sum(ws_ext_ship_cost) AS a2
FROM web_sales ws
WHERE ws.ws_ship_addr_sk = 1
  AND ws.ws_web_site_sk = web_site_sk
ORDER BY count(DISTINCT ws_order_number)
LIMIT 100;
{code}
*Error:* Query failed: SqlValidatorException: Aggregate expression is illegal in ORDER BY clause of non-aggregating SELECT -- This message was sent by Atlassian JIRA (v6.3.4#6332)
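For what it's worth, the usual way to sidestep this class of validator error is to order by the select-list alias rather than repeating the aggregate expression in ORDER BY. A minimal sketch against SQLite (not Drill) with made-up data, just to illustrate the rewritten shape of the query:

```python
# Sketch: ORDER BY the alias of an aggregate instead of the aggregate
# expression itself. Runs against SQLite with fabricated sample rows;
# table and column names mirror the TPCDS variant above.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE web_sales (ws_order_number INT, ws_ext_ship_cost REAL)")
con.executemany("INSERT INTO web_sales VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

row = con.execute("""
    SELECT COUNT(DISTINCT ws_order_number) AS a1,
           SUM(ws_ext_ship_cost)           AS a2
    FROM web_sales
    ORDER BY a1
    LIMIT 100
""").fetchone()
print(row)
```

Whether Drill's validator should accept the original form (ordering by a repeated aggregate expression) is exactly what this issue is about; the alias form is only a workaround.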
[jira] [Updated] (DRILL-2346) Star is not expanded correctly in create view if view fields are specified
[ https://issues.apache.org/jira/browse/DRILL-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2346: -- Fix Version/s: 1.1.0 (was: 0.9.0) Star is not expanded correctly in create view if view fields are specified -- Key: DRILL-2346 URL: https://issues.apache.org/jira/browse/DRILL-2346 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Minor Fix For: 1.1.0
{code}
0: jdbc:drill:schema=dfs> select * from t2 limit 1;
+------+------+------------+
|  a2  |  b2  |     c2     |
+------+------+------------+
| 0    | zzz  | 2014-12-31 |
+------+------+------------+
1 row selected (0.065 seconds)
{code}
We get an error, even though table t2 has 3 columns:
{code}
0: jdbc:drill:schema=dfs> create view v2(a2, b2, c2) as select * from t2;
+--------+-----------------------------------------------------------------------+
|   ok   |                                summary                                |
+--------+-----------------------------------------------------------------------+
| false  | View's field list and View's query field list have different counts.  |
+--------+-----------------------------------------------------------------------+
1 row selected (0.038 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2346) Star is not expanded correctly in create view if view fields are specified
[ https://issues.apache.org/jira/browse/DRILL-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2346: -- Fix Version/s: 0.9.0 Star is not expanded correctly in create view if view fields are specified -- Key: DRILL-2346 URL: https://issues.apache.org/jira/browse/DRILL-2346 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni Priority: Minor Fix For: 0.9.0
{code}
0: jdbc:drill:schema=dfs> select * from t2 limit 1;
+------+------+------------+
|  a2  |  b2  |     c2     |
+------+------+------------+
| 0    | zzz  | 2014-12-31 |
+------+------+------------+
1 row selected (0.065 seconds)
{code}
We get an error, even though table t2 has 3 columns:
{code}
0: jdbc:drill:schema=dfs> create view v2(a2, b2, c2) as select * from t2;
+--------+-----------------------------------------------------------------------+
|   ok   |                                summary                                |
+--------+-----------------------------------------------------------------------+
| false  | View's field list and View's query field list have different counts.  |
+--------+-----------------------------------------------------------------------+
1 row selected (0.038 seconds)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2398) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
[ https://issues.apache.org/jira/browse/DRILL-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2398: -- Assignee: Aman Sinha (was: Jinfeng Ni) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter -- Key: DRILL-2398 URL: https://issues.apache.org/jira/browse/DRILL-2398 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Victoria Markman Assignee: Aman Sinha Priority: Critical Fix For: 0.9.0 Attachments: j1.parquet, j2.parquet count(*) should return 0 and not NULL:
{code}
0: jdbc:drill:schema=dfs> select count(*)
from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double)
where j1.c_bigint IS NOT DISTINCT FROM j2.c_bigint;
+------------+
|   EXPR$0   |
+------------+
+------------+
{code}
These are the values in the tables:
{code}
0: jdbc:drill:schema=dfs> select j1.c_bigint, j2.c_bigint, count(*) from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double) group by j1.c_bigint, j2.c_bigint;
+------------+-------------+---------+
|  c_bigint  |  c_bigint1  | EXPR$1  |
+------------+-------------+---------+
| 460194667  | -498749284  | 1       |
| 464547172  | -498828740  | 1       |
| 467451850  | -498966611  | 2       |
| 471050029  | -499154096  | 3       |
| 472873799  | -499233550  | 3       |
| 475698977  | -499395929  | 2       |
| 478986584  | -499564607  | 1       |
| 488139464  | -499763274  | 3       |
| 498214699  | -499871720  | 2       |
+------------+-------------+---------+
9 rows selected (0.339 seconds)
{code}
The IS DISTINCT FROM predicate returns the correct result:
{code}
select count(*)
from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double)
where j1.c_bigint IS DISTINCT FROM j2.c_bigint
{code}
Explain plan for the query that returns the incorrect result:
{code}
00-01  StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-02    Project($f0=[0])
00-03      SelectionVectorRemover
00-04        Filter(condition=[CAST(CASE(IS NULL($1), IS NULL($3), IS NULL($3), IS NULL($1), =($1, $3))):BOOLEAN NOT NULL])
00-05          HashJoin(condition=[=($0, $2)], joinType=[inner])
00-07            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j1]], selectionRoot=/joins/j1, numFiles=1, columns=[`c_double`, `c_bigint`]]])
00-06            Project(c_double0=[$0], c_bigint0=[$1])
00-08              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j2]], selectionRoot=/joins/j2, numFiles=1, columns=[`c_double`, `c_bigint`]]])
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
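For reference, IS NOT DISTINCT FROM is null-safe equality: two NULLs compare as equal, a NULL and a non-NULL compare as unequal, and two non-NULLs compare with ordinary equality. A plain-Python sketch of that semantics (illustrative only, with a couple of made-up NULL cases added to the reported data):

```python
# Sketch of IS NOT DISTINCT FROM semantics: null-safe equality,
# modeling SQL NULL as Python None.

def is_not_distinct_from(a, b):
    if a is None and b is None:
        return True          # NULL IS NOT DISTINCT FROM NULL -> true
    if a is None or b is None:
        return False         # NULL vs non-NULL -> false
    return a == b            # ordinary equality otherwise

# First two pairs come from the joined rows listed in the issue; the
# None cases are fabricated to show the NULL behavior.
pairs = [(460194667, -498749284), (464547172, -498828740), (None, None), (None, 5)]
results = [is_not_distinct_from(a, b) for a, b in pairs]
print(results)

# In the reported data no j1.c_bigint matches its j2.c_bigint, so the
# filter keeps zero rows and count(*) over an empty input should be 0,
# not an empty/NULL result as the bug shows.
print(sum(results[:2]))
```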
[jira] [Updated] (DRILL-2318) Query fails when an ORDER BY clause is used with CTEs
[ https://issues.apache.org/jira/browse/DRILL-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni updated DRILL-2318: -- Fix Version/s: 0.9.0 Query fails when an ORDER BY clause is used with CTEs -- Key: DRILL-2318 URL: https://issues.apache.org/jira/browse/DRILL-2318 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 0.8.0 Reporter: Abhishek Girish Assignee: Jinfeng Ni Fix For: 0.9.0 Attachments: drillbit.log Adding a WITH clause with a simple CTE causes a query with an ORDER BY to fail. This happens even when the CTE is unrelated to the main query. *The following query fails to execute:* {code:sql} WITH x AS (SELECT ss_sold_date_sk a1 FROM store_sales) SELECT x.a1 FROM x ORDER BY x.a1; {code} Error: Query failed: SqlValidatorException: Table 'x' not found Log attached. *The following query executes fine:* {code:sql} WITH x AS (SELECT ss_sold_date_sk a1 FROM store_sales) SELECT x.a1 FROM x {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)