[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN

2018-10-03 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637491#comment-16637491
 ] 

Gautam Kumar Parai edited comment on DRILL-786 at 10/3/18 8:51 PM:
---

Yes, option 3 makes the most sense given the alternatives. The default value 
(TRUE) of the option serves as a defensive check. When the user sets it to 
FALSE, they know what they are getting into, rather than Drill surprising them.
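For reference, opting in to the cartesian/cross join path from a client session would look roughly like this. In released Drill the closest existing option is `planner.enable_nljoin_for_scalar_only`; the option discussed here may end up with a different name, so treat the snippet as illustrative:

{code}
-- Illustrative only: the final option name is settled in the pull request
ALTER SESSION SET `planner.enable_nljoin_for_scalar_only` = false;
SELECT student.name, voter.age
FROM student CROSS JOIN voter;
{code}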


was (Author: gparai):
Yes, option 3 makes the most sense. The default value (TRUE) of the option 
serves as a defensive check. When the user sets it to FALSE they know what they 
are getting into rather than Drill surprising them.

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 

[jira] [Commented] (DRILL-786) Implement CROSS JOIN

2018-10-03 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637491#comment-16637491
 ] 

Gautam Kumar Parai commented on DRILL-786:
--

Yes, option 3 makes the most sense. The default value (TRUE) of the option 
serves as a defensive check. When the user sets it to FALSE, they know what they 
are getting into, rather than Drill surprising them.

> Implement CROSS JOIN
> 
>
> Key: DRILL-786
> URL: https://issues.apache.org/jira/browse/DRILL-786
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Krystal
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
> DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>   DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
> DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>   DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
> DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>   DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.59049001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 

[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"

2018-09-19 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621266#comment-16621266
 ] 

Gautam Kumar Parai commented on DRILL-6552:
---

I would like to mention that two-phase aggregation, along with custom operators 
for computing statistics (instead of e.g. count(*)), was done as part of 
DRILL-1328, similar to the approach suggested by [~okalinin]. However, the perf 
numbers were nowhere near earth-shattering :(

The identified future improvements were either to adopt a multi-phase 
aggregation approach OR to use sampling in order to speed it up further. Another 
option would be to revisit the code to see if we can speed up the existing 
implementation further. [~paul-rogers] had reviewed the code at the time - he is 
certainly a ton more versed in execution efficiency than I am. Any suggestions, 
Paul and others?

[~vitalii] in addition to the metadata-at-scale problem, we should also consider 
functional completeness. For performance benchmarks like TPC-H/TPC-DS, we had 
identified histograms as critical for improving planning. When you and 
[~vvysotskyi] last presented the proposal, it seemed that another limitation of 
HMS would be its inability to store histograms. Do you have a proposal or 
workaround for handling histograms - or is it not feasible at all?

> Drill Metadata management "Drill MetaStore"
> ---
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow storing schema and statistics, which would accelerate query 
> validation, planning, and execution. It would also increase Drill's stability 
> and help avoid different kinds of issues: "schema change exceptions", the 
> "limit 0" optimization, and so on. 
> One of the main candidates is Hive Metastore.
> Starting from 3.0 version Hive Metastore can be the separate service from 
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> Optional enhancement is storing Drill's profiles, UDFs, plugins configs in 
> some kind of metastore as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates

2018-07-26 Thread Gautam Kumar Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6589:
--
Summary: Push transitive closure generated predicates past aggregates  
(was: Push transitive closure generated predicates past aggregates/projects)

> Push transitive closure generated predicates past aggregates
> 
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> Here the transitive predicate a2 = 5 would be pushed past the aggregate due 
> to this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread Gautam Kumar Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6589:
--
Description: 
Here is a sample query that may benefit from this optimization:

SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 

Here the transitive predicate a2 = 5 would be pushed past the aggregate due to 
this optimization.
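In other words, from a1 = 5 and the join condition a1 = a2 the planner derives 
a2 = 5; once that derived predicate is pushed past the aggregate, the query is 
planned as if it had been written:

{code}
SELECT * FROM T1
WHERE a1 = 5
  AND a1 IN (SELECT a2 FROM T2 WHERE a2 = 5);
{code}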

> Push transitive closure generated predicates past aggregates/projects
> -
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> Here the transitive predicate a2 = 5 would be pushed past the aggregate due 
> to this optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-10 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6589:
-

 Summary: Push transitive closure generated predicates past 
aggregates/projects
 Key: DRILL-6589
 URL: https://issues.apache.org/jira/browse/DRILL-6589
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.14.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6501) Revert/modify fix for DRILL-6212 after CALCITE-2223 is fixed

2018-06-15 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6501:
-

 Summary: Revert/modify fix for DRILL-6212 after CALCITE-2223 is 
fixed
 Key: DRILL-6501
 URL: https://issues.apache.org/jira/browse/DRILL-6501
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


DRILL-6212 is a temporary fix to alleviate issues due to CALCITE-2223. Once 
CALCITE-2223 is fixed, this change needs to be reverted, which would require 
DrillProjectMergeRule to go back to extending ProjectMergeRule. Please take a 
look at how CALCITE-2223 is eventually fixed (as of now it is still not clear 
which fix is the way to go). Depending on the fix, we may need additional work 
to integrate these changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6487) Negative row count when selecting from a json file with an OFFSET clause

2018-06-14 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513153#comment-16513153
 ] 

Gautam Kumar Parai commented on DRILL-6487:
---

I missed that it is an assert, not an exception. I will try to repro it with 
asserts enabled.
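For context on where such an assertion can fire, here is a hypothetical sketch (not Drill's actual code; the helper name is invented): if an OFFSET larger than the estimated input row count is subtracted without a clamp, the row-count estimate goes negative, which is exactly the kind of value Calcite's non-negativity check rejects.

```java
public class OffsetEstimateDemo {
    // Hypothetical helper: rows surviving an OFFSET clause.
    // Without the Math.max clamp, offset > inputRows would yield a
    // negative estimate - the kind of value that trips Calcite's
    // RelMetadataQuery.isNonNegative assertion.
    static double estimateRowsAfterOffset(double inputRows, double offset) {
        return Math.max(inputRows - offset, 0.0);
    }

    public static void main(String[] args) {
        System.out.println(estimateRowsAfterOffset(2.0, 1.0)); // 1.0
        System.out.println(estimateRowsAfterOffset(1.0, 5.0)); // 0.0, not -4.0
    }
}
```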

> Negative row count when selecting from a json file with an OFFSET clause
> 
>
> Key: DRILL-6487
> URL: https://issues.apache.org/jira/browse/DRILL-6487
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Boaz Ben-Zvi
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> This simple query fails: 
> {code}
> select * from dfs.`/data/foo.json` offset 1 row;
> {code}
> where foo.json is 
> {code}
> {"key": "aa", "sales": 11}
> {"key": "bb", "sales": 22}
> {code}
> The error returned is:
> {code}
> 0: jdbc:drill:zk=local> select * from dfs.`/data/foo.json` offset 1 row;
> Error: SYSTEM ERROR: AssertionError
> [Error Id: 960d66a9-b480-4a7e-9a25-beb4928e8139 on 10.254.130.25:31020]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: null
> org.apache.drill.exec.work.foreman.Foreman.run():282
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) null
> org.apache.calcite.rel.metadata.RelMetadataQuery.isNonNegative():900
> org.apache.calcite.rel.metadata.RelMetadataQuery.validateResult():919
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount():236
> org.apache.calcite.rel.SingleRel.estimateRowCount():68
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier$MajorFragmentStat.add():103
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():76
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():32
> 
> org.apache.drill.exec.planner.physical.visitor.BasePrelVisitor.visitProject():50
> org.apache.drill.exec.planner.physical.ProjectPrel.accept():98
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():63
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():32
> org.apache.drill.exec.planner.physical.ScreenPrel.accept():65
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.removeExcessiveEchanges():41
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():557
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():567
> org.apache.drill.exec.work.foreman.Foreman.run():264
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6487) Negative row count when selecting from a json file with an OFFSET clause

2018-06-14 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513080#comment-16513080
 ] 

Gautam Kumar Parai commented on DRILL-6487:
---

[~ben-zvi] I cannot reproduce it on the latest Apache master. Could you please 
try it and, if it reproduces for you, share the repro steps?

> Negative row count when selecting from a json file with an OFFSET clause
> 
>
> Key: DRILL-6487
> URL: https://issues.apache.org/jira/browse/DRILL-6487
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Boaz Ben-Zvi
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.14.0
>
>
> This simple query fails: 
> {code}
> select * from dfs.`/data/foo.json` offset 1 row;
> {code}
> where foo.json is 
> {code}
> {"key": "aa", "sales": 11}
> {"key": "bb", "sales": 22}
> {code}
> The error returned is:
> {code}
> 0: jdbc:drill:zk=local> select * from dfs.`/data/foo.json` offset 1 row;
> Error: SYSTEM ERROR: AssertionError
> [Error Id: 960d66a9-b480-4a7e-9a25-beb4928e8139 on 10.254.130.25:31020]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: null
> org.apache.drill.exec.work.foreman.Foreman.run():282
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) null
> org.apache.calcite.rel.metadata.RelMetadataQuery.isNonNegative():900
> org.apache.calcite.rel.metadata.RelMetadataQuery.validateResult():919
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount():236
> org.apache.calcite.rel.SingleRel.estimateRowCount():68
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier$MajorFragmentStat.add():103
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():76
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitPrel():32
> 
> org.apache.drill.exec.planner.physical.visitor.BasePrelVisitor.visitProject():50
> org.apache.drill.exec.planner.physical.ProjectPrel.accept():98
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():63
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.visitScreen():32
> org.apache.drill.exec.planner.physical.ScreenPrel.accept():65
> 
> org.apache.drill.exec.planner.physical.visitor.ExcessiveExchangeIdentifier.removeExcessiveEchanges():41
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():557
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():179
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
> org.apache.drill.exec.work.foreman.Foreman.runSQL():567
> org.apache.drill.exec.work.foreman.Foreman.run():264
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-12 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510147#comment-16510147
 ] 

Gautam Kumar Parai commented on DRILL-6212:
---

[~cshi] I am reassigning this Jira to push it over the finish line.

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-12 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510147#comment-16510147
 ] 

Gautam Kumar Parai edited comment on DRILL-6212 at 6/12/18 8:17 PM:


[~cshi] I am reassigning this Jira to myself so that I can push it over the 
finish line.


was (Author: gparai):
[~cshi] I am reassigning this Jira to push it over the finish line.

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-11 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509017#comment-16509017
 ] 

Gautam Kumar Parai edited comment on DRILL-6212 at 6/12/18 12:57 AM:
-

I see that CALCITE-2223 is still open. We should go ahead and fix the issue on 
DRILL and open another DRILL Jira to revert/modify this fix once the CALCITE 
changes are complete.

[~vvysotskyi], [~vrozov], [~amansinha100], what do you think?


was (Author: gparai):
I see that CALCITE-2223 is still open. We should go ahead and fix the issue on 
DRILL and open another DRILL Jira to revert/modify this fix onceCALCITE changes 
are complete.

 [~vvysotskyi] [~vrozov]  [~amansinha100] what do you think?

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-11 Thread Gautam Kumar Parai (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509017#comment-16509017
 ] 

Gautam Kumar Parai commented on DRILL-6212:
---

I see that CALCITE-2223 is still open. We should go ahead and fix the issue in 
Drill and open another Drill Jira to revert/modify this fix once the CALCITE 
changes are complete.

[~vvysotskyi], [~vrozov], [~amansinha100], what do you think?

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using the following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6480) Use double data type for ScanStats rowcounts

2018-06-07 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6480:
-

 Summary: Use double data type for ScanStats rowcounts
 Key: DRILL-6480
 URL: https://issues.apache.org/jira/browse/DRILL-6480
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


Since ScanStats now uses doubles to store row counts, all callers should ensure 
they use the correct data types. Currently, several callers cast doubles to 
long, which may lead to a loss of precision.
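As a standalone illustration of the precision concern (a sketch, not Drill code): casting a double row count to long saturates for values beyond Long.MAX_VALUE, and even below that threshold doubles cannot represent every integer above 2^53.

```java
public class RowCountCastDemo {
    public static void main(String[] args) {
        // A very large estimated row count stored as a double
        double rowCount = 1.0e19; // exceeds Long.MAX_VALUE (~9.22e18)
        long asLong = (long) rowCount; // the narrowing cast saturates
        System.out.println(asLong == Long.MAX_VALUE); // true

        // Below Long.MAX_VALUE but above 2^53, doubles skip integers,
        // so a round-trip through double silently changes the value
        long big = (1L << 53) + 1;
        System.out.println((long) (double) big == big); // false
    }
}
```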



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6056) Mock datasize could overflow to negative

2018-06-04 Thread Gautam Kumar Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai closed DRILL-6056.
-
Resolution: Duplicate

> Mock datasize could overflow to negative
> 
>
> Key: DRILL-6056
> URL: https://issues.apache.org/jira/browse/DRILL-6056
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Chunhui Shi
>Priority: Major
>
> In some cases, the mock data size (rowCount * rowWidth) could be too large, 
> especially when we test spilling or memory OOB exceptions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6463) ProfileParser cannot parse costs when using MockScanBatch

2018-06-02 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6463:
-

 Summary: ProfileParser cannot parse costs when using MockScanBatch
 Key: DRILL-6463
 URL: https://issues.apache.org/jira/browse/DRILL-6463
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.14.0


One of the unit tests, testHashAggrSecondaryTertiarySpill(), runs into this 
issue, although the issue is generic. It happens because the cost is stored in 
an int, which overflows and becomes negative with large enough row/data sizes. 
This causes the ProfileParser to error out on seeing negative costs.
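The overflow itself is easy to reproduce in isolation (a minimal sketch; the variable names are illustrative, not Drill's actual fields): multiplying two ints wraps around 32 bits, so a large rowCount * rowWidth product goes negative unless one operand is widened to long first.

```java
public class CostOverflowDemo {
    public static void main(String[] args) {
        int rowCount = 100_000;
        int rowWidth = 30_000; // e.g. bytes per row
        // 3_000_000_000 does not fit in 32 bits: the product wraps negative
        int intCost = rowCount * rowWidth;
        // Widening one operand before multiplying avoids the overflow
        long longCost = (long) rowCount * rowWidth;
        System.out.println(intCost);  // -1294967296
        System.out.println(longCost); // 3000000000
    }
}
```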



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-3539) CTAS over empty json file throws NPE

2018-05-25 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491094#comment-16491094
 ] 

Gautam Kumar Parai commented on DRILL-3539:
---

[~khfaraaz] this still appears to be an issue in 1.13.0

> CTAS over empty json file throws NPE
> 
>
> Key: DRILL-3539
> URL: https://issues.apache.org/jira/browse/DRILL-3539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: Future
>
>
> CTAS over empty JSON file results in NPE.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> create table t45645 as select * from 
> `empty.json`;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010] 
> (state=,code=0)
> {code}
> Stack trace from drillbit.log
> {code}
> 2015-07-22 00:34:03,788 [2a511b03-90b3-1d39-f4e3-cfd754aa085f:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 0:0
> [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.addOutputContainerData(WriterRecordBatch.java:133)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:126)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) 
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) 
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252)
>  ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.7.0_45]
> at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
> at 
> 

[jira] [Resolved] (DRILL-3539) CTAS over empty json file throws NPE

2018-05-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai resolved DRILL-3539.
---
Resolution: Duplicate

> CTAS over empty json file throws NPE
> 
>
> Key: DRILL-3539
> URL: https://issues.apache.org/jira/browse/DRILL-3539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: Future
>
>
> CTAS over empty JSON file results in NPE.

[jira] [Assigned] (DRILL-3539) CTAS over empty json file throws NPE

2018-05-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-3539:
-

Assignee: Gautam Kumar Parai

> CTAS over empty json file throws NPE
> 
>
> Key: DRILL-3539
> URL: https://issues.apache.org/jira/browse/DRILL-3539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: Future
>
>
> CTAS over empty JSON file results in NPE.

[jira] [Assigned] (DRILL-3964) CTAS fails with NPE when source JSON file is empty

2018-05-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-3964:
-

Assignee: Gautam Kumar Parai

> CTAS fails with NPE when source JSON file is empty
> --
>
> Key: DRILL-3964
> URL: https://issues.apache.org/jira/browse/DRILL-3964
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.2.0
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
>Priority: Major
>
> {code:sql}
> CREATE TABLE `complex.json` AS
>   SELECT id,
>  gbyi,
>  gbyt,
>  fl,
>  nul,
>  bool,
>  str,
>  sia,
>  sfa,
>  soa,
>  ooa,
>  oooi,
>  ooof,
>  ooos,
>  oooa
> FROM   dfs.`/drill/testdata/complex/json/complex.json`;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 0:0
> [Error Id: 97679667-412a-475f-aebf-e935405c7330 on drill-democ1:31010] 
> (state=,code=0)
> {code}
> {code:sql}
> > select * from dfs.`/drill/testdata/complex/json/complex.json` limit 1;
> +--+
> |  |
> +--+
> +--+
> No rows selected (0.295 seconds)
> {code}





[jira] [Assigned] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2018-05-17 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4525:
-

Assignee: Bohdan Kazydub  (was: Gautam Kumar Parai)

> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Bohdan Kazydub
>Priority: Critical
> Fix For: 1.14.0
>
>
> Query: (simplified variant of TPC-DS Query37)
> {code}
> SELECT
>*
> FROM   
>date_dim
> WHERE   
>d_date BETWEEN Cast('1999-03-06' AS DATE) AND  (
>   Cast('1999-03-06' AS DATE) + INTERVAL '60' day)
> LIMIT 10;
> {code}
> Error:
> {code}
> Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot 
> apply 'BETWEEN ASYMMETRIC' to arguments of type ' BETWEEN ASYMMETRIC 
>  AND '. Supported form(s): ' BETWEEN 
>  AND '
> SQL Query null
> [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] 
> (state=,code=0)
> {code}
> This is a regression from 1.6.0. 





[jira] [Assigned] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error

2018-05-17 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4525:
-

Assignee: Gautam Kumar Parai  (was: Bohdan Kazydub)

> Query with BETWEEN clause on Date and Timestamp values fails with Validation 
> Error
> --
>
> Key: DRILL-4525
> URL: https://issues.apache.org/jira/browse/DRILL-4525
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Fix For: 1.14.0
>
>
> This is a regression from 1.6.0. 





[jira] [Created] (DRILL-6375) ANY_VALUE aggregate function

2018-05-01 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6375:
-

 Summary: ANY_VALUE aggregate function
 Key: DRILL-6375
 URL: https://issues.apache.org/jira/browse/DRILL-6375
 Project: Apache Drill
  Issue Type: New Feature
  Components: Functions - Drill
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.14.0


We had discussions on the Apache Calcite [1] and Apache Drill [2] mailing lists 
regarding an equivalent for DISTINCT ON. The community seems to prefer 
ANY_VALUE. This Jira is a placeholder for implementing the ANY_VALUE aggregate 
function in Apache Drill. We should also eventually contribute it to Apache 
Calcite.

[1]https://lists.apache.org/thread.html/f2007a489d3a5741875bcc8a1edd8d5c3715e5114ac45058c3b3a42d@%3Cdev.calcite.apache.org%3E

[2]https://lists.apache.org/thread.html/2517eef7410aed4e88b9515f7e4256335215c1ad39a2676a08d21cb9@%3Cdev.drill.apache.org%3E
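As a rough illustration of the intended semantics (a standalone Java sketch, not Drill code): ANY_VALUE returns some value of the expression from each group, with no guarantee which one. This sketch models it as "first value seen per group", which is one valid implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnyValueDemo {
    // Model ANY_VALUE per group as "keep the first value seen for each key".
    public static Map<String, Integer> anyValueByGroup(String[] keys, int[] values) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            result.putIfAbsent(keys[i], values[i]);  // first value per key wins
        }
        return result;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "b"};
        int[] values = {1, 2, 3, 4};
        System.out.println(anyValueByGroup(keys, values)); // {a=1, b=2}
    }
}
```

Because any value from the group is acceptable, the optimizer is free to pick whichever is cheapest, which is what makes ANY_VALUE useful as a DISTINCT ON substitute.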





[jira] [Assigned] (DRILL-4326) JDBC Storage Plugin for PostgreSQL does not work

2018-04-17 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4326:
-

Assignee: (was: Gautam Kumar Parai)

> JDBC Storage Plugin for PostgreSQL does not work
> 
>
> Key: DRILL-4326
> URL: https://issues.apache.org/jira/browse/DRILL-4326
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
> Environment: Mac OS X JDK 1.8 PostgreSQL 9.4.4 PostgreSQL JDBC jars 
> (postgresql-9.2-1004-jdbc4.jar, postgresql-9.1-901-1.jdbc4.jar, )
>Reporter: Akon Dey
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.5.0, 1.14.0
>
>
> Queries with the JDBC Storage Plugin for PostgreSQL fail with DATA_READ ERROR.
> The JDBC Storage Plugin settings in use are:
> {code}
> {
>   "type": "jdbc",
>   "driver": "org.postgresql.Driver",
>   "url": "jdbc:postgresql://127.0.0.1/test",
>   "username": "akon",
>   "password": null,
>   "enabled": false
> }
> {code}
> Please refer to the following stack for further details:
> {noformat}
> Akons-MacBook-Pro:drill akon$ 
> ./distribution/target/apache-drill-1.5.0-SNAPSHOT/apache-drill-1.5.0-SNAPSHOT/bin/drill-embedded
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; 
> support was removed in 8.0
> Jan 29, 2016 9:17:18 AM org.glassfish.jersey.server.ApplicationHandler 
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> apache drill 1.5.0-SNAPSHOT
> "a little sql for your nosql"
> 0: jdbc:drill:zk=local> !verbose
> verbose: on
> 0: jdbc:drill:zk=local> use pgdb;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | Default schema changed to [pgdb]  |
> +---+---+
> 1 row selected (0.753 seconds)
> 0: jdbc:drill:zk=local> select * from ips;
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] 
> (state=,code=0)
> java.sql.SQLException: DATA_READ ERROR: The JDBC storage plugin failed while 
> trying setup the SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73)
>   at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
>   at 
> net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>   at sqlline.Commands.execute(Commands.java:841)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.begin(SqlLine.java:621)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ 
> ERROR: The JDBC storage plugin failed while trying setup the SQL query.
> sql SELECT *
> FROM "test"."ips"
> plugin pgdb
> Fragment 0:0
> [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   

[jira] [Assigned] (DRILL-4326) JDBC Storage Plugin for PostgreSQL does not work

2018-04-17 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4326:
-

Assignee: Gautam Kumar Parai

> JDBC Storage Plugin for PostgreSQL does not work
> 
>
> Key: DRILL-4326
> URL: https://issues.apache.org/jira/browse/DRILL-4326
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
> Environment: Mac OS X JDK 1.8 PostgreSQL 9.4.4 PostgreSQL JDBC jars 
> (postgresql-9.2-1004-jdbc4.jar, postgresql-9.1-901-1.jdbc4.jar, )
>Reporter: Akon Dey
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.5.0, 1.14.0
>
>
> Queries with the JDBC Storage Plugin for PostgreSQL fail with DATA_READ ERROR.
> The JDBC Storage Plugin settings in use are:
> {code}
> {
>   "type": "jdbc",
>   "driver": "org.postgresql.Driver",
>   "url": "jdbc:postgresql://127.0.0.1/test",
>   "username": "akon",
>   "password": null,
>   "enabled": false
> }
> {code}
>   at 
> 

[jira] [Commented] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query

2018-03-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402565#comment-16402565
 ] 

Gautam Kumar Parai commented on DRILL-6260:
---

[~hanu.ncr] I did not realize you had reassigned it to yourself. I analyzed it 
a bit: Drill throws the error when it visits the Calcite logical operator tree 
and finds a LogicalAggregate which has a SqlSingleValueAggFunction. However, we 
may need to change the visitor to keep going down the tree to find one which is 
not. The code is in PreProcessLogicalRel.java:visit(LogicalAggregate 
aggregate). Hope this helps!
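The suggested visitor change can be sketched with a simplified tree (a hypothetical Node class, not Calcite's actual RelNode/visitor API): rather than erroring out at the first LogicalAggregate carrying a SqlSingleValueAggFunction, keep descending to check what sits below it.

```java
import java.util.List;

public class VisitorSketch {
    // Simplified stand-in for a Calcite logical operator node (hypothetical).
    static class Node {
        final String aggFunction;   // e.g. "SINGLE_VALUE", "MAX", or null for non-aggregates
        final List<Node> inputs;
        Node(String aggFunction, List<Node> inputs) {
            this.aggFunction = aggFunction;
            this.inputs = inputs;
        }
    }

    // Keep descending past SINGLE_VALUE aggregates instead of stopping at the
    // first one: report true only if some deeper aggregate is not SINGLE_VALUE.
    static boolean containsNonSingleValueAgg(Node node) {
        if (node.aggFunction != null && !node.aggFunction.equals("SINGLE_VALUE")) {
            return true;
        }
        for (Node input : node.inputs) {
            if (containsNonSingleValueAgg(input)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node scan = new Node(null, List.of());
        Node max = new Node("MAX", List.of(scan));
        Node singleValue = new Node("SINGLE_VALUE", List.of(max));
        System.out.println(containsNonSingleValueAgg(singleValue)); // true
    }
}
```

A naive visitor that stops at `singleValue` would misclassify the query even though a MAX aggregate (which makes the sub-query scalar) exists below it.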

> Query fails with "ERROR: Non-scalar sub-query used in an expression" when it 
> contains a cast expression around a scalar sub-query 
> --
>
> Key: DRILL-6260
> URL: https://issues.apache.org/jira/browse/DRILL-6260
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0, 1.14.0
> Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 
> git Commit Message: Update version to 1.14.0-SNAPSHOT
>Reporter: Abhishek Girish
>Assignee: Hanumath Rao Maduri
>Priority: Major
>
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > cast(max(T2.a) as varchar) FROM `t2.json` T2);
> Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression
> See Apache Drill JIRA: DRILL-1937
> {code}
> Slightly different variants of the query work fine. 
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(cast(T2.a as varchar)) FROM `t2.json` T2);
> 00-00    Screen
> 00-01      Project(b=[$0])
> 00-02        Project(b=[$1])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, $2)])
> 00-05              NestedLoopJoin(condition=[true], joinType=[left])
> 00-07                Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06                StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08                  Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET 
> "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"])
> 00-09                    Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code}
> {code}
> > explain plan for SELECT T1.b FROM `t1.json` T1  WHERE  T1.a = (SELECT 
> > max(T2.a) FROM `t2.json` T2);
> 00-00Screen
> 00-01  Project(b=[$0])
> 00-02Project(b=[$1])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($0, $2)])
> 00-05  NestedLoopJoin(condition=[true], joinType=[left])
> 00-07Scan(table=[[si, tmp, t1.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, 
> columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]])
> 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-08  Scan(table=[[si, tmp, t2.json]], 
> groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, 
> columns=[`a`], files=[maprfs:///tmp/t2.json]]])
> {code}
> File contents:
> {code}
> # cat t1.json 
> {"a":1, "b":"V"}
> {"a":2, "b":"W"}
> {"a":3, "b":"X"}
> {"a":4, "b":"Y"}
> {"a":5, "b":"Z"}
> # cat t2.json 
> {"a":1, "b":"A"}
> {"a":2, "b":"B"}
> {"a":3, "b":"C"}
> {"a":4, "b":"D"}
> {"a":5, "b":"E"}
> {code}





[jira] [Closed] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade

2018-03-09 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai closed DRILL-6226.
-
Resolution: Not A Bug

> Projection pushdown does not occur with Calcite upgrade
> ---
>
> Key: DRILL-6226
> URL: https://issues.apache.org/jira/browse/DRILL-6226
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.13.0
>
>
> I am seeing plan issues where projection pushdown does not occur due to 
> costing. The root cause is that DrillScanRel relies on utility functions to 
> determine STAR columns, which fails after the Calcite upgrade because 
> GroupScan uses GroupScan.ALL_COLUMNS, which differs from 
> STAR_COLUMN/DYNAMIC_STAR.
> While looking at references to SchemaPath.STAR_COLUMN, we found comparisons 
> involving a hardcoded '*'. These might also need to change to accommodate 
> the DYNAMIC STAR.
> [~arina]  [~amansinha100] please let me know your thoughts on this issue.





[jira] [Commented] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade

2018-03-09 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393259#comment-16393259
 ] 

Gautam Kumar Parai commented on DRILL-6226:
---

[~arina] I verified this is present in master. Thanks! I will close this Jira 
as not a bug.

> Projection pushdown does not occur with Calcite upgrade
> ---
>
> Key: DRILL-6226
> URL: https://issues.apache.org/jira/browse/DRILL-6226
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.13.0
>
>
> I am seeing plan issues where projection pushdown does not occur due to 
> costing. The root cause is that DrillScanRel relies on utility functions to 
> determine STAR columns, which fails after the Calcite upgrade because 
> GroupScan uses GroupScan.ALL_COLUMNS, which differs from 
> STAR_COLUMN/DYNAMIC_STAR.
> While looking at references to SchemaPath.STAR_COLUMN, we found comparisons 
> involving a hardcoded '*'. These might also need to change to accommodate 
> the DYNAMIC STAR.
> [~arina]  [~amansinha100] please let me know your thoughts on this issue.





[jira] [Created] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade

2018-03-08 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6226:
-

 Summary: Projection pushdown does not occur with Calcite upgrade
 Key: DRILL-6226
 URL: https://issues.apache.org/jira/browse/DRILL-6226
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Gautam Kumar Parai
Assignee: Arina Ielchiieva
 Fix For: 1.13.0


I am seeing plan issues where projection pushdown does not occur due to 
costing. The root cause is that DrillScanRel relies on utility functions to 
determine STAR columns, which fails after the Calcite upgrade because 
GroupScan uses GroupScan.ALL_COLUMNS, which differs from 
STAR_COLUMN/DYNAMIC_STAR.

While looking at references to SchemaPath.STAR_COLUMN, we found comparisons 
involving a hardcoded '*'. These might also need to change to accommodate the 
DYNAMIC STAR.

[~arina]  [~amansinha100] please let me know your thoughts on this issue.
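Assuming the legacy star is spelled '*' and Calcite's dynamic star is spelled '**' (and using a hypothetical helper, not Drill's actual API), the needed change amounts to accepting both spellings wherever star columns are detected:

```python
# Sketch only: '*' as the legacy star and '**' as the dynamic star are
# assumptions about the post-upgrade spellings; is_star_column is a
# hypothetical helper, not code from DrillScanRel.
LEGACY_STAR = "*"
DYNAMIC_STAR = "**"

def is_star_column(name: str) -> bool:
    # Comparisons hardcoded to '*' miss the dynamic star, which breaks
    # star detection (and hence pushdown costing); accept both forms.
    return name in (LEGACY_STAR, DYNAMIC_STAR)

assert is_star_column("*") and is_star_column("**")
assert not is_star_column("L_ORDERKEY")
```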





[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-02-27 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379154#comment-16379154
 ] 

Gautam Kumar Parai commented on DRILL-6099:
---

[~priteshm] no, I have not had a chance to address them yet. I will take a look.

> Drill does not push limit past project (flatten) if it cannot be pushed into 
> scan
> -
>
> Key: DRILL-6099
> URL: https://issues.apache.org/jira/browse/DRILL-6099
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.13.0
>
>
> It would be useful to have pushdown occur past flatten(project). Here is an 
> example to illustrate the issue:
> {{explain plan without implementation for }}{{select name, 
> flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}}
> {{DrillScreenRel}}{{  }}
> {{  DrillLimitRel(fetch=[1])}}{{    }}
> {{    DrillProjectRel(name=[$0], category=[FLATTEN($1)])}}
> {{      DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, 
> `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}}
> = 
> Content of 0_0_0.json
> =
> {
>   "name" : "Eric Goldberg, MD",
>   "categories" : [ "Doctors", "Health & Medical" ]
> } {
>   "name" : "Pine Cone Restaurant",
>   "categories" : [ "Restaurants" ]
> } {
>   "name" : "Deforest Family Restaurant",
>   "categories" : [ "American (Traditional)", "Restaurants" ]
> } {
>   "name" : "Culver's",
>   "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", 
> "Restaurants" ]
> } {
>   "name" : "Chang Jiang Chinese Kitchen",
>   "categories" : [ "Chinese", "Restaurants" ]
> } 
>  





[jira] [Updated] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-01-29 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6099:
--
Labels: ready-to-commit  (was: )

[~ben-zvi] please consider this PR (https://github.com/apache/drill/pull/1096) 
for the batch commit.

> Drill does not push limit past project (flatten) if it cannot be pushed into 
> scan
> -
>
> Key: DRILL-6099
> URL: https://issues.apache.org/jira/browse/DRILL-6099
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be useful to have pushdown occur past flatten(project). Here is an 
> example to illustrate the issue:
> {{explain plan without implementation for }}{{select name, 
> flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}}
> {{DrillScreenRel}}{{  }}
> {{  DrillLimitRel(fetch=[1])}}{{    }}
> {{    DrillProjectRel(name=[$0], category=[FLATTEN($1)])}}
> {{      DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, 
> `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}}
> = 
> Content of 0_0_0.json
> =
> {
>   "name" : "Eric Goldberg, MD",
>   "categories" : [ "Doctors", "Health & Medical" ]
> } {
>   "name" : "Pine Cone Restaurant",
>   "categories" : [ "Restaurants" ]
> } {
>   "name" : "Deforest Family Restaurant",
>   "categories" : [ "American (Traditional)", "Restaurants" ]
> } {
>   "name" : "Culver's",
>   "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", 
> "Restaurants" ]
> } {
>   "name" : "Chang Jiang Chinese Kitchen",
>   "categories" : [ "Chinese", "Restaurants" ]
> } 
>  





[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335251#comment-16335251
 ] 

Gautam Kumar Parai commented on DRILL-6093:
---

[~arina] please consider this PR during the batch commit. Thanks!

> Unneeded columns in Drill logical project
> -
>
> Key: DRILL-6093
> URL: https://issues.apache.org/jira/browse/DRILL-6093
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Here is an example query with the corresponding logical plan. The project 
> contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are 
> not required by subsequent operators, e.g. DrillJoinRel.
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
> cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
> cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);
> *+--+--+*
> *|* *text* *|* *json* *|*
> *+--+--+*
> *|* DrillScreenRel
>   DrillProjectRel(L_QUANTITY=[$1])
>     DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
>       DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], 
> $f2=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
>       DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`O_ORDERKEY`]]])





[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335187#comment-16335187
 ] 

Gautam Kumar Parai commented on DRILL-6093:
---

[~amansinha100] I needed to update the test case. The project contains only 
the required columns, but the CAST output was renamed to L_ORDERKEY0. I was 
checking that L_ORDERKEY is absent using the regex L_ORDERKEY.*, which also 
matches the renamed column, so I have changed it to L_ORDERKEY=.*

Maybe something changed after the latest rebase, since the test passed 
earlier, if I remember correctly.
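A quick illustration of why the regex had to be tightened, using a hypothetical plan fragment rather than the actual test output:

```python
import re

# Hypothetical plan string shaped like the renamed project: the bare
# L_ORDERKEY column is gone, but the CAST output is named L_ORDERKEY0.
plan = "DrillProjectRel(L_QUANTITY=[$1], L_ORDERKEY0=[CAST($0):INTEGER])"

# The old pattern also matches the renamed CAST column, so a
# "L_ORDERKEY must be absent" check fails spuriously...
assert re.search(r"L_ORDERKEY.*", plan) is not None
# ...while the tightened pattern requires 'L_ORDERKEY=' exactly and
# therefore ignores L_ORDERKEY0.
assert re.search(r"L_ORDERKEY=.*", plan) is None
```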

> Unneeded columns in Drill logical project
> -
>
> Key: DRILL-6093
> URL: https://issues.apache.org/jira/browse/DRILL-6093
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Here is an example query with the corresponding logical plan. The project 
> contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are 
> not required by subsequent operators, e.g. DrillJoinRel.
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
> cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
> cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);
> *+--+--+*
> *|* *text* *|* *json* *|*
> *+--+--+*
> *|* DrillScreenRel
>   DrillProjectRel(L_QUANTITY=[$1])
>     DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
>       DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], 
> $f2=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
>       DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`O_ORDERKEY`]]])





[jira] [Updated] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-22 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6093:
--
Labels: ready-to-commit  (was: )

> Unneeded columns in Drill logical project
> -
>
> Key: DRILL-6093
> URL: https://issues.apache.org/jira/browse/DRILL-6093
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Here is an example query with the corresponding logical plan. The project 
> contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are 
> not required by subsequent operators, e.g. DrillJoinRel.
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
> cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
> cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);
> *+--+--+*
> *|* *text* *|* *json* *|*
> *+--+--+*
> *|* DrillScreenRel
>   DrillProjectRel(L_QUANTITY=[$1])
>     DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
>       DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], 
> $f2=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
>       DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`O_ORDERKEY`]]])





[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334634#comment-16334634
 ] 

Gautam Kumar Parai commented on DRILL-6093:
---

[~amansinha100] I see that you removed the ready-to-commit label. Do I need to 
address something in the PR? Please let me know. Thanks!

> Unneeded columns in Drill logical project
> -
>
> Key: DRILL-6093
> URL: https://issues.apache.org/jira/browse/DRILL-6093
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.13.0
>
>
> Here is an example query with the corresponding logical plan. The project 
> contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are 
> not required by subsequent operators, e.g. DrillJoinRel.
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
> cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
> cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);
> *+--+--+*
> *|* *text* *|* *json* *|*
> *+--+--+*
> *|* DrillScreenRel
>   DrillProjectRel(L_QUANTITY=[$1])
>     DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
>       DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], 
> $f2=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
>       DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`O_ORDERKEY`]]])





[jira] [Created] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-01-18 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6099:
-

 Summary: Drill does not push limit past project (flatten) if it 
cannot be pushed into scan
 Key: DRILL-6099
 URL: https://issues.apache.org/jira/browse/DRILL-6099
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.12.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.13.0


It would be useful to have pushdown occur past flatten(project). Here is an 
example to illustrate the issue:

{{explain plan without implementation for }}{{select name, flatten(categories) 
as category from dfs.`/tmp/t_json_20` LIMIT 1;}}

{{DrillScreenRel}}
{{  DrillLimitRel(fetch=[1])}}
{{    DrillProjectRel(name=[$0], category=[FLATTEN($1)])}}
{{      DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, 
`categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}}

= 
Content of 0_0_0.json
=

{
  "name" : "Eric Goldberg, MD",
  "categories" : [ "Doctors", "Health & Medical" ]
} {
  "name" : "Pine Cone Restaurant",
  "categories" : [ "Restaurants" ]
} {
  "name" : "Deforest Family Restaurant",
  "categories" : [ "American (Traditional)", "Restaurants" ]
} {
  "name" : "Culver's",
  "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", 
"Restaurants" ]
} {
  "name" : "Chang Jiang Chinese Kitchen",
  "categories" : [ "Chinese", "Restaurants" ]
}
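A sketch of why the pushdown pays off, using lazy Python generators rather than Drill's planner rule: with LIMIT 1 evaluated above FLATTEN, only one input row ever needs to be scanned, so a limit pushed toward the scan can stop reading early.

```python
from itertools import islice

rows_scanned = 0

def scan(rows):
    # Counts how many input rows the lazy pipeline actually pulls.
    global rows_scanned
    for r in rows:
        rows_scanned += 1
        yield r

def flatten(rows, key):
    # One output row per element of the nested array, like Drill's FLATTEN.
    for r in rows:
        for item in r[key]:
            yield {"name": r["name"], "category": item}

# A few of the sample rows from 0_0_0.json above.
data = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
]

# LIMIT 1 applied lazily above flatten: only the first input row is read.
out = list(islice(flatten(scan(data), "categories"), 1))
print(out, rows_scanned)
```

Without the limit reaching below the project, all three rows would be scanned before the limit discards the extra output.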





[jira] [Updated] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-18 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-6093:
--
Labels: ready-to-commit  (was: )

> Unneeded columns in Drill logical project
> -
>
> Key: DRILL-6093
> URL: https://issues.apache.org/jira/browse/DRILL-6093
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Here is an example query with the corresponding logical plan. The project 
> contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are 
> not required by subsequent operators, e.g. DrillJoinRel.
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
> cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
> cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);
> *+--+--+*
> *|* *text* *|* *json* *|*
> *+--+--+*
> *|* DrillScreenRel
>   DrillProjectRel(L_QUANTITY=[$1])
>     DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
>       DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], 
> $f2=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
>       DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
>         DrillScanRel(table=[[cp, tpch/orders.parquet]], 
> groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`O_ORDERKEY`]]])





[jira] [Created] (DRILL-6093) Unneeded columns in Drill logical project

2018-01-16 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-6093:
-

 Summary: Unneeded columns in Drill logical project
 Key: DRILL-6093
 URL: https://issues.apache.org/jira/browse/DRILL-6093
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.12.0, 1.11.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai
 Fix For: 1.12.0


Here is an example query with the corresponding logical plan. The project 
contains unnecessary columns (L_ORDERKEY, O_ORDERKEY) even though they are not 
required by subsequent operators, e.g. DrillJoinRel.

EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM 
cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE 
cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int);

*+--+--+*
*|* *text* *|* *json* *|*
*+--+--+*
*|* DrillScreenRel
  DrillProjectRel(L_QUANTITY=[$1])
    DrillJoinRel(condition=[=($2, $4)], joinType=[inner])
      DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], $f2=[CAST($0):INTEGER])
        DrillScanRel(table=[[cp, tpch/lineitem.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/lineitem.parquet]], 
selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]])
      DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER])
        DrillScanRel(table=[[cp, tpch/orders.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/orders.parquet]], 
selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`O_ORDERKEY`]]])
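A minimal sketch of the kind of trimming involved, assuming a hypothetical project-expression list shaped like the lineitem side of the plan above (this is an illustration, not Drill's actual rule):

```python
def trim_project(exprs, used_by_parent):
    """Keep only project expressions referenced by the parent operator,
    returning the trimmed list and an old->new index mapping."""
    keep = sorted(used_by_parent)
    mapping = {old: new for new, old in enumerate(keep)}
    return [exprs[i] for i in keep], mapping

# Hypothetical shape of the lineitem-side project:
# $0=L_ORDERKEY, $1=L_QUANTITY, $2=CAST(L_ORDERKEY):INTEGER.
exprs = ["L_ORDERKEY=[$0]", "L_QUANTITY=[$1]", "$f2=[CAST($0):INTEGER]"]

# The join only needs the quantity ($1) and the cast key ($2), so the
# bare L_ORDERKEY column can be dropped and the survivors renumbered.
trimmed, mapping = trim_project(exprs, used_by_parent={1, 2})
print(trimmed)   # ['L_QUANTITY=[$1]', '$f2=[CAST($0):INTEGER]']
print(mapping)   # {1: 0, 2: 1}
```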





[jira] [Created] (DRILL-5853) Sort removal based on NULL direction

2017-10-06 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-5853:
-

 Summary: Sort removal based on NULL direction
 Key: DRILL-5853
 URL: https://issues.apache.org/jira/browse/DRILL-5853
 Project: Apache Drill
  Issue Type: Bug
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


Calcite bug fixes 969 and 970 should be pulled into Drill to correctly apply 
the NULL direction for sort removal.
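A small illustration of why the NULL direction matters here: the same ascending sort with NULLS FIRST versus NULLS LAST yields different orders, so an existing collation can only replace (remove) a sort when its NULL direction matches the one the query requires.

```python
# The same ascending sort under the two NULL directions.
data = [3, None, 1]

# NULLS FIRST: None sorts before any value.
nulls_first = sorted(data, key=lambda v: (v is not None, v or 0))
# NULLS LAST: None sorts after every value.
nulls_last = sorted(data, key=lambda v: (v is None, v or 0))

print(nulls_first)  # [None, 1, 3]
print(nulls_last)   # [1, 3, None]

# The orders differ, so a sort removed against the wrong NULL
# direction would return wrongly ordered results.
assert nulls_first != nulls_last
```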





[jira] [Created] (DRILL-5794) Projection pushdown does not preserve collation

2017-09-14 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-5794:
-

 Summary: Projection pushdown does not preserve collation
 Key: DRILL-5794
 URL: https://issues.apache.org/jira/browse/DRILL-5794
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


While looking at the projection-pushdown-into-scan rule in Drill, it seems we 
do not consider changes to collation. This happens in general, not just for 
the projection pushdown across other rels.
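A minimal sketch of what preserving collation through a projection involves, with collations modeled as lists of field indexes (hypothetical names, not Drill's actual rule):

```python
def push_collation_through_project(collation, projected_fields):
    """Remap a collation (list of input field indexes) to the project's
    output indexes; return None if any sort key is not projected."""
    pos = {field: i for i, field in enumerate(projected_fields)}
    out = []
    for f in collation:
        if f not in pos:
            return None  # sort key dropped: collation cannot be preserved
        out.append(pos[f])
    return out

# Input sorted on field 2; the project keeps fields [2, 0], so the
# output is still sorted, now on its output field 0.
assert push_collation_through_project([2], [2, 0]) == [0]
# If the project drops field 2, the collation is lost entirely.
assert push_collation_through_project([2], [0, 1]) is None
```

Forgetting this remapping is exactly how a pushdown rule silently discards a useful sort order.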





[jira] [Commented] (DRILL-1328) Support table statistics

2017-06-23 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061374#comment-16061374
 ] 

Gautam Kumar Parai commented on DRILL-1328:
---

Based on the last discussion with the reviewers and Drill community members, 
we will hold off on the PR because it also causes regressions in TPC-H and 
TPC-DS benchmark queries. We identified that we need histograms and other 
enhancements to fully address the regressions. I will post a new PR once 
these issues are addressed.

> Support table statistics
> 
>
> Key: DRILL-1328
> URL: https://issues.apache.org/jira/browse/DRILL-1328
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Cliff Buchanan
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch
>
>
> This consists of several subtasks
> * implement operators to generate statistics
> * add "analyze table" support to parser/planner
> * create a metadata provider to allow statistics to be used by optiq in 
> planning optimization
> * implement statistics functions
> Right now, the bulk of this functionality is implemented, but it hasn't been 
> rigorously tested and needs to have some definite answers for some of the 
> parts "around the edges" (how analyze table figures out where the table 
> statistics are located, how a table "append" should work in a read only file 
> system)
> Also, here are a few known caveats:
> * table statistics are collected by creating a sql query based on the string 
> path of the table. This should probably be done with a Table reference.
> * Case sensitivity for column statistics is probably iffy
> * Math for combining two column NDVs into a joint NDV should be checked.
> * Schema changes aren't really being considered yet.
> * adding getDrillTable is probably unnecessary; it might be better to do 
> getTable().unwrap(DrillTable.class)
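For the NDV-combination caveat above, a common textbook estimate (not necessarily the formula used in this patch) is the independence product capped by the table's row count:

```python
def joint_ndv(ndv_a, ndv_b, row_count):
    # Assuming independent columns, the NDV of the pair (a, b) is about
    # ndv_a * ndv_b; it can never exceed the number of rows present,
    # hence the cap.
    return min(ndv_a * ndv_b, row_count)

assert joint_ndv(10, 20, 1000) == 200
assert joint_ndv(100, 100, 1000) == 1000  # cap kicks in
```

Checking the math against the cap is one of the "around the edges" items the caveat list calls out.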





[jira] [Updated] (DRILL-5319) Refactor FragmentContext and OptionManager for unit testing

2017-04-06 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5319:
--
Labels: ready-to-commit  (was: )

> Refactor FragmentContext and OptionManager for unit testing
> ---
>
> Key: DRILL-5319
> URL: https://issues.apache.org/jira/browse/DRILL-5319
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Roll-up task for two refactorings, see the sub-tasks for details. This ticket 
> allows a single PR for the two different refactorings since the work heavily 
> overlaps. See DRILL-5320 and DRILL-5321 for details.





[jira] [Commented] (DRILL-5319) Refactor FragmentContext and OptionManager for unit testing

2017-04-06 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959873#comment-15959873
 ] 

Gautam Kumar Parai commented on DRILL-5319:
---

[~paul-rogers] since this can be merged independently, and the rest of the 
DRILL-5318 changes depend on this PR, I am marking it as ready-to-commit.

> Refactor FragmentContext and OptionManager for unit testing
> ---
>
> Key: DRILL-5319
> URL: https://issues.apache.org/jira/browse/DRILL-5319
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Roll-up task for two refactorings, see the sub-tasks for details. This ticket 
> allows a single PR for the two different refactorings since the work heavily 
> overlaps. See DRILL-5320 and DRILL-5321 for details.





[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-31 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951400#comment-15951400
 ] 

Gautam Kumar Parai commented on DRILL-5394:
---

[~ppenumarthy] is the code ready to go into Apache? If so, we should mark it 
with the ready-to-commit tag.

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary, ready-to-commit
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them where all required during query 
> planning. This should help reduce query planning time.
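A minimal sketch of the proposed caching, with a stand-in for the expensive metadata call (hypothetical names, not the MapR-DB client API):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def table_row_count(table):
    # Stands in for the expensive region-location / row-count RPC that
    # the DEBUG log above shows being issued repeatedly during planning.
    global calls
    calls += 1
    return 1_000_000  # placeholder value

# Planning code may ask for the row count many times while costing
# alternative plans; the cache ensures the backend is hit only once
# per table per planning session.
for _ in range(5):
    table_row_count("t_large")
assert calls == 1
```

The cache would need to be scoped to a single query's planning so stale counts are not reused across queries.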





[jira] [Assigned] (DRILL-5049) wrong results - correlated subquery interacting with null equality join

2017-03-13 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-5049:
-

Assignee: Gautam Kumar Parai

> wrong results - correlated subquery interacting with null equality join
> ---
>
> Key: DRILL-5049
> URL: https://issues.apache.org/jira/browse/DRILL-5049
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Attachments: nullEqJoin_17.drill_res, nullEqJoin_17.postgres, 
> t_alltype.parquet
>
>
> Here is a query that uses null equality join. Drill 1.9.0 returns 124 
> records, whereas Postgres 9.3 returns 145 records. I am on Drill 1.9.0 git 
> commit id: db308549
> I have attached the results from Drill 1.9.0 and Postgres, please review.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> . . . . . . . . . . . . . . > SELECT *
> . . . . . . . . . . . . . . > FROM `t_alltype.parquet` t1
> . . . . . . . . . . . . . . > WHERE EXISTS
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT *
> . . . . . . . . . . . . . . > FROM `t_alltype.parquet` t2
> . . . . . . . . . . . . . . > WHERE t1.c4 = t2.c4 OR (t1.c4 
> IS NULL AND t2.c4 IS NULL)
> . . . . . . . . . . . . . . > );
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(*=[$0])
> 00-02Project(T30¦¦*=[$0])
> 00-03  HashJoin(condition=[AND(=($1, $2), =($1, $3))], 
> joinType=[inner])
> 00-05Project(T30¦¦*=[$0], c4=[$1])
> 00-07  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], 
> selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`*`]]])
> 00-04HashAgg(group=[{0, 1}], agg#0=[MIN($2)])
> 00-06  Project(c40=[$1], c400=[$1], $f0=[true])
> 00-08HashJoin(condition=[IS NOT DISTINCT FROM($0, $1)], 
> joinType=[inner])
> 00-10  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], 
> selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`c4`]]])
> 00-09  Project(c40=[$0])
> 00-11HashAgg(group=[{0}])
> 00-12  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], 
> selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[`c4`]]])
> {noformat}
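The plan above rewrites the OR predicate into IS NOT DISTINCT FROM, i.e. a null-safe equality. A minimal sketch in Python (hypothetical data, not the attached parquet file) of why null-safe equality matches more rows than plain equality, which is the kind of discrepancy behind the 124 vs. 145 row counts:

```python
# Minimal sketch (hypothetical data) of null-safe vs. plain equality joins.
t1 = [1, 2, None, None]
t2 = [1, None, 3]

def null_safe_eq(a, b):
    # SQL's IS NOT DISTINCT FROM: two NULLs match; otherwise plain equality.
    return a == b if a is not None and b is not None else a is b is None

plain = sum(1 for a in t1 for b in t2
            if a is not None and b is not None and a == b)
null_safe = sum(1 for a in t1 for b in t2 if null_safe_eq(a, b))
# plain == 1 (only the 1-1 pair); null_safe == 3 (plus two NULL-NULL pairs)
```

Any plan that drops the NULL-NULL matches from the rewritten condition undercounts exactly those pairs.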



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-3029) Wrong result with correlated not exists subquery

2017-03-13 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-3029:
-

Assignee: Gautam Kumar Parai  (was: Jinfeng Ni)

> Wrong result with correlated not exists subquery
> 
>
> Key: DRILL-3029
> URL: https://issues.apache.org/jira/browse/DRILL-3029
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Fix For: Future
>
> Attachments: t1_t2_t3.tar
>
>
> Subquery has correlation to two outer tables in the previous blocks.
> Postgres returns empty result set in this case:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > distinct a1
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > where   not exists
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > *
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t2
> . . . . . . . . . . . . > where not exists
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > *
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t3
> . . . . . . . . . . . . > where
> . . . . . . . . . . . . > t3.b3 = t2.b2 and
> . . . . . . . . . . . . > t3.a3 = t1.a1
> . . . . . . . . . . . . > )
> . . . . . . . . . . . . > )
> . . . . . . . . . . . . > ;
> ++
> | a1 |
> ++
> | 1  |
> | 2  |
> | 3  |
> | 4  |
> | 5  |
> | 6  |
> | 7  |
> | 9  |
> | 10 |
> | null   |
> ++
> 10 rows selected (0.991 seconds)
> {code}
> Copy/paste reproduction:
> {code}
> select
> distinct a1
> from
> t1
> where   not exists
> (
> select
> *
> from
> t2
> where not exists
> (
> select
> *
> from
> t3
> where
> t3.b3 = t2.b2 and
> t3.a3 = t1.a1
> )
> )
> ;
> {code}
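The semantics under test can be sketched in Python over tiny toy tables (hypothetical data, not the attached t1/t2/t3); the innermost predicate references r2 one level up and r1 two levels up (the "skip-level" correlation mentioned later in this thread):

```python
# Toy tables evaluating the doubly-nested NOT EXISTS from the report.
t1 = [{"a1": 1}, {"a1": 2}]
t2 = [{"b2": 10}, {"b2": 20}]
t3 = [{"a3": a, "b3": b} for a in (1, 2) for b in (10, 20)]

result = sorted({
    r1["a1"] for r1 in t1
    if not any(                      # outer NOT EXISTS over t2
        not any(                     # inner NOT EXISTS over t3
            r3["b3"] == r2["b2"] and r3["a3"] == r1["a1"]  # correlates to r1
            for r3 in t3)
        for r2 in t2)
})
# Here t3 covers every (a1, b2) pair, so the inner EXISTS always holds,
# the middle set is empty, and every a1 survives: result == [1, 2]
```

On the attached data the correct answer is the empty set; a planner that decorrelates the skip-level reference incorrectly ends up keeping rows, as in the Drill output above.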





[jira] [Commented] (DRILL-3029) Wrong result with correlated not exists subquery

2017-03-13 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923368#comment-15923368
 ] 

Gautam Kumar Parai commented on DRILL-3029:
---

[~agirish] Instead of empty results, I see the following error: {quote}ERROR: 
correlated subquery with skip-level correlations is not supported 
(subselect.c:394){quote} Can you please take a look?


> Wrong result with correlated not exists subquery
> 
>
> Key: DRILL-3029
> URL: https://issues.apache.org/jira/browse/DRILL-3029
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: Future
>
> Attachments: t1_t2_t3.tar
>
>
> Subquery has correlation to two outer tables in the previous blocks.
> Postgres returns empty result set in this case:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > distinct a1
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > where   not exists
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > *
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t2
> . . . . . . . . . . . . > where not exists
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > *
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t3
> . . . . . . . . . . . . > where
> . . . . . . . . . . . . > t3.b3 = t2.b2 and
> . . . . . . . . . . . . > t3.a3 = t1.a1
> . . . . . . . . . . . . > )
> . . . . . . . . . . . . > )
> . . . . . . . . . . . . > ;
> ++
> | a1 |
> ++
> | 1  |
> | 2  |
> | 3  |
> | 4  |
> | 5  |
> | 6  |
> | 7  |
> | 9  |
> | 10 |
> | null   |
> ++
> 10 rows selected (0.991 seconds)
> {code}
> Copy/paste reproduction:
> {code}
> select
> distinct a1
> from
> t1
> where   not exists
> (
> select
> *
> from
> t2
> where not exists
> (
> select
> *
> from
> t3
> where
> t3.b3 = t2.b2 and
> t3.a3 = t1.a1
> )
> )
> ;
> {code}





[jira] [Commented] (DRILL-5088) Error when reading DBRef column

2017-02-04 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852859#comment-15852859
 ] 

Gautam Kumar Parai commented on DRILL-5088:
---

[~cshi] I think you should mark this issue as `blocked by` DRILL-5196 (using 
issue links). Adding the `ready-to-commit` label now may cause this to be 
committed before DRILL-5196, which would break the testcase.

> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
>
> In a mongo database with DBRefs, when a DBRef is present in the first 
> document of a collection, the Drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> Simple example to reproduce:
> In mongo instance
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first document doesn't contain a DBRef, Drill queries correctly:
> In a mongo instance :
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> +--------------------------------------+---------------------------------------------------------------+
> | _id                                  | account                                                       |
> +--------------------------------------+---------------------------------------------------------------+
> | {"$oid":"582081d96b69060001fd8939"}  | {"$id":{}}                                                    |
> | {"$oid":"582081d96b69060001fd8938"}  | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}}  |
> +--------------------------------------+---------------------------------------------------------------+
> 2 rows selected (0,563 seconds)
> {code}
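The failure mode can be sketched abstractly (Python, toy stand-in names, not the actual MongoDB driver API): decoding fails as soon as a record contains a value type with no registered codec, which is why the error only surfaces when the DBRef appears in the very first document read.

```python
# Toy model of codec lookup mirroring the CodecConfigurationException above.
class DBRef:
    """Stand-in for com.mongodb.DBRef."""
    def __init__(self, collection, oid):
        self.collection, self.oid = collection, oid

# Registry of known codecs: identity decoders for plain types only.
codecs = {str: lambda v: v, int: lambda v: v, dict: lambda v: v}

def decode(document):
    out = {}
    for key, value in document.items():
        codec = codecs.get(type(value))
        if codec is None:
            raise TypeError(f"Can't find a codec for class {type(value).__name__}")
        out[key] = codec(value)
    return out
```

A fix along these lines would register a codec for DBRef before the first document is decoded.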





[jira] [Commented] (DRILL-5088) Error when reading DBRef column

2017-01-13 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822090#comment-15822090
 ] 

Gautam Kumar Parai commented on DRILL-5088:
---

Sorry, I jumped the gun on the +1. One more question: can we add a testcase 
for the bug? Thanks.

> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
>
> In a mongo database with DBRefs, when a DBRef is present in the first 
> document of a collection, the Drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> Simple example to reproduce:
> In mongo instance
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first document doesn't contain a DBRef, Drill queries correctly:
> In a mongo instance :
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> +--------------------------------------+---------------------------------------------------------------+
> | _id                                  | account                                                       |
> +--------------------------------------+---------------------------------------------------------------+
> | {"$oid":"582081d96b69060001fd8939"}  | {"$id":{}}                                                    |
> | {"$oid":"582081d96b69060001fd8938"}  | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}}  |
> +--------------------------------------+---------------------------------------------------------------+
> 2 rows selected (0,563 seconds)
> {code}





[jira] [Updated] (DRILL-4919) Fix select count(1) / count(*) on csv with header

2017-01-13 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4919:
--
Labels: ready-to-commit  (was: )

> Fix select count(1) / count(*) on csv with header
> -
>
> Key: DRILL-4919
> URL: https://issues.apache.org/jira/browse/DRILL-4919
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: F Méthot
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: Future
>
>
> This happens since 1.8.
> Dataset (extended chars shown for display purposes) test.csvh:
> a,b,c,d\n
> 1,2,3,4\n
> 5,6,7,8\n
> Storage config:
> "csvh": {
>   "type": "text",
>   "extensions" : [
>   "csvh"
>],
>"extractHeader": true,
>"delimiter": ","
>   }
> select count(1) from dfs.`test.csvh`
> Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header 
> names are supported
> column name columns
> column index
> Fragment 0:0
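A sketch of the expected semantics, assuming extractHeader consumes the first line as column names: COUNT(1)/COUNT(*) should simply return the number of data rows.

```python
import csv
import io

# With header extraction enabled, the first line becomes the column names,
# so COUNT(1) / COUNT(*) counts only the remaining data rows.
data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"
reader = csv.DictReader(io.StringIO(data))  # consumes "a,b,c,d" as the header
row_count = sum(1 for _ in reader)
# row_count == 2
```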





[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2016-12-06 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726877#comment-15726877
 ] 

Gautam Kumar Parai commented on DRILL-5043:
---

Hi Nagarajan,

Any updates? 

Regarding your questions:

{quote} I am not sure about one change I made in BitControl.java in the 
following block: {quote}
It should be 32 instead of 36

{quote} Also, I am not sure how to incorporate session_id into "descriptorData" 
static variable that is initialized at line number 9073 in BitControl.java. 
Please advice. {quote}
I think no change is required.

Could you please post the pull request? The review process can only start 
once one is available.

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF
> Attachments: 01_session_id_sqlline.png, 
> 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png
>
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().





[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null

2016-12-05 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5048:
--
Reviewer: Gautam Kumar Parai

> AssertionError when case statement is used with timestamp and null
> --
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: ready-to-commit
> Fix For: Future
>
>
> AssertionError when we use case with timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}





[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null

2016-12-05 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5048:
--
Labels: ready-to-commit  (was: )

> AssertionError when case statement is used with timestamp and null
> --
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Gautam Kumar Parai
>  Labels: ready-to-commit
> Fix For: Future
>
>
> AssertionError when we use case with timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}





[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null

2016-12-05 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5048:
--
Assignee: Serhii Harnyk  (was: Gautam Kumar Parai)

> AssertionError when case statement is used with timestamp and null
> --
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: ready-to-commit
> Fix For: Future
>
>
> AssertionError when we use case with timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}





[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2016-11-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684196#comment-15684196
 ] 

Gautam Kumar Parai commented on DRILL-5043:
---

Could you please post a link to the GitHub branch that has the code? That 
would make it easier for the community to discuss/review the changes.

FYI - Sorry, I do not have the answers to your questions. Let's wait for input 
from the community.

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF
> Attachments: 01_session_id_sqlline.png, 
> 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png
>
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().





[jira] [Commented] (DRILL-4792) Include session options used for a query as part of the profile

2016-11-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672432#comment-15672432
 ] 

Gautam Kumar Parai commented on DRILL-4792:
---

It does not seem to work with the option `store.format`

> Include session options used for a query as part of the profile
> ---
>
> Key: DRILL-4792
> URL: https://issues.apache.org/jira/browse/DRILL-4792
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.7.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.9.0
>
> Attachments: no_session_options.JPG, session_options_block.JPG, 
> session_options_collapsed.JPG, session_options_json.JPG
>
>
> Include session options used for a query as part of the profile.
> This will be very useful for debugging/diagnostics.





[jira] [Updated] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2016-11-16 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5043:
--
Affects Version/s: 1.8.0

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().





[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2016-11-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671468#comment-15671468
 ] 

Gautam Kumar Parai commented on DRILL-5043:
---

Thanks for creating the JIRA. Please see the following:

[1] on how to contribute to Drill.
[2] on how to create a custom UDF.
[3] on how to pass some param (sessionID in your case) to the UDF.

1. https://drill.apache.org/docs/contribute-to-drill/
2. https://drill.apache.org/docs/develop-custom-functions/
3. QueryContext.java. See how session.getDefaultSchemaName() is passed using 
the QueryContextInformation.


> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>  Labels: CONNECTION_ID, SESSION, UDF
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().





[jira] [Updated] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2016-11-16 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-5043:
--
Priority: Minor  (was: Major)

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().





[jira] [Created] (DRILL-4992) Failing query with same case statement in both select and order by clause using hash aggregate

2016-11-02 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4992:
-

 Summary: Failing query with same case statement in both select and 
order by clause using hash aggregate
 Key: DRILL-4992
 URL: https://issues.apache.org/jira/browse/DRILL-4992
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.8.0, 1.7.0
Reporter: Gautam Kumar Parai


Queries that contain the same case statement in both the select and order by 
clauses of a subquery fail with hash aggregate. Here is an example of such a query:


dummy
+-------------+------------------------+
|   c_date    |      c_timestamp       |
+-------------+------------------------+
| 2015-04-23  | 2014-03-16 03:55:21.0  |
+-------------+------------------------+

alter session set `planner.enable_streamagg` = false;
select distinct a1 from ( select SUM(case when c_timestamp is null then 0 else 
1 end) from dummy group by c_date order by SUM(case when c_timestamp is null 
then 0 else 1 end)) as dt(a1);

Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema 
changes
Note that the query reads only a single table. 
Below is the stacktrace from log file:
{code}
2016-10-31 15:57:45,643 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 27e83395-8074-7c94-e318-a6b54176ea9d:0:0: 
State to report: RUNNING
2016-10-31 15:57:45,665 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO  
o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: 
Hash aggregate does not support schema changes


[Error Id: ad36df25-3ea8-4c07-87a0-f105b1ce5ae1 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
[drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
[na:1.7.0_67]
at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_67]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
 [hadoop-common-2.7.0-mapr-1607.jar:na]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226)
 [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_67]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_67]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
2016-10-31 15:57:45,665 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 27e83395-8074-7c94-e318-a6b54176ea9d:0:0: 
State change requested RUNNING --> FAILED
{code}
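For reference, a sketch in Python of what the failing query should return on the one-row `dummy` table above, assuming standard SQL semantics (group by c_date, sum the CASE expression, order by the same sum, then DISTINCT over the aggregate):

```python
from collections import defaultdict

# One-row `dummy` table from the description above.
rows = [{"c_date": "2015-04-23", "c_timestamp": "2014-03-16 03:55:21.0"}]

# SUM(CASE WHEN c_timestamp IS NULL THEN 0 ELSE 1 END) ... GROUP BY c_date
sums = defaultdict(int)
for r in rows:
    sums[r["c_date"]] += 0 if r["c_timestamp"] is None else 1

# SELECT DISTINCT a1 over the (ordered) aggregates
a1 = sorted(set(sums.values()))
# a1 == [1]
```

The expected answer is a single row with value 1; the schema-change error is raised before any result is produced.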





[jira] [Updated] (DRILL-4674) Allow casting to boolean the same literals as in Postgre

2016-11-01 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4674:
--
Assignee: Arina Ielchiieva  (was: Gautam Kumar Parai)

> Allow casting to boolean the same literals as in Postgre
> 
>
> Key: DRILL-4674
> URL: https://issues.apache.org/jira/browse/DRILL-4674
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> Drill does not return results when we try to cast 0 and 1 to boolean inside a 
> value constructor.
> Drill version : 1.7.0-SNAPSHOT  commit ID : 09b26277
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(cast(1 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 1
> Fragment 0:0
> [Error Id: 35dcc4bb-0c5d-466f-8fb5-cf7f0a892155 on centos-02.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast(0 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Whereas we get results from Postgres for the same query.
> {noformat}
> postgres=# values(cast(1 as boolean));
>  column1
> -
>  t
> (1 row)
> postgres=# values(cast(0 as boolean));
>  column1
> -
>  f
> (1 row)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-05-13 07:16:16,578 [28ca80bf-0af9-bc05-258b-6b5744739ed8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: 
> Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalArgumentException: Invalid value for boolean: 0
> at 
> org.apache.drill.exec.test.generated.ProjectorGen9.doSetup(ProjectorTemplate.java:95)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.ProjectorGen9.setup(ProjectorTemplate.java:93)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:444)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> 
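[Editorial sketch] The Postgres behavior requested in DRILL-4674 can be modeled in a few lines. This is an illustration only, not Drill or Postgres code; the literal sets follow the Postgres boolean type documentation, and real Postgres additionally accepts unique prefixes such as 'tru'.

```python
# Illustrative sketch (not Drill code) of the Postgres-style boolean casting
# requested in DRILL-4674. Literal sets follow the Postgres boolean type docs.
TRUE_LITERALS = {"t", "true", "y", "yes", "on", "1"}
FALSE_LITERALS = {"f", "false", "n", "no", "off", "0"}

def cast_to_boolean(literal: str) -> bool:
    """Map a string literal to a boolean the way Postgres does, minus
    prefix matching (real Postgres also accepts unique prefixes)."""
    value = literal.strip().lower()
    if value in TRUE_LITERALS:
        return True
    if value in FALSE_LITERALS:
        return False
    raise ValueError(f"Invalid value for boolean: {literal!r}")
```

With this behavior, `values(cast(1 as boolean))` would return `t` and `values(cast(0 as boolean))` would return `f`, matching the Postgres output shown above.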

[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions

2016-10-31 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622571#comment-15622571
 ] 

Gautam Kumar Parai commented on DRILL-4864:
---

Sorry, I forgot to reassign it back to the developer. I approved the pull 
request earlier. [~sharnyk], please go ahead and commit unless [~adeneche] has 
further comments.

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF, "ansi_to_joda(string)", that takes a string representing an 
> ANSI datetime format and returns the equivalent Joda format string.
> Add a new session option, "drill.exec.fn.to_date_format", that can take one 
> of two values: "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work as 
> usual.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, allowing the user to use ANSI datetime formats.
> Wrapping is applied in the to_date(), to_time() and to_timestamp() functions.
> Table of Joda and ANSI patterns which may be replaced:
> ||Pattern name||ANSI format||Joda-Time format||
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref. | https://www.postgresql.org/docs/8.2/static/functions-formatting.html | http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html |
> Table of ANSI pattern modifiers, which may be deleted from the string:
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on lc_messages) | tm |
> | spell mode (not yet implemented)|   sp  |
> | ref. | https://www.postgresql.org/docs/8.2/static/functions-formatting.html |
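[Editorial sketch] The translation described above can be illustrated directly from the pattern tables. This is a hypothetical sketch, not the actual ansi_to_joda UDF: it handles only the tokens and modifiers listed in this issue, and it matches the longest ANSI token first so that, for example, "hh24" wins over "hh".

```python
# Hypothetical sketch of the ansi_to_joda translation described in this issue,
# built from the pattern table above. Not the actual Drill UDF.
ANSI_TO_JODA = {
    "ddd": "D", "dd": "d", "dy": "E", "d": "e",
    "mon": "MMM", "mm": "MM", "mi": "m", "ms": "S",
    "hh24": "H", "hh12": "h", "hh": "h",
    "ss": "s", "ww": "w", "tz": "TZ", "ee": "G",
    "am": "aa", "pm": "aa",
}
MODIFIERS = ("fm", "fx", "tm", "sp")  # ANSI modifiers dropped from the string

def ansi_to_joda(fmt: str) -> str:
    """Translate an ANSI datetime pattern into the equivalent Joda pattern."""
    for mod in MODIFIERS:
        fmt = fmt.replace(mod, "")
    # Match the longest ANSI token first (e.g. "hh24" before "hh").
    tokens = sorted(ANSI_TO_JODA, key=len, reverse=True)
    out, i = [], 0
    while i < len(fmt):
        for tok in tokens:
            if fmt.startswith(tok, i):
                out.append(ANSI_TO_JODA[tok])
                i += len(tok)
                break
        else:
            out.append(fmt[i])  # pass through characters with no mapping
            i += 1
    return "".join(out)
```

For example, under these assumptions `ansi_to_joda("yyyy-mm-dd hh24:mi:ss")` yields `"yyyy-MM-d H:m:s"`.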





[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions

2016-10-31 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4864:
--
Assignee: Serhii Harnyk  (was: Gautam Kumar Parai)

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: doc-impacting
> Fix For: 1.9.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add a new UDF, "ansi_to_joda(string)", that takes a string representing an 
> ANSI datetime format and returns the equivalent Joda format string.
> Add a new session option, "drill.exec.fn.to_date_format", that can take one 
> of two values: "JODA" (default) and "ANSI".
> If the option is set to "JODA", queries with the to_date() function work as 
> usual.
> If the option is set to "ANSI", the second argument is wrapped with the 
> ansi_to_joda() function, allowing the user to use ANSI datetime formats.
> Wrapping is applied in the to_date(), to_time() and to_timestamp() functions.
> Table of Joda and ANSI patterns which may be replaced:
> ||Pattern name||ANSI format||Joda-Time format||
> | Full name of day|   day |   
> | Day of year |   ddd |   D
> | Day of month|   dd  |   d
> | Day of week |   d   |   e
> | Name of month   |   month   |   
> | Abr name of month   |   mon |   MMM
> | Full era name   |   ee  |   G
> | Name of day |   dy  |   E
> | Time zone   |   tz  |   TZ
> | Hour 12 |   hh  |   h
> | Hour 12 |   hh12|   h
> | Hour 24 |   hh24|   H
> | Minute of hour  |   mi  |   m
> | Second of minute|   ss  |   s
> | Millisecond of minute   |   ms  |   S
> | Week of year|   ww  |   w
> | Month   |   mm  |   MM
> | Halfday am  |   am  |   aa
> | Halfday pm  |   pm  |   aa
> | ref. | https://www.postgresql.org/docs/8.2/static/functions-formatting.html | http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html |
> Table of ANSI pattern modifiers, which may be deleted from the string:
> ||Description ||  Pattern ||
> | fill mode (suppress padding blanks and zeroes)  |   fm  |
> | fixed format global option (see usage notes)|   fx  |
> | translation mode (print localized day and month names based on lc_messages) | tm |
> | spell mode (not yet implemented)|   sp  |
> | ref. | https://www.postgresql.org/docs/8.2/static/functions-formatting.html |





[jira] [Commented] (DRILL-1328) Support table statistics

2016-10-10 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563926#comment-15563926
 ] 

Gautam Kumar Parai commented on DRILL-1328:
---

I have created a new PR to address the review comments. [~amansinha100] can you 
please review the PR? Thanks!

> Support table statistics
> 
>
> Key: DRILL-1328
> URL: https://issues.apache.org/jira/browse/DRILL-1328
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Cliff Buchanan
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch
>
>
> This consists of several subtasks
> * implement operators to generate statistics
> * add "analyze table" support to parser/planner
> * create a metadata provider to allow statistics to be used by optiq in 
> planning optimization
> * implement statistics functions
> Right now, the bulk of this functionality is implemented, but it hasn't been 
> rigorously tested and needs to have some definite answers for some of the 
> parts "around the edges" (how analyze table figures out where the table 
> statistics are located, how a table "append" should work in a read only file 
> system)
> Also, here are a few known caveats:
> * table statistics are collected by creating a sql query based on the string 
> path of the table. This should probably be done with a Table reference.
> * Case sensitivity for column statistics is probably iffy
> * Math for combining two column NDVs into a joint NDV should be checked.
> * Schema changes aren't really being considered yet.
> * adding getDrillTable is probably unnecessary; it might be better to do 
> getTable().unwrap(DrillTable.class)
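[Editorial sketch] On the joint-NDV caveat above: a common textbook heuristic, assuming independence between the two columns, is to take the product of the per-column NDVs capped at the row count. This is only an illustration of the math to be checked, not necessarily the formula Drill implements.

```python
# Common textbook heuristic for combining two column NDVs into a joint NDV,
# assuming column independence. Illustrative only; not necessarily Drill's math.
def joint_ndv(ndv_a: int, ndv_b: int, row_count: int) -> int:
    """Estimate distinct (a, b) pairs: capped product under independence."""
    return min(ndv_a * ndv_b, row_count)
```

For example, columns with 10 and 20 distinct values in a 1000-row table give an estimate of 200 pairs, while 100 and 100 distinct values are capped at the 1000-row table cardinality.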





[jira] [Assigned] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results

2016-10-06 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4862:
-

Assignee: Chunhui Shi  (was: Gautam Kumar Parai)

+1. The changes look good.

> wrong results - use of convert_from(binary_string(key),'UTF8') in filter 
> results in wrong results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>
> These results do not look right, i.e. when the predicate contains 
> convert_from(binary_string(key),'UTF8')
> Apache drill 1.8.0-SNAPSHOT git commit ID: 57dc9f43
> {noformat}
> [root@centos-0x drill4478]# cat f1.json
> {"key":"\\x30\\x31\\x32\\x33"}
> {"key":"\\x34\\x35\\x36\\x37"}
> {"key":"\\x38\\x39\\x30\\x31"}
> {"key":"\\x30\\x30\\x30\\x30"}
> {"key":"\\x31\\x31\\x31\\x31"}
> {"key":"\\x35\\x35\\x35\\x35"}
> {"key":"\\x38\\x38\\x38\\x38"}
> {"key":"\\x39\\x39\\x39\\x39"}
> {"key":"\\x41\\x42\\x43\\x44"}
> {"key":"\\x45\\x46\\x47\\x48"}
> {"key":"\\x49\\x41\\x44\\x46"}
> {"key":"\\x4a\\x4b\\x4c\\x4d"}
> {"key":"\\x57\\x58\\x59\\x5a"}
> {"key":"\\x4e\\x4f\\x50\\x51"}
> {"key":"\\x46\\x46\\x46\\x46"}
> {noformat}
> results without the predicate - these are correct results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json`;
> +-+
> | EXPR$0  |
> +-+
> | 0123|
> | 4567|
> | 8901|
> | |
> | |
> | |
> | |
> | |
> | ABCD|
> | EFGH|
> | IADF|
> | JKLM|
> | WXYZ|
> | NOPQ|
> | |
> +-+
> 15 rows selected (0.256 seconds)
> {noformat}
> results with a predicate - these results don't look correct
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json` where convert_from(binary_string(key),'UTF8') is not null;
> +--+
> |  EXPR$0  |
> +--+
> | 0123123  |
> | 4567567  |
> | 8901901  |
> | 000  |
> | 111  |
> | 555  |
> | 888  |
> | 999  |
> | ABCDBCD  |
> | EFGHFGH  |
> | IADFADF  |
> | JKLMKLM  |
> | WXYZXYZ  |
> | NOPQOPQ  |
> | FFF  |
> +--+
> 15 rows selected (0.279 seconds)
> {noformat}
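[Editorial sketch] Decoding the \xNN escapes above should yield a four-character string per row with no duplicated suffix (e.g. "0123", never "0123123"). A minimal model of the expected behavior follows; this is illustrative only, not Drill's binary_string implementation.

```python
import re

# Sketch of what convert_from(binary_string(key), 'UTF8') should produce for
# the \xNN-escaped keys in f1.json. Illustrative only, not Drill code.
def decode_escaped_key(escaped: str) -> str:
    """Decode literal \\xNN escape sequences into bytes, then UTF-8 text."""
    raw = bytes(int(m.group(1), 16)
                for m in re.finditer(r"\\x([0-9a-fA-F]{2})", escaped))
    return raw.decode("utf-8")
```

Under this model, every row decodes to exactly four characters, so the predicate query's seven-character results like "0123123" indicate the buffer is being re-read past its end.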





[jira] [Created] (DRILL-4929) Drill unable to propagate selectivity/distinctrowcount through RelSubset

2016-10-04 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4929:
-

 Summary: Drill unable to propagate selectivity/distinctrowcount 
through RelSubset
 Key: DRILL-4929
 URL: https://issues.apache.org/jira/browse/DRILL-4929
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


Drill only has access to the best alternative plan. Calcite needs to expose the 
set.rel within RelSubset so that it can be utilized during Drill logical planning.





[jira] [Assigned] (DRILL-4902) nested aggregate query does not complain about missing GROUP BY clause

2016-09-23 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4902:
-

Assignee: Gautam Kumar Parai

> nested aggregate query does not complain about missing GROUP BY clause
> --
>
> Key: DRILL-4902
> URL: https://issues.apache.org/jira/browse/DRILL-4902
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>  Labels: window_function
>
> A nested aggregate windowed query does not report an error when the 
> partitioning column is not used in the GROUP BY clause.
> Drill 1.9.0
> The following is the correct, expected behavior.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(max(c7)) over (partition by c8) 
> from `DRILL_4589`;
> Error: VALIDATION ERROR: From line 1, column 42 to line 1, column 43: 
> Expression 'c8' is not being grouped
> SQL Query null
> [Error Id: 09c837b9-7a66-4a1f-9fbc-522160947274 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> The query below should also report the above error, since the GROUP BY on 
> the partitioning column is missing.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(max(c7)) over (partition by c8) 
> from (select * from `DRILL_4589`);
> +-+
> | EXPR$0  |
> +-+
> | 1   |
> +-+
> 1 row selected (193.71 seconds)
> {noformat}
> Postgres 9.3 also reports an error for a similar query
> {noformat}
> postgres=# select count(max(c1)) over (partition by c2) from (select * from 
> t222) sub_query;
> ERROR:  column "sub_query.c2" must appear in the GROUP BY clause or be used 
> in an aggregate function
> LINE 1: select count(max(c1)) over (partition by c2) from (select * ...
> {noformat}





[jira] [Resolved] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists

2016-09-19 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai resolved DRILL-4771.
---
   Resolution: Fixed
Fix Version/s: 1.9.0

Closed with commit: 229571533bce1e37395d9675ea804ee97b1a2362

> Drill should avoid doing the same join twice if count(distinct) exists
> --
>
> Key: DRILL-4771
> URL: https://issues.apache.org/jira/browse/DRILL-4771
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
> Fix For: 1.9.0
>
>
> When the query has one distinct aggregate and one or more non-distinct 
> aggregates, the planner need not produce the join-based plan; it can 
> generate multi-phase aggregates instead. Another approach would be to use 
> grouping sets. However, Drill does not support grouping sets and instead 
> relies on the join-based plan (see the plan below)
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
>   LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
> LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> The more efficient form should look like this
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
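[Editorial sketch] The equivalence behind the two-phase plan above can be checked on toy data: phase one groups by (empno, deptno) keeping a COUNT(), and phase two SUMs those counts while AVGing the now-distinct deptnos. A sketch in Python (illustrative only, not Drill code):

```python
from collections import defaultdict

# Toy check that the two-phase aggregate plan is equivalent to
# COUNT(*) / AVG(DISTINCT deptno) per empno over the joined rows.
joined = [  # hypothetical (empno, deptno) rows after the inner join
    (1, 10), (1, 10), (1, 20),
    (2, 20), (2, 20),
]

# Phase 1: group by (empno, deptno), keeping a COUNT() per group.
phase1 = defaultdict(int)
for empno, deptno in joined:
    phase1[(empno, deptno)] += 1

# Phase 2: group by empno; SUM the counts, AVG the (now distinct) deptnos.
per_emp = defaultdict(list)
for (empno, deptno), cnt in phase1.items():
    per_emp[empno].append((deptno, cnt))

result = {}
for empno, rows in per_emp.items():
    count_star = sum(c for _, c in rows)          # SUM of per-group counts
    avg_distinct = sum(d for d, _ in rows) / len(rows)  # AVG over distinct deptnos
    result[empno] = (count_star, avg_distinct)
```

For the toy rows above, empno 1 yields (3, 15.0) and empno 2 yields (2, 20.0), matching COUNT(*) and AVG(DISTINCT deptno) computed directly on the join output.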





[jira] [Commented] (DRILL-4895) StreamingAggBatch code generation issues

2016-09-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497884#comment-15497884
 ] 

Gautam Kumar Parai commented on DRILL-4895:
---

This is a potential performance issue - hence not critical.

> StreamingAggBatch code generation issues
> 
>
> Key: DRILL-4895
> URL: https://issues.apache.org/jira/browse/DRILL-4895
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.7.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> We unnecessarily re-generate the code for the StreamingAggBatch even without 
> schema changes. Also, we seem to generate more holder variables than may be 
> required. This also affects sub-classes. HashAggBatch does not have the 
> same issues.





[jira] [Created] (DRILL-4895) StreamingAggBatch code generation issues

2016-09-16 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4895:
-

 Summary: StreamingAggBatch code generation issues
 Key: DRILL-4895
 URL: https://issues.apache.org/jira/browse/DRILL-4895
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


We unnecessarily re-generate the code for the StreamingAggBatch even without 
schema changes. Also, we seem to generate more holder variables than may be 
required. This also affects sub-classes. HashAggBatch does not have the same 
issues.





[jira] [Comment Edited] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists

2016-09-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497847#comment-15497847
 ] 

Gautam Kumar Parai edited comment on DRILL-4771 at 9/17/16 12:51 AM:
-

I have created the pull request https://github.com/apache/drill/pull/588. 
[~amansinha100] [~jni] Can you please review the PR? Thanks!


was (Author: gparai):
I have created the pull request https://github.com/apache/drill/pull/588. 
[~amansinha100] Can you please review the PR? Thanks!

> Drill should avoid doing the same join twice if count(distinct) exists
> --
>
> Key: DRILL-4771
> URL: https://issues.apache.org/jira/browse/DRILL-4771
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> When the query has one distinct aggregate and one or more non-distinct 
> aggregates, the planner need not produce the join-based plan; it can 
> generate multi-phase aggregates instead. Another approach would be to use 
> grouping sets. However, Drill does not support grouping sets and instead 
> relies on the join-based plan (see the plan below)
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
>   LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
> LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> The more efficient form should look like this
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}






[jira] [Commented] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists

2016-09-16 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497847#comment-15497847
 ] 

Gautam Kumar Parai commented on DRILL-4771:
---

I have created the pull request https://github.com/apache/drill/pull/588. 
[~amansinha100] Can you please review the PR? Thanks!

> Drill should avoid doing the same join twice if count(distinct) exists
> --
>
> Key: DRILL-4771
> URL: https://issues.apache.org/jira/browse/DRILL-4771
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> When the query has one distinct aggregate and one or more non-distinct 
> aggregates, the planner need not produce the join-based plan; it can 
> generate multi-phase aggregates instead. Another approach would be to use 
> grouping sets. However, Drill does not support grouping sets and instead 
> relies on the join-based plan (see the plan below)
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
>   LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
> LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}
> The more efficient form should look like this
> {code}
> select emp.empno, count(*), avg(distinct dept.deptno) 
> from sales.emp emp inner join sales.dept dept 
> on emp.deptno = dept.deptno 
> group by emp.empno
> LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
>   LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
> LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
>   LogicalJoin(condition=[=($7, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {code}





[jira] [Resolved] (DRILL-4796) AssertionError - Nested sum(avg(c1)) over window

2016-08-09 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai resolved DRILL-4796.
---
Resolution: Fixed

Closed with commit: 0bac42dec63a46ca787f6c5fe5a51b9a97e0d6cc

> AssertionError - Nested sum(avg(c1)) over window
> 
>
> Key: DRILL-4796
> URL: https://issues.apache.org/jira/browse/DRILL-4796
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>
> Nested window function query fails on MapR Drill 1.8.0 commit ID 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select sum(avg(c1)) over (partition by c2) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: AssertionError: todo: implement syntax 
> FUNCTION_STAR(COUNT($1))
> [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> stack trace from drillbit.log
> {noformat}
> 2016-07-21 11:25:40,023 [286f4ecc-59bd-113e-1edf-d93411b255aa:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 286f4ecc-59bd-113e-1edf-d93411b255aa: select sum(avg(c1)) over (partition by 
> c2) from `tblWnulls.parquet`
> ...
> 2016-07-21 11:25:40,183 [286f4ecc-59bd-113e-1edf-d93411b255aa:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: todo: 
> implement syntax FUNCTION_STAR(COUNT($1))
> [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: todo: implement syntax FUNCTION_STAR(COUNT($1))
> [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: 
> Unexpected exception during fragment initialization: todo: implement syntax 
> FUNCTION_STAR(COUNT($1))
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: todo: implement syntax 
> FUNCTION_STAR(COUNT($1))
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:198)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.doFunction(DrillOptiq.java:205)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:105)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:77) 
> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:59)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.physical.SortPrel.getPhysicalOperator(SortPrel.java:81)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.physical.SelectionVectorRemoverPrel.getPhysicalOperator(SelectionVectorRemoverPrel.java:48)
>  

[jira] [Resolved] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException

2016-08-09 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai resolved DRILL-4795.
---
Resolution: Fixed

Closed with commit: 0bac42dec63a46ca787f6c5fe5a51b9a97e0d6cc

> Nested aggregate windowed query fails - IllegalStateException 
> --
>
> Key: DRILL-4795
> URL: https://issues.apache.org/jira/browse/DRILL-4795
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Attachments: tblWnulls.parquet
>
>
> The below two window function queries fail on MapR Drill 1.8.0, commit ID 
> 34ca63ba:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> From drillbit.log
> {noformat}
> 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by 
> c2) from `tblWnulls.parquet`
> ...
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: java.lang.IllegalStateException: This generator does not support 
> mappings beyond
> at 
> org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) 
> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47)
>  ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException

2016-08-08 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412798#comment-15412798
 ] 

Gautam Kumar Parai commented on DRILL-4795:
---

I have created a pull request (https://github.com/apache/drill/pull/563). 
[~jni] [~amansinha100] Can you please review it? Thanks!

> Nested aggregate windowed query fails - IllegalStateException 
> --
>
> Key: DRILL-4795
> URL: https://issues.apache.org/jira/browse/DRILL-4795
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Attachments: tblWnulls.parquet
>
>
> The below two window function queries fail on MapR Drill 1.8.0, commit ID 
> 34ca63ba:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> From drillbit.log
> {noformat}
> 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by 
> c2) from `tblWnulls.parquet`
> ...
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: java.lang.IllegalStateException: This generator does not support 
> mappings beyond
> at 
> org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) 
> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47)
>  ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Commented] (DRILL-4469) SUM window query returns incorrect results over integer data

2016-08-08 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412145#comment-15412145
 ] 

Gautam Kumar Parai commented on DRILL-4469:
---

[~zfong] No, I do not think so. Those address validity checks, i.e. failing 
invalid queries during compilation. This issue, however, looks like wrong 
results for a valid query.

> SUM window query returns incorrect results over integer data
> 
>
> Key: DRILL-4469
> URL: https://issues.apache.org/jira/browse/DRILL-4469
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Priority: Critical
>  Labels: window_function
> Attachments: t_alltype.csv, t_alltype.parquet
>
>
> The SUM window query returns incorrect results compared to Postgres, with or 
> without the frame clause in the window definition. Note that a subquery is 
> involved and the data in column c1 is sorted integer data with no nulls.
> Drill 1.6.0 commit ID: 6d5f4983
> Results from Drill 1.6.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from 
> dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE 
> BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
> +---------+
> | EXPR$0  |
> +---------+
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> ...
> | 10585   |
> | 10585   |
> | 10585   |
> +---------+
> 145 rows selected (0.257 seconds)
> {noformat}
> results from Postgres 9.3
> {noformat}
> postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW 
> w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND 
> UNBOUNDED FOLLOWING);
>  sum
> --
>  4499
>  4499
>  4499
>  4499
>  4499
>  4499
> ...
>  5613
>  5613
>  5613
>   473
>   473
>   473
>   473
>   473
> (145 rows)
> {noformat}
> Removing the frame clause from the window definition still produces completely 
> different results on Postgres vs. Drill.
> Results from Drill 1.6.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from 
> t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
> +---------+
> | EXPR$0  |
> +---------+
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> ...
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> | 10585   |
> +---------+
> 145 rows selected (0.28 seconds)
> {noformat}
> Results from Postgres
> {noformat}
> postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW 
> w AS (PARTITION BY c8 ORDER BY c1);
>  sum
> --
> 5
>12
>21
>33
>47
>62
>78
>96
>   115
>   135
>   158
>   182
>   207
>   233
>   260
>   289
> ...
> 4914
>  5051
>  5189
>  5328
>  5470
>  5613
> 8
>70
>   198
>   332
>   473
> (145 rows)
> {noformat}
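For reference, the expected (Postgres) output for the no-frame-clause variant follows the default window frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW with peers included. A minimal Python sketch of those assumed semantics (hypothetical rows, column names mirroring the query above); Drill's wrong output instead repeats the whole-table total for every row:

```python
from itertools import groupby

def windowed_sum(rows, part_key, order_key):
    # SUM(order_key) OVER (PARTITION BY part_key ORDER BY order_key):
    # for each row, sum all rows in its partition whose order value is
    # <= the current row's value (RANGE frame includes peers).
    out = []
    rows = sorted(rows, key=lambda r: (r[part_key], r[order_key]))
    for _, grp in groupby(rows, key=lambda r: r[part_key]):
        grp = list(grp)
        for r in grp:
            out.append(sum(g[order_key] for g in grp
                           if g[order_key] <= r[order_key]))
    return out

rows = [{"c8": 1, "c1": 5}, {"c8": 1, "c1": 7}, {"c8": 2, "c1": 3}]
print(windowed_sum(rows, "c8", "c1"))  # [5, 12, 3]
```

With the explicit UNBOUNDED PRECEDING/UNBOUNDED FOLLOWING frame, every row in a partition would instead get the same per-partition total, which matches Postgres's constant-within-partition output above.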



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4806) need a better error message

2016-07-28 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398262#comment-15398262
 ] 

Gautam Kumar Parai commented on DRILL-4806:
---

This issue seems unrelated to nested aggregates. We will continue work on this 
in the future. 

> need a better error message 
> 
>
> Key: DRILL-4806
> URL: https://issues.apache.org/jira/browse/DRILL-4806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Minor
>  Labels: window_function
>
> A better error message is needed; column c2 is of type CHAR.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 
> ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Updated] (DRILL-4806) need a better error message

2016-07-28 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4806:
--
Assignee: (was: Gautam Kumar Parai)

> need a better error message 
> 
>
> Key: DRILL-4806
> URL: https://issues.apache.org/jira/browse/DRILL-4806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Priority: Minor
>  Labels: window_function
>
> A better error message is needed; column c2 is of type CHAR.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 
> ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Comment Edited] (DRILL-4806) need a better error message

2016-07-27 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396815#comment-15396815
 ] 

Gautam Kumar Parai edited comment on DRILL-4806 at 7/28/16 2:47 AM:


[~khfaraaz] This can be reduced to a simple test case that seems unrelated to 
DRILL-2330. Can you please confirm?

{code}select avg(first_name) from cp.`employee.json`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] 
(state=,code=0)
{code}

[~amansinha100] Can you please suggest if an error should be thrown by Calcite 
during planning instead of Drill?



was (Author: gparai):
[~khfaraaz] This can be reduced to a simple testcase which seems unrelated to 
Drill-2330. Can you please confirm?

{code}select avg(first_name) from cp.`employee.json`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] 
(state=,code=0)
{code}

> need a better error message 
> 
>
> Key: DRILL-4806
> URL: https://issues.apache.org/jira/browse/DRILL-4806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Minor
>  Labels: window_function
>
> A better error message is needed; column c2 is of type CHAR.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 
> ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Commented] (DRILL-4806) need a better error message

2016-07-27 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396815#comment-15396815
 ] 

Gautam Kumar Parai commented on DRILL-4806:
---

[~khfaraaz] This can be reduced to a simple test case that seems unrelated to 
DRILL-2330. Can you please confirm?

{code}select avg(first_name) from cp.`employee.json`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] 
(state=,code=0)
{code}

> need a better error message 
> 
>
> Key: DRILL-4806
> URL: https://issues.apache.org/jira/browse/DRILL-4806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Minor
>  Labels: window_function
>
> A better error message is needed; column c2 is of type CHAR.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 
> ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Comment Edited] (DRILL-4797) Partition by aggregate function in a window query results in IllegalStateException

2016-07-27 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396800#comment-15396800
 ] 

Gautam Kumar Parai edited comment on DRILL-4797 at 7/28/16 2:27 AM:


[~khfaraaz] Is this a DRILL-2330-specific issue? I see the same for window 
aggregates:
{code}select avg(l_quantity) over (partition by min(l_quantity)) from 
cp.`tpch/lineitem.parquet`; 
Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
mappings beyond{code}
Can you please confirm?


was (Author: gparai):
[~khfaraaz] Is this Drill-2330 specific issue? I see the same for a window 
aggregates 
{code}select avg(l_quantity) over (partition by min(l_quantity)) from 
cp.`tpch/lineitem.parquet`; 
Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
mappings beyond{code}
Can you please confirm?

> Partition by aggregate function in a window query results in 
> IllegalStateException
> --
>
> Key: DRILL-4797
> URL: https://issues.apache.org/jira/browse/DRILL-4797
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>
> Use of an aggregate function in the partitioning column of a windowed query 
> results in an IllegalStateException:
> {noformat}
> 0: jdbc:drill:zk=local> select avg(sum(l_quantity)) over (partition by 
> min(l_quantity)) from cp.`tpch/lineitem.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> {noformat}
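For context, the expected two-phase semantics of such a nested-aggregate window query (regular aggregates first, then the window function over the aggregated result) can be sketched as follows. This is an illustrative model of standard SQL semantics, not Drill's implementation:

```python
def nested_agg_window(quantities):
    # Models: SELECT avg(sum(l_quantity)) OVER (PARTITION BY min(l_quantity))
    # Phase 1: with no GROUP BY, the aggregates collapse the whole input
    # to a single row holding (sum, min).
    total, lo = sum(quantities), min(quantities)
    grouped = [(total, lo)]
    # Phase 2: avg(total) OVER (PARTITION BY lo). Each partition holds
    # exactly one grouped row here, so the average equals the total.
    return [float(t) for t, _ in grouped]

print(nested_agg_window([1, 2, 5]))  # [8.0]
```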





[jira] [Commented] (DRILL-4797) Partition by aggregate function in a window query results in IllegalStateException

2016-07-27 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396800#comment-15396800
 ] 

Gautam Kumar Parai commented on DRILL-4797:
---

[~khfaraaz] Is this a DRILL-2330-specific issue? I see the same for window 
aggregates:
{code}select avg(l_quantity) over (partition by min(l_quantity)) from 
cp.`tpch/lineitem.parquet`; 
Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
mappings beyond{code}
Can you please confirm?

> Partition by aggregate function in a window query results in 
> IllegalStateException
> --
>
> Key: DRILL-4797
> URL: https://issues.apache.org/jira/browse/DRILL-4797
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>
> Use of an aggregate function in the partitioning column of a windowed query 
> results in an IllegalStateException:
> {noformat}
> 0: jdbc:drill:zk=local> select avg(sum(l_quantity)) over (partition by 
> min(l_quantity)) from cp.`tpch/lineitem.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> {noformat}





[jira] [Updated] (DRILL-4808) CTE query with window function results in AssertionError

2016-07-27 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4808:
--
Assignee: (was: Gautam Kumar Parai)

> CTE query with window function results in AssertionError
> 
>
> Key: DRILL-4808
> URL: https://issues.apache.org/jira/browse/DRILL-4808
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>  Labels: window_function
>
> The below query, which uses a CTE and window functions, results in an 
> AssertionError. The same query over the same data works on Postgres.
> MapR Drill 1.8.0 commit ID: 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> WITH v1 ( a, b, c, d ) AS
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT col0, col8, MAX(MIN(col8)) over 
> (partition by col7 order by col8) as max_col8, col7 from 
> `allTypsUniq.parquet` GROUP BY col0,col7,col8
> . . . . . . . . . . . . . . > )
> . . . . . . . . . . . . . . > select * from ( select a, b, c, d from v1 where 
> c > 'IN' GROUP BY a,b,c,d order by a,b,c,d);
> Error: SYSTEM ERROR: AssertionError: Internal error: Type 'RecordType(ANY 
> col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a'
> [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log for above failing query
> {noformat}
> 2016-07-26 16:57:04,627 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 2868699e-ae56-66f4-9439-8db2132ef265: WITH v1 ( a, b, c, d ) AS
> (
> SELECT col0, col8, MAX(MIN(col8)) over (partition by col7 order by col8) 
> as max_col8, col7 from `allTypsUniq.parquet` GROUP BY col0,col7,col8
> )
> select * from ( select a, b, c, d from v1 where c > 'IN' GROUP BY a,b,c,d 
> order by a,b,c,d)
> 2016-07-26 16:57:04,666 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: Internal 
> error: Type 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' has no 
> field 'a'
> [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: Internal error: Type 'RecordType(ANY col0, ANY col8, ANY 
> max_col8, ANY col7)' has no field 'a'
> [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
> [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: Type 
> 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a'
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: Type 'RecordType(ANY 
> col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a'
> at org.apache.calcite.util.Util.newInternal(Util.java:777) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:167) 
> ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3225)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.access$1500(SqlToRelConverter.java:185)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4181)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3603)
>  ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
> at 
> org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) 
> 

[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-27 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code}planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators. The FILTER 
operator only operates on the input of its immediate upstream operator (e.g. 
SCAN, AGG). If two different filters are present in the same plan, they might 
have different selectivities based on their immediate upstream operators 
ROWCOUNT.

  was:
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code}planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient for a correct estimate, this fix will 
> serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds of filter selectivity. The user can set the following options. The 
> selectivity can be varied between 0 and 1, with min selectivity always less 
> than or equal to max selectivity.
> {code}planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ', the estimated 
> ROWCOUNT should be capped based on these options. The estimated ROWCOUNT of 
> downstream operators is not directly controlled by these options; however, 
> it may change because of dependencies between operators. The FILTER operator 
> only operates on the output of its immediate upstream operator (e.g. SCAN, 
> AGG). If two different filters are present in the same plan, they might have 
> different selectivities based on their immediate upstream operators' 
> ROWCOUNTs.
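The clamping behavior these two options describe can be sketched as follows. This is an assumed model of how the bounds constrain the planner's raw estimate, not Drill's actual planner code:

```python
def clamp_selectivity(estimate, min_factor=0.0, max_factor=1.0):
    # Models planner.filter.min_selectivity_estimate_factor and
    # planner.filter.max_selectivity_estimate_factor: the raw filter
    # selectivity estimate is clamped into [min_factor, max_factor].
    if not 0.0 <= min_factor <= max_factor <= 1.0:
        raise ValueError("need 0 <= min_factor <= max_factor <= 1")
    return min(max(estimate, min_factor), max_factor)

# A deeply nested AND/OR predicate whose raw estimate is far too low
# is raised to the configured floor, restoring downstream parallelism.
print(clamp_selectivity(0.001, min_factor=0.25, max_factor=0.85))  # 0.25
```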





[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-26 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394980#comment-15394980
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

[~amansinha100] I have updated the pull request 
(https://github.com/apache/drill/pull/552) to account for the latest Calcite 
changes. Can you please take a look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If an in-list has more than 20 items, Drill can do an optimization to convert 
> the in-list into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of a query which has a large in-list.
> Could we make "20" configurable? Then we would not need to add duplicate/junk 
> in-list items just to exceed 20.
> A sample query is:
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
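
[Editor's note] The optimization being made configurable can be sketched as follows; this is a hedged, self-contained Python illustration of the idea, not Drill's implementation — the threshold of 20 is the hard-coded value the issue proposes to make configurable:

```python
# Sketch of the in-list optimization: past a size threshold, evaluate the
# IN-list as a hash lookup (conceptually a join against a small in-memory
# table) instead of a chain of equality comparisons. Illustration only.

IN_LIST_THRESHOLD = 20  # the hard-coded value this issue wants configurable

def filter_in_list(rows, column, in_list, threshold=IN_LIST_THRESHOLD):
    if len(in_list) > threshold:
        lookup = set(in_list)                    # build the hash table once
        pred = lambda row: row[column] in lookup # O(1) membership per row
    else:
        # small list: plain equality chain, O(len(in_list)) per row
        pred = lambda row: any(row[column] == v for v in in_list)
    return [row for row in rows if pred(row)]

rows = [{"col": i} for i in range(100)]
matched = filter_in_list(rows, "col", list(range(30)))
print(len(matched))  # 30
```

Making the threshold a parameter (here `threshold=`) is exactly the configurability being requested.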





[jira] [Assigned] (DRILL-4806) need a better error message

2016-07-26 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-4806:
-

Assignee: Gautam Kumar Parai

> need a better error message 
> 
>
> Key: DRILL-4806
> URL: https://issues.apache.org/jira/browse/DRILL-4806
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Minor
>  Labels: window_function
>
> Need a better error message, column c2 is of type CHAR.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 
> ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1, with min selectivity always less than 
or equal to max selectivity.
{code}planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. The estimated ROWCOUNT of downstream 
operators is not directly controlled by these options; however, it may change as 
a result of dependencies between operators.

  was:
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1, with min selectivity always less than 
or equal to max selectivity.
{code} planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. The estimated ROWCOUNT of downstream 
operators is not directly controlled by these options; however, it may change as 
a result of dependencies between operators.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1, with min selectivity always less 
> than or equal to max selectivity.
> {code}planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. The estimated ROWCOUNT of 
> downstream operators is not directly controlled by these options; however, it 
> may change as a result of dependencies between operators.





[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1, with min selectivity always less than 
or equal to max selectivity.
{code} planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. The estimated ROWCOUNT of downstream 
operators is not directly controlled by these options; however, it may change as 
a result of dependencies between operators.

  was:
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1, with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. The estimated ROWCOUNT of downstream 
operators is not directly controlled by these options; however, it may change as 
a result of dependencies between operators.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1, with min selectivity always less 
> than or equal to max selectivity.
> {code} planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. The estimated ROWCOUNT of 
> downstream operators is not directly controlled by these options; however, it 
> may change as a result of dependencies between operators.





[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1, with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. The estimated ROWCOUNT of downstream 
operators is not directly controlled by these options; however, it may change as 
a result of dependencies between operators.

  was:
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the options
{code} 
planner.filter.min_selectivity_estimate_factor 
{code} 


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1, with min selectivity always less 
> than or equal to max selectivity.
> {code} 
> planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. The estimated ROWCOUNT of 
> downstream operators is not directly controlled by these options; however, it 
> may change as a result of dependencies between operators.





[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the options
{code} 
planner.filter.min_selectivity_estimate_factor 
{code} 

  was:
The underlying problem is a filter selectivity under-estimate for a query with 
complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
under-parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics, or when 
existing statistics are insufficient to get a correct estimate of selectivity, 
this will serve as a workaround.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the options
> {code} 
> planner.filter.min_selectivity_estimate_factor 
> {code} 





[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390107#comment-15390107
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

I have updated the pull request (https://github.com/apache/drill/pull/552) 
based on your comments [~sudheeshkatkam][~amansinha100]. Can you please take a 
look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If an in-list has more than 20 items, Drill can do an optimization to convert 
> the in-list into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of a query which has a large in-list.
> Could we make "20" configurable? Then we would not need to add duplicate/junk 
> in-list items just to exceed 20.
> A sample query is:
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);





[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389157#comment-15389157
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

I have updated the pull request (https://github.com/apache/drill/pull/534). 
[~amansinha100] [~sudheeshkatkam] can you please take a look? Thanks!

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.





[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-22 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389127#comment-15389127
 ] 

Gautam Kumar Parai commented on DRILL-3710:
---

I have created the pull request (https://github.com/apache/drill/pull/552). 
[~jni] [~amansinha100] can you please take a look? Thanks!

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If an in-list has more than 20 items, Drill can do an optimization to convert 
> the in-list into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of a query which has a large in-list.
> Could we make "20" configurable? Then we would not need to add duplicate/junk 
> in-list items just to exceed 20.
> A sample query is:
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);





[jira] [Commented] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException

2016-07-21 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388181#comment-15388181
 ] 

Gautam Kumar Parai commented on DRILL-4795:
---

Yes, we should group by the partitioning column.

> Nested aggregate windowed query fails - IllegalStateException 
> --
>
> Key: DRILL-4795
> URL: https://issues.apache.org/jira/browse/DRILL-4795
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Attachments: tblWnulls.parquet
>
>
> The below two window function queries fail on MapR Drill 1.8.0 commit ID 
> 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> From drillbit.log
> {noformat}
> 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by 
> c2) from `tblWnulls.parquet`
> ...
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: java.lang.IllegalStateException: This generator does not support 
> mappings beyond
> at 
> org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) 
> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47)
>  ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> 

[jira] [Assigned] (DRILL-3710) Make the 20 in-list optimization configurable

2016-07-19 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai reassigned DRILL-3710:
-

Assignee: Gautam Kumar Parai

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
>
> If an in-list has more than 20 items, Drill can do an optimization to convert 
> the in-list into a small hash table in memory, and then do a table join 
> instead.
> This can improve the performance of a query which has a large in-list.
> Could we make "20" configurable? Then we would not need to add duplicate/junk 
> in-list items just to exceed 20.
> A sample query is:
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);





[jira] [Closed] (DRILL-4789) In-list to join optimization should have configurable in-list size

2016-07-19 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai closed DRILL-4789.
-
Resolution: Duplicate

> In-list to join optimization should have configurable in-list size 
> ---
>
> Key: DRILL-4789
> URL: https://issues.apache.org/jira/browse/DRILL-4789
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>
> We currently have a default in-list size of 20. Instead of the magic number 
> 20, we should make this configurable.
> {code}
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
> {code}





[jira] [Created] (DRILL-4789) In-list to join optimization should have configurable in-list size

2016-07-19 Thread Gautam Kumar Parai (JIRA)
Gautam Kumar Parai created DRILL-4789:
-

 Summary: In-list to join optimization should have configurable 
in-list size 
 Key: DRILL-4789
 URL: https://issues.apache.org/jira/browse/DRILL-4789
 Project: Apache Drill
  Issue Type: Bug
Reporter: Gautam Kumar Parai
Assignee: Gautam Kumar Parai


We currently have a default in-list size of 20. Instead of the magic number 20, 
we should make this configurable.

{code}
select count(*) from table where col in 
(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
{code}





[jira] [Commented] (DRILL-1328) Support table statistics

2016-07-18 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383080#comment-15383080
 ] 

Gautam Kumar Parai commented on DRILL-1328:
---

I have uploaded a new design specification based on the original design. It 
aims to address some of the concerns with that design. Please review and 
provide feedback. Thanks!

> Support table statistics
> 
>
> Key: DRILL-1328
> URL: https://issues.apache.org/jira/browse/DRILL-1328
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Cliff Buchanan
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch
>
>
> This consists of several subtasks
> * implement operators to generate statistics
> * add "analyze table" support to parser/planner
> * create a metadata provider to allow statistics to be used by optiq in 
> planning optimization
> * implement statistics functions
> Right now, the bulk of this functionality is implemented, but it hasn't been 
> rigorously tested and needs to have some definite answers for some of the 
> parts "around the edges" (how analyze table figures out where the table 
> statistics are located, how a table "append" should work in a read-only file 
> system).
> Also, here are a few known caveats:
> * table statistics are collected by creating a sql query based on the string 
> path of the table. This should probably be done with a Table reference.
> * Case sensitivity for column statistics is probably iffy
> * Math for combining two column NDVs into a joint NDV should be checked.
> * Schema changes aren't really being considered yet.
> * adding getDrillTable is probably unnecessary; it might be better to do 
> getTable().unwrap(DrillTable.class)
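
[Editor's note] One of the caveats above concerns the math for combining two column NDVs into a joint NDV. A common textbook heuristic, shown below as an assumption for illustration and not necessarily what this patch implements, caps the product of the individual NDVs by the table's row count:

```python
# Hedged sketch of a standard joint-NDV heuristic: assuming column
# independence, the number of distinct (a, b) pairs is at most ndv_a * ndv_b,
# and it can never exceed the table's row count. Not Drill's actual math.

def joint_ndv(ndv_a, ndv_b, row_count):
    return min(ndv_a * ndv_b, row_count)

print(joint_ndv(10, 50, 1000))   # 500: product is below the row count
print(joint_ndv(100, 100, 1000)) # 1000: capped by the row count
```

The independence assumption is exactly why this math "should be checked": correlated columns make the product a large over-estimate.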





[jira] [Commented] (DRILL-2330) Add support for nested aggregate expressions for window aggregates

2016-07-14 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378326#comment-15378326
 ] 

Gautam Kumar Parai commented on DRILL-2330:
---

I have updated the pull request (https://github.com/apache/drill/pull/529). 
[~amansinha100] can you please take a look?

> Add support for nested aggregate expressions for window aggregates
> --
>
> Key: DRILL-2330
> URL: https://issues.apache.org/jira/browse/DRILL-2330
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: drillbit.log
>
>
> Aggregate expressions currently cannot be nested. 
> *The following query fails to validate:*
> {code:sql}
> select avg(sum(i_item_sk)) from item;
> {code}
> Error:
> Query failed: SqlValidatorException: Aggregate expressions cannot be nested
> Log attached. 
> Reference: TPCDS queries (20, 63, 98, ...) fail to execute.





[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-12 Thread Gautam Kumar Parai (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373539#comment-15373539
 ] 

Gautam Kumar Parai commented on DRILL-4743:
---

[~amansinha100] I have updated the pull request 
(https://github.com/apache/drill/pull/534). Please take a look.

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
>
> The underlying problem is a filter selectivity under-estimate for a query with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient to get a correct estimate of selectivity, 
> this will serve as a workaround.




