[jira] [Created] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects
Gautam Kumar Parai created DRILL-6589: - Summary: Push transitive closure generated predicates past aggregates/projects Key: DRILL-6589 URL: https://issues.apache.org/jira/browse/DRILL-6589 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Fix For: 1.14.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects
[ https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-6589: -- Description: Here is a sample query that may benefit from this optimization: SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); Here the transitive predicate a2 = 5 would be pushed past the aggregate due to this optimization. > Push transitive closure generated predicates past aggregates/projects > - > > Key: DRILL-6589 > URL: https://issues.apache.org/jira/browse/DRILL-6589 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Fix For: 1.14.0 > > > Here is a sample query that may benefit from this optimization: > SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); > Here the transitive predicate a2 = 5 would be pushed past the aggregate due > to this optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
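The inference behind this optimization can be sketched in a few lines (a hypothetical Python illustration, not Drill's planner code): the filter a1 = 5 plus the column equivalence a1 = a2 implied by the IN subquery yields the new predicate a2 = 5, which can then be pushed below the aggregate on T2.

```python
# Hypothetical sketch of transitive-closure predicate inference; the names
# and data structures are illustrative, not taken from Drill's planner.

def transitive_predicates(equalities, constants):
    """equalities: (left, right) column pairs known equal (e.g. a1 = a2 from
    the IN subquery); constants: column -> literal from existing filters.
    Returns the newly inferred column -> literal predicates."""
    inferred = {}
    for left, right in equalities:
        if left in constants and right not in constants:
            inferred[right] = constants[left]
        elif right in constants and left not in constants:
            inferred[left] = constants[right]
    return inferred

# a1 = 5 and a1 = a2 together imply a2 = 5, pushable past the aggregate on T2.
new_filters = transitive_predicates([("a1", "a2")], {"a1": 5})
```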
[jira] [Updated] (DRILL-6589) Push transitive closure generated predicates past aggregates
[ https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-6589: -- Summary: Push transitive closure generated predicates past aggregates (was: Push transitive closure generated predicates past aggregates/projects) > Push transitive closure generated predicates past aggregates > > > Key: DRILL-6589 > URL: https://issues.apache.org/jira/browse/DRILL-6589 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.15.0 > > > Here is a sample query that may benefit from this optimization: > SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); > Here the transitive predicate a2 = 5 would be pushed past the aggregate due > to this optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6552) Drill Metadata management "Drill MetaStore"
[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621266#comment-16621266 ] Gautam Kumar Parai commented on DRILL-6552: --- I would like to mention that two-phase aggregation along with custom operators for computing statistics (instead of e.g. count(*)) was done as part of DRILL-1328, similar to the approach suggested by [~okalinin]. However, the perf numbers were nowhere near earth-shattering :( The future improvements identified were either a multi-phase aggregation approach or sampling to speed things up further. Another option would be to revisit the code to see if we can speed up the existing implementation further. [~paul-rogers] had reviewed the code at the time - he is certainly a ton more versed in execution efficiency than I am. Any suggestions, Paul and others? [~vitalii] in addition to the metadata-at-scale problem we should also consider functional completeness. For performance benchmarks like TPC-H/TPC-DS, we had identified histograms as critical for improving planning. When you and [~vvysotskyi] last presented the proposal, it seemed like another limitation of HMS would be the inability to store histograms. Do you have a proposal or workaround for handling histograms - or is it not feasible at all? > Drill Metadata management "Drill MetaStore" > --- > > Key: DRILL-6552 > URL: https://issues.apache.org/jira/browse/DRILL-6552 > Project: Apache Drill > Issue Type: New Feature > Components: Metadata >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 2.0.0 > > > It would be useful for Drill to have some sort of metastore which would > enable Drill to remember previously defined schemata so Drill doesn’t have to > do the same work over and over again. > It allows storing schema and statistics, which will help accelerate query > validation, planning and execution. 
Also it increases the stability > of Drill and helps avoid different kinds of issues: "schema change > Exceptions", "limit 0" optimization and so on. > One of the main candidates is Hive Metastore. > Starting from version 3.0, Hive Metastore can run as a separate service from > Hive server: > [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration] > An optional enhancement is storing Drill's profiles, UDFs, and plugin configs in > some kind of metastore as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-5794) Projection pushdown does not preserve collation
Gautam Kumar Parai created DRILL-5794: - Summary: Projection pushdown does not preserve collation Key: DRILL-5794 URL: https://issues.apache.org/jira/browse/DRILL-5794 Project: Apache Drill Issue Type: Bug Affects Versions: 1.11.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai While looking at the projection-pushdown-into-scan rule in Drill, it seems we do not consider changes to collation. This issue would arise in general, not just for projection pushdown across other rels. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (DRILL-5853) Sort removal based on NULL direction
Gautam Kumar Parai created DRILL-5853: - Summary: Sort removal based on NULL direction Key: DRILL-5853 URL: https://issues.apache.org/jira/browse/DRILL-5853 Project: Apache Drill Issue Type: Bug Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Calcite bug fixes 969 and 970 should be pulled into Drill to correctly apply NULL direction during sort removal. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
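Why the NULL direction matters for sort removal can be seen with a small illustration (Python, purely for exposition; not Drill or Calcite code): the same column sorted ascending produces different row orders under NULLS FIRST and NULLS LAST, so a Sort may only be dropped when the input's collation matches the required one in both direction and null ordering.

```python
# Illustration only: ascending order under the two possible null directions.
# A planner may remove a Sort only if the input ordering matches the required
# collation, including where NULLs are placed.

def ascending(values, nulls_first):
    # Key tuples sort non-null values normally and place None per direction.
    if nulls_first:
        return sorted(values, key=lambda v: (v is not None, 0 if v is None else v))
    return sorted(values, key=lambda v: (v is None, 0 if v is None else v))

# The two orderings of [3, None, 1] differ, so a collation with the opposite
# null direction is not satisfied even though the column is "sorted".
```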
[jira] [Created] (DRILL-4929) Drill unable to propagate selectivity/distinctrowcount through RelSubset
Gautam Kumar Parai created DRILL-4929: - Summary: Drill unable to propagate selectivity/distinctrowcount through RelSubset Key: DRILL-4929 URL: https://issues.apache.org/jira/browse/DRILL-4929 Project: Apache Drill Issue Type: Improvement Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Drill only has access to the best alternative plan. Calcite needs to expose the set.rel within RelSubset which can be utilized during Drill logical planning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4862) wrong results - use of convert_from(binary_string(key),'UTF8') in filter results in wrong results
[ https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4862: - Assignee: Chunhui Shi (was: Gautam Kumar Parai) +1. The changes look good. > wrong results - use of convert_from(binary_string(key),'UTF8') in filter > results in wrong results > - > > Key: DRILL-4862 > URL: https://issues.apache.org/jira/browse/DRILL-4862 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 > Environment: 4 node cluster CentOS >Reporter: Khurram Faraaz >Assignee: Chunhui Shi > > These results do not look right, i.e when the predicate has > convert_from(binary_string(key),'UTF8') > Apache drill 1.8.0-SNAPSHOT git commit ID: 57dc9f43 > {noformat} > [root@centos-0x drill4478]# cat f1.json > {"key":"\\x30\\x31\\x32\\x33"} > {"key":"\\x34\\x35\\x36\\x37"} > {"key":"\\x38\\x39\\x30\\x31"} > {"key":"\\x30\\x30\\x30\\x30"} > {"key":"\\x31\\x31\\x31\\x31"} > {"key":"\\x35\\x35\\x35\\x35"} > {"key":"\\x38\\x38\\x38\\x38"} > {"key":"\\x39\\x39\\x39\\x39"} > {"key":"\\x41\\x42\\x43\\x44"} > {"key":"\\x45\\x46\\x47\\x48"} > {"key":"\\x49\\x41\\x44\\x46"} > {"key":"\\x4a\\x4b\\x4c\\x4d"} > {"key":"\\x57\\x58\\x59\\x5a"} > {"key":"\\x4e\\x4f\\x50\\x51"} > {"key":"\\x46\\x46\\x46\\x46"} > {noformat} > results without the predicate - these are correct results > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') > from `f1.json`; > +-+ > | EXPR$0 | > +-+ > | 0123| > | 4567| > | 8901| > | | > | | > | | > | | > | | > | ABCD| > | EFGH| > | IADF| > | JKLM| > | WXYZ| > | NOPQ| > | | > +-+ > 15 rows selected (0.256 seconds) > {noformat} > results with a predicate - these results don't look correct > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') > from `f1.json` where convert_from(binary_string(key),'UTF8') is not null; > +--+ > | EXPR$0 | > +--+ > | 0123123 | > | 4567567 | > | 8901901 
| > | 000 | > | 111 | > | 555 | > | 888 | > | 999 | > | ABCDBCD | > | EFGHFGH | > | IADFADF | > | JKLMKLM | > | WXYZXYZ | > | NOPQOPQ | > | FFF | > +--+ > 15 rows selected (0.279 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563926#comment-15563926 ] Gautam Kumar Parai commented on DRILL-1328: --- I have created a new PR to address the review comments. [~amansinha100] can you please review the PR? Thanks! > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
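One of the caveats above is the math for combining two column NDVs into a joint NDV. A common textbook estimator (shown here only as a reference point, not necessarily what the DRILL-1328 patch implements) multiplies the per-column NDVs under an independence assumption and caps the product at the table's row count:

```python
def joint_ndv(ndv_a, ndv_b, row_count):
    # Under independence, distinct (a, b) pairs ~= ndv_a * ndv_b, but there
    # can never be more distinct pairs than rows in the table.
    return min(ndv_a * ndv_b, row_count)
```

The cap matters: for a 150-row table with per-column NDVs of 10 and 20, the naive product (200) exceeds the row count, so the estimate is clamped to 150.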
[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions
[ https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4864: -- Assignee: Serhii Harnyk (was: Gautam Kumar Parai) > Add ANSI format for date/time functions > --- > > Key: DRILL-4864 > URL: https://issues.apache.org/jira/browse/DRILL-4864 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Serhii Harnyk >Assignee: Serhii Harnyk > Labels: doc-impacting > Fix For: 1.9.0 > > > TO_DATE() exposes the Joda string formatting conventions into the SQL > layer. This does not follow the SQL conventions used by ANSI and many other > database engines on the market. > Add a new UDF, "ansi_to_joda(string)", that takes a string representing an ANSI > datetime format and returns the equivalent Joda format string. > Add a new session option, "drill.exec.fn.to_date_format", that can be one of two > values - "JODA" (default) and "ANSI". > If the option is set to "JODA", queries with the to_date() function work in the > usual way. > If the option is set to "ANSI", the second argument is wrapped with the > ansi_to_joda() function, which allows the user to use the ANSI datetime format. > Wrapping is used in the to_date(), to_time() and to_timestamp() functions. 
> Table of joda and ansi patterns which may be replaced
> ||Pattern name||Ansi format||JodaTime format||
> |Full name of day|day| |
> |Day of year|ddd|D|
> |Day of month|dd|d|
> |Day of week|d|e|
> |Name of month|month| |
> |Abr name of month|mon|MMM|
> |Full era name|ee|G|
> |Name of day|dy|E|
> |Time zone|tz|TZ|
> |Hour 12|hh|h|
> |Hour 12|hh12|h|
> |Hour 24|hh24|H|
> |Minute of hour|mi|m|
> |Second of minute|ss|s|
> |Millisecond of minute|ms|S|
> |Week of year|ww|w|
> |Month|mm|MM|
> |Halfday am|am|aa|
> |Halfday pm|pm|aa|
> ref.: https://www.postgresql.org/docs/8.2/static/functions-formatting.html
> http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
>
> Table of ansi pattern modifiers, which may be deleted from string
> ||Description||Pattern||
> |fill mode (suppress padding blanks and zeroes)|fm|
> |fixed format global option (see usage notes)|fx|
> |translation mode (print localized day and month names based on lc_messages)|tm|
> |spell mode (not yet implemented)|sp|
> ref.: https://www.postgresql.org/docs/8.2/static/functions-formatting.html
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
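The pattern table above translates mechanically into a token-substitution function. The sketch below (Python, using a subset of the unambiguous tokens; the real ansi_to_joda() UDF is implemented in Java inside Drill) shows the one subtlety: tokens must be matched longest-first, so that hh24 is not consumed as hh followed by a literal 24.

```python
import re

# Subset of the ANSI -> Joda mapping from the table above (illustrative only;
# not the actual Drill implementation).
ANSI_TO_JODA = {
    "hh24": "H", "hh12": "h", "hh": "h", "mi": "m", "ss": "s",
    "ms": "S", "ww": "w", "mm": "MM", "ddd": "D", "dd": "d",
    "dy": "E", "mon": "MMM", "tz": "TZ", "ee": "G",
}

# Longest tokens first, so "hh24" wins over "hh" and "ddd" over "dd".
_TOKEN = re.compile("|".join(sorted(ANSI_TO_JODA, key=len, reverse=True)))

def ansi_to_joda(fmt):
    """Rewrite an ANSI datetime format string into its Joda equivalent."""
    return _TOKEN.sub(lambda m: ANSI_TO_JODA[m.group(0)], fmt)
```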
[jira] [Commented] (DRILL-4864) Add ANSI format for date/time functions
[ https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622571#comment-15622571 ] Gautam Kumar Parai commented on DRILL-4864: --- Sorry, I forgot to reassign it back to the developer. I approved the pull request earlier. [~sharnyk] Please commit unless [~adeneche] has further comments? > Add ANSI format for date/time functions > --- > > Key: DRILL-4864 > URL: https://issues.apache.org/jira/browse/DRILL-4864 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Serhii Harnyk >Assignee: Serhii Harnyk > Labels: doc-impacting > Fix For: 1.9.0 > > > The TO_DATE() is exposing the Joda string formatting conventions into the SQL > layer. This is not following SQL conventions used by ANSI and many other > database engines on the market. > Add new UDF "ansi_to_joda(string)", that takes string that represents ANSI > datetime format and returns string that represents equal Joda format. > Add new session option "drill.exec.fn.to_date_format" that can be one of two > values - "JODA"(default) and "ANSI". > If option is set to "JODA" queries with to_date() function would work in > usual way. > If option is set to "ANSI" second argument would be wrapped with > ansi_to_joda() function, that allows user to use ANSI datetime format > Wrapping is used in to_date(), to_time() and to_timestamp() functions. 
[jira] [Updated] (DRILL-4674) Allow casting to boolean the same literals as in Postgre
[ https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4674: -- Assignee: Arina Ielchiieva (was: Gautam Kumar Parai) > Allow casting to boolean the same literals as in Postgre > > > Key: DRILL-4674 > URL: https://issues.apache.org/jira/browse/DRILL-4674 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.7.0 > Environment: 4 node cluster CentOS >Reporter: Khurram Faraaz >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: 1.9.0 > > > Drill does not return results when we try to cast 0 and 1 to boolean inside a > value constructor. > Drill version : 1.7.0-SNAPSHOT commit ID : 09b26277 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> values(cast(1 as boolean)); > Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 1 > Fragment 0:0 > [Error Id: 35dcc4bb-0c5d-466f-8fb5-cf7f0a892155 on centos-02.qa.lab:31010] > (state=,code=0) > 0: jdbc:drill:schema=dfs.tmp> values(cast(0 as boolean)); > Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0 > Fragment 0:0 > [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] > (state=,code=0) > {noformat} > Where as we get results on Postgres for same query. 
> {noformat} > postgres=# values(cast(1 as boolean)); > column1 > - > t > (1 row) > postgres=# values(cast(0 as boolean)); > column1 > - > f > (1 row) > {noformat} > Stack trace from drillbit.log > {noformat} > 2016-05-13 07:16:16,578 [28ca80bf-0af9-bc05-258b-6b5744739ed8:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: > Invalid value for boolean: 0 > Fragment 0:0 > [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalArgumentException: Invalid value for boolean: 0 > Fragment 0:0 > [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > Caused by: java.lang.IllegalArgumentException: Invalid value for boolean: 0 > at > org.apache.drill.exec.test.generated.ProjectorGen9.doSetup(ProjectorTemplate.java:95) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen9.setup(ProjectorTemplate.java:93) > ~[na:na] > at > 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:444) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:257) > ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apac
[jira] [Created] (DRILL-4992) Failing query with the same case statement in both select and order by clauses when using hash aggregate
Gautam Kumar Parai created DRILL-4992: - Summary: Failing query with the same case statement in both select and order by clauses when using hash aggregate Key: DRILL-4992 URL: https://issues.apache.org/jira/browse/DRILL-4992 Project: Apache Drill Issue Type: Bug Affects Versions: 1.8.0, 1.7.0 Reporter: Gautam Kumar Parai Queries that contain the same case statement in both the select and order by clauses of a subquery fail. Here is an example of such a query, over the table dummy:
+-------------+------------------------+
| c_date      | c_timestamp            |
+-------------+------------------------+
| 2015-04-23  | 2014-03-16 03:55:21.0  |
+-------------+------------------------+
alter session set `planner.enable_streamagg` = false;
select distinct a1 from ( select SUM(case when c_timestamp is null then 0 else 1 end) from dummy group by c_date order by SUM(case when c_timestamp is null then 0 else 1 end)) as dt(a1);
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
Note that the query references only a single table. Below is the stacktrace from the log file: {code} 2016-10-31 15:57:45,643 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 27e83395-8074-7c94-e318-a6b54176ea9d:0:0: State to report: RUNNING 2016-10-31 15:57:45,665 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes [Error Id: ad36df25-3ea8-4c07-87a0-f105b1ce5ae1 ] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_67] at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_67] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1607.jar:na] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226) [drill-java-exec-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.9.0-SNAPSHOT.jar:1.9.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67] 2016-10-31 15:57:45,665 [27e83395-8074-7c94-e318-a6b54176ea9d:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 27e83395-8074-7c94-e318-a6b54176ea9d:0:0: State change requested RUNNING --> FAILED {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5043: -- Priority: Minor (was: Major) > Function that returns a unique id per session/connection similar to MySQL's > CONNECTION_ID() > --- > > Key: DRILL-5043 > URL: https://issues.apache.org/jira/browse/DRILL-5043 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Nagarajan Chinnasamy >Priority: Minor > Labels: CONNECTION_ID, SESSION, UDF > > Design and implement a function that returns a unique id per > session/connection similar to MySQL's CONNECTION_ID(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671468#comment-15671468 ] Gautam Kumar Parai commented on DRILL-5043: --- Thanks for creating the JIRA. Please see the following: [1] on how to contribute to Drill. [2] on how to create a custom UDF. [3] on how to pass some param (sessionID in your case) to the UDF. 1. https://drill.apache.org/docs/contribute-to-drill/ 2. https://drill.apache.org/docs/develop-custom-functions/ 3. QueryContext.java. See how session.getDefaultSchemaName() is passed using the QueryContextInformation. > Function that returns a unique id per session/connection similar to MySQL's > CONNECTION_ID() > --- > > Key: DRILL-5043 > URL: https://issues.apache.org/jira/browse/DRILL-5043 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Nagarajan Chinnasamy > Labels: CONNECTION_ID, SESSION, UDF > > Design and implement a function that returns a unique id per > session/connection similar to MySQL's CONNECTION_ID(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
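The suggestion in [3] - threading a per-session value through the query context into a function - can be sketched generically (Python, purely conceptual; all names here are illustrative, and Drill's actual mechanism is QueryContextInformation in Java):

```python
import itertools
import threading

# Conceptual sketch of a per-connection unique id, in the spirit of MySQL's
# CONNECTION_ID(). None of these names come from Drill's API.
_next_id = itertools.count(1)
_lock = threading.Lock()

class SessionContext:
    """Stands in for the per-query context handed to functions."""
    def __init__(self):
        with _lock:  # ids stay unique even with concurrent connections
            self.connection_id = next(_next_id)

def connection_id_udf(ctx):
    # A real UDF would read the value injected into its query context,
    # much like session.getDefaultSchemaName() flows through the context.
    return ctx.connection_id
```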
[jira] [Updated] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5043: -- Affects Version/s: 1.8.0 > Function that returns a unique id per session/connection similar to MySQL's > CONNECTION_ID() > --- > > Key: DRILL-5043 > URL: https://issues.apache.org/jira/browse/DRILL-5043 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Nagarajan Chinnasamy >Priority: Minor > Labels: CONNECTION_ID, SESSION, UDF > > Design and implement a function that returns a unique id per > session/connection similar to MySQL's CONNECTION_ID(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4792) Include session options used for a query as part of the profile
[ https://issues.apache.org/jira/browse/DRILL-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672432#comment-15672432 ] Gautam Kumar Parai commented on DRILL-4792: --- It does not seem to work with the option `store.format` > Include session options used for a query as part of the profile > --- > > Key: DRILL-4792 > URL: https://issues.apache.org/jira/browse/DRILL-4792 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.7.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.9.0 > > Attachments: no_session_options.JPG, session_options_block.JPG, > session_options_collapsed.JPG, session_options_json.JPG > > > Include session options used for a query as part of the profile. > This will be very useful for debugging/diagnostics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684196#comment-15684196 ] Gautam Kumar Parai commented on DRILL-5043: --- Could you please post a link to your GitHub branch which has the code? It would be easier for the community to discuss/review the changes. FYI - Sorry, I do not have the answers to your questions. Let's wait for input from the community. > Function that returns a unique id per session/connection similar to MySQL's > CONNECTION_ID() > --- > > Key: DRILL-5043 > URL: https://issues.apache.org/jira/browse/DRILL-5043 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Nagarajan Chinnasamy >Priority: Minor > Labels: CONNECTION_ID, SESSION, UDF > Attachments: 01_session_id_sqlline.png, > 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png > > > Design and implement a function that returns a unique id per > session/connection similar to MySQL's CONNECTION_ID(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null
[ https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5048: -- Assignee: Serhii Harnyk (was: Gautam Kumar Parai) > AssertionError when case statement is used with timestamp and null > -- > > Key: DRILL-5048 > URL: https://issues.apache.org/jira/browse/DRILL-5048 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Serhii Harnyk >Assignee: Serhii Harnyk > Labels: ready-to-commit > Fix For: Future > > > AssertionError when we use case with timestamp and null: > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN > CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END > . . . . . . . . . . . . . . > FROM > . . . . . . . . . . . . . . > ( > . . . . . . . . . . . . . . > SELECT > . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE > CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res > . . . . . . . . . . . . . . > FROM (values(1)) foo > . . . . . . . . . . . . . . 
> ) foobar; > Error: SYSTEM ERROR: AssertionError: Type mismatch: > rowtype of new rel: > RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL > rowtype of set: > RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL > [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > Stack trace from drillbit.log > {noformat} > Caused by: java.lang.AssertionError: Type mismatch: > rowtype of new rel: > RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL > rowtype of set: > RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL > at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138) > ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18] > ... 16 common frames omitted > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null
[ https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5048: -- Labels: ready-to-commit (was: ) > AssertionError when case statement is used with timestamp and null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5048) AssertionError when case statement is used with timestamp and null
[ https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5048: -- Reviewer: Gautam Kumar Parai > AssertionError when case statement is used with timestamp and null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
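The mismatch in the error above is about nullability only: both rowtypes agree on TIMESTAMP but disagree on whether res may be NULL. A minimal Python sketch of the underlying rule (an illustration, not Drill or Calcite code): a CASE result is nullable if any THEN/ELSE branch can produce NULL, and two plan derivations that disagree on that bit fail Calcite's strict rowtype-equality check.

```python
# Illustration only (not Calcite code) of the nullability rule behind the
# "Type mismatch" assertion above.
def case_is_nullable(branches):
    """branches: list of (type_name, nullable) pairs for the THEN/ELSE arms."""
    return any(nullable for _type_name, nullable in branches)

def rowtypes_equal(a, b):
    # Mirrors the strictness of RelOptUtil.equal: type AND nullability must
    # both match, field by field.
    return a == b

# CASE WHEN (false) THEN null ELSE CAST('1990-10-10 22:40:50' AS TIMESTAMP) END
branches = [("NULL", True), ("TIMESTAMP", False)]

# Correct derivation: the NULL branch makes the result nullable.
rowtype_of_set = ("res", "TIMESTAMP", case_is_nullable(branches))
# Buggy derivation: only the ELSE literal's NOT NULL type is kept.
rowtype_new_rel = ("res", "TIMESTAMP", False)
```

With these two rowtypes, the equality check fails exactly as in the stack trace, even though both sides agree on the TIMESTAMP type itself.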
[jira] [Commented] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726877#comment-15726877 ] Gautam Kumar Parai commented on DRILL-5043: --- Hi Nagarajan, any updates? Regarding your questions: {quote} I am not sure about one change I made in BitControl.java in the following block: {quote} It should be 32 instead of 36. {quote} Also, I am not sure how to incorporate session_id into the "descriptorData" static variable that is initialized at line number 9073 in BitControl.java. Please advise. {quote} I think no change is required there. Could you please post the pull request? We can only start the review process once we have one. > Function that returns a unique id per session/connection similar to MySQL's > CONNECTION_ID() > --- > > Key: DRILL-5043 > URL: https://issues.apache.org/jira/browse/DRILL-5043 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: Nagarajan Chinnasamy >Priority: Minor > Labels: CONNECTION_ID, SESSION, UDF > Attachments: 01_session_id_sqlline.png, > 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png > > > Design and implement a function that returns a unique id per > session/connection similar to MySQL's CONNECTION_ID(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
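For context, the requested CONNECTION_ID()-style semantics can be sketched in a few lines of Python (a hypothetical Session class, not the proposed Drill UDF): the id must be stable for the lifetime of a session and distinct across sessions.

```python
import uuid

# Hypothetical sketch of CONNECTION_ID()-style semantics, not the Drill UDF:
# each session owns one id, fixed at creation, unique across sessions.
class Session:
    def __init__(self):
        # A UUID avoids cross-node coordination; MySQL itself hands out small
        # integers, but uniqueness and stability are the only properties
        # needed here.
        self._id = str(uuid.uuid4())

    def connection_id(self):
        # Stable for the whole lifetime of the session/connection.
        return self._id

s1, s2 = Session(), Session()
```

Repeated calls within one session return the same value, while two sessions never collide, which is the contract a UDF like this has to provide.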
[jira] [Updated] (DRILL-4919) Fix select count(1) / count(*) on csv with header
[ https://issues.apache.org/jira/browse/DRILL-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4919: -- Labels: ready-to-commit (was: ) > Fix select count(1) / count(*) on csv with header > - > > Key: DRILL-4919 > URL: https://issues.apache.org/jira/browse/DRILL-4919 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: F Méthot >Assignee: Arina Ielchiieva >Priority: Minor > Labels: ready-to-commit > Fix For: Future > > > This happens since 1.8 > Dataset (I used extended char for display purpose) test.csvh: > a,b,c,d\n > 1,2,3,4\n > 5,6,7,8\n > Storage config: > "csvh": { > "type": "text", > "extensions" : [ > "csvh" >], >"extractHeader": true, >"delimiter": "," > } > select count(1) from dfs.`test.csvh` > Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header > names are supported > coumn name columns > column index > Fragment 0:0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
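The expected behavior for the dataset above: with extractHeader enabled, the header row only names the columns, so COUNT(1)/COUNT(*) should return 2. A quick Python sketch of that semantics (not Drill code), using the same file contents as test.csvh:

```python
import csv
import io

# Mirrors test.csvh from the report: one header row, two data rows.
data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# DictReader consumes the header line, so iterating yields only data rows --
# the count a user expects from select count(1) with extractHeader enabled.
reader = csv.DictReader(io.StringIO(data))
row_count = sum(1 for _ in reader)
```

The bug is that the count query path rejects the implicit "count everything" column instead of simply counting the data rows as above.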
[jira] [Commented] (DRILL-5088) Error when reading DBRef column
[ https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822090#comment-15822090 ] Gautam Kumar Parai commented on DRILL-5088: --- Sorry, jumped the gun on the +1. I had one more question - Can we add a testcase for the bug? Thanks. > Error when reading DBRef column > --- > > Key: DRILL-5088 > URL: https://issues.apache.org/jira/browse/DRILL-5088 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Environment: drill 1.9.0 > mongo 3.2 >Reporter: Guillaume Champion >Assignee: Chunhui Shi > > In a mongo database with DBRef, when a DBRef is inserted in the first line of > a mongo's collection drill query failed : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. > {code} > Simple example to reproduce: > In mongo instance > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill : > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for > class com.mongodb.DBRef. 
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] > (state=,code=0) > {code} > If the first line doesn't contain the DBRef, Drill queries correctly: > In a mongo instance: > {code} > db.contact2.drop(); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") }); > db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" > : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) }); > {code} > In drill: > {code} > 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2; > +--+---+ > | _id |account > | > +--+---+ > | {"$oid":"582081d96b69060001fd8939"} | {"$id":{}} > | > | {"$oid":"582081d96b69060001fd8938"} | > {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}} | > +--+---+ > 2 rows selected (0,563 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
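A toy model of the failure mode (an illustration only, not the MongoDB driver's actual code): the scan resolves a codec per value type, and since no codec is registered for DBRef, a DBRef in the document that drives codec resolution aborts the whole query, while documents without it decode fine.

```python
# Toy codec registry: DBRef is deliberately absent, as in the report.
CODECS = {"ObjectId": repr, "str": str}

class CodecConfigurationException(Exception):
    pass

def encode(doc):
    """doc maps field name -> (type_name, value); fails on unknown types."""
    out = {}
    for field, (type_name, value) in doc.items():
        codec = CODECS.get(type_name)
        if codec is None:
            raise CodecConfigurationException(
                "Can't find a codec for class com.mongodb." + type_name)
        out[field] = codec(value)
    return out

plain_doc = {"_id": ("ObjectId", "582081d96b69060001fd8939")}
dbref_doc = {"_id": ("ObjectId", "582081d96b69060001fd8938"),
             "account": ("DBRef", ("contact", "999cbf116b69060001fd8611"))}
```

Encoding plain_doc succeeds; encoding dbref_doc raises the same kind of codec-lookup error the report shows, which is why the position of the first DBRef-bearing document changes the outcome.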
[jira] [Commented] (DRILL-5088) Error when reading DBRef column
[ https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852859#comment-15852859 ] Gautam Kumar Parai commented on DRILL-5088: --- [~cshi] I think you should mark this issue as `blocked by` DRILL-5196 (using issue links). Applying the `ready-to-commit` tag may cause this to be committed before DRILL-5196, which would break the testcase. > Error when reading DBRef column -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877317#comment-15877317 ] Gautam Kumar Parai commented on DRILL-1328: --- I have addressed the comments in the earlier pull request. [~amansinha100] [~paul-rogers] could you please review the changes? The pull request: https://github.com/apache/drill/pull/729 > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
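One of the caveats listed in the description is the math for combining two column NDVs into a joint NDV. A common textbook estimate (stated here as an assumption, not necessarily what the patch implements) multiplies the per-column NDVs under an independence assumption and caps the product by the table's row count:

```python
def joint_ndv(ndv_a, ndv_b, row_count):
    # Under independence there could be ndv_a * ndv_b distinct pairs, but a
    # table with row_count rows cannot exhibit more distinct pairs than rows.
    return min(ndv_a * ndv_b, row_count)

# Small columns: the product dominates; large columns: the row count caps it.
small = joint_ndv(10, 20, 1_000)
large = joint_ndv(100, 100, 1_000)
```

Correlated columns break the independence assumption, which is one reason this math "should be checked" per the caveat above.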
[jira] [Commented] (DRILL-3029) Wrong result with correlated not exists subquery
[ https://issues.apache.org/jira/browse/DRILL-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923368#comment-15923368 ] Gautam Kumar Parai commented on DRILL-3029: --- [~agirish] Instead of empty results I see the following error {quote} ERROR: correlated subquery with skip-level correlations is not supported (subselect.c:394) {quote}. Can you please take a look? > Wrong result with correlated not exists subquery > > > Key: DRILL-3029 > URL: https://issues.apache.org/jira/browse/DRILL-3029 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.0.0 >Reporter: Victoria Markman >Assignee: Jinfeng Ni >Priority: Critical > Fix For: Future > > Attachments: t1_t2_t3.tar > > > Subquery has correlation to two outer tables in the previous blocks. > Postgres returns empty result set in this case: > {code} > 0: jdbc:drill:schema=dfs> select > . . . . . . . . . . . . > distinct a1 > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > where not exists > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > * > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t2 > . . . . . . . . . . . . > where not exists > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > * > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t3 > . . . . . . . . . . . . > where > . . . . . . . . . . . . > t3.b3 = t2.b2 and > . . . . . . . . . . . . > t3.a3 = t1.a1 > . . . . . . . . . . . . > ) > . . . . . . . . . . . . > ) > . . . . . . . . . . . . 
> ; > ++ > | a1 | > ++ > | 1 | > | 2 | > | 3 | > | 4 | > | 5 | > | 6 | > | 7 | > | 9 | > | 10 | > | null | > ++ > 10 rows selected (0.991 seconds) > {code} > Copy/paste reproduction: > {code} > select > distinct a1 > from > t1 > where not exists > ( > select > * > from > t2 > where not exists > ( > select > * > from > t3 > where > t3.b3 = t2.b2 and > t3.a3 = t1.a1 > ) > ) > ; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
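The doubly nested NOT EXISTS is relational division: keep each a1 for which every t2 row has a matching t3 row. A direct Python emulation of those semantics on toy data (SQL NULL handling ignored; this is an illustration of the expected result, not Drill's plan):

```python
def division_query(t1, t2, t3):
    # SELECT DISTINCT a1 FROM t1 WHERE NOT EXISTS (
    #   SELECT * FROM t2 WHERE NOT EXISTS (
    #     SELECT * FROM t3 WHERE t3.b3 = t2.b2 AND t3.a3 = t1.a1))
    result = set()
    for a1 in {row["a1"] for row in t1}:
        # Keep a1 only if NO t2 row lacks a (b2, a1) match in t3,
        # i.e. every t2 row has one.
        if all(any(r3["b3"] == r2["b2"] and r3["a3"] == a1 for r3 in t3)
               for r2 in t2):
            result.add(a1)
    return result

t1 = [{"a1": 1}, {"a1": 2}]
t2 = [{"b2": 10}, {"b2": 20}]
# a1=1 matches both b2 values; a1=2 matches only b2=10.
t3 = [{"a3": 1, "b3": 10}, {"a3": 1, "b3": 20}, {"a3": 2, "b3": 10}]
```

On this data only a1=1 survives, since a1=2 has no t3 match for b2=20; a planner that decorrelates the skip-level reference to t1.a1 incorrectly can instead keep every a1.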
[jira] [Assigned] (DRILL-3029) Wrong result with correlated not exists subquery
[ https://issues.apache.org/jira/browse/DRILL-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-3029: - Assignee: Gautam Kumar Parai (was: Jinfeng Ni) > Wrong result with correlated not exists subquery -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DRILL-5049) wrong results - correlated subquery interacting with null equality join
[ https://issues.apache.org/jira/browse/DRILL-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-5049: - Assignee: Gautam Kumar Parai > wrong results - correlated subquery interacting with null equality join > --- > > Key: DRILL-5049 > URL: https://issues.apache.org/jira/browse/DRILL-5049 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Critical > Attachments: nullEqJoin_17.drill_res, nullEqJoin_17.postgres, > t_alltype.parquet > > > Here is a query that uses null equality join. Drill 1.9.0 returns 124 > records, whereas Postgres 9.3 returns 145 records. I am on Drill 1.9.0 git > commit id: db308549 > I have attached the results from Drill 1.9.0 and Postgres, please review. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> explain plan for > . . . . . . . . . . . . . . > SELECT * > . . . . . . . . . . . . . . > FROM `t_alltype.parquet` t1 > . . . . . . . . . . . . . . > WHERE EXISTS > . . . . . . . . . . . . . . > ( > . . . . . . . . . . . . . . > SELECT * > . . . . . . . . . . . . . . > FROM `t_alltype.parquet` t2 > . . . . . . . . . . . . . . > WHERE t1.c4 = t2.c4 OR (t1.c4 > IS NULL AND t2.c4 IS NULL) > . . . . . . . . . . . . . . 
> ); > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(*=[$0]) > 00-02Project(T30¦¦*=[$0]) > 00-03 HashJoin(condition=[AND(=($1, $2), =($1, $3))], > joinType=[inner]) > 00-05Project(T30¦¦*=[$0], c4=[$1]) > 00-07 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], > selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, > usedMetadataFile=false, columns=[`*`]]]) > 00-04HashAgg(group=[{0, 1}], agg#0=[MIN($2)]) > 00-06 Project(c40=[$1], c400=[$1], $f0=[true]) > 00-08HashJoin(condition=[IS NOT DISTINCT FROM($0, $1)], > joinType=[inner]) > 00-10 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], > selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, > usedMetadataFile=false, columns=[`c4`]]]) > 00-09 Project(c40=[$0]) > 00-11HashAgg(group=[{0}]) > 00-12 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype.parquet]], > selectionRoot=maprfs:/tmp/t_alltype.parquet, numFiles=1, > usedMetadataFile=false, columns=[`c4`]]]) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5049) wrong results - correlated subquery interacting with null equality join
[ https://issues.apache.org/jira/browse/DRILL-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925299#comment-15925299 ] Gautam Kumar Parai commented on DRILL-5049: --- Please refer to CALCITE-714 and CALCITE-1200 for more complete background. > wrong results - correlated subquery interacting with null equality join -- This message was sent by Atlassian JIRA (v6.3.15#6346)
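The join condition in this query, t1.c4 = t2.c4 OR (t1.c4 IS NULL AND t2.c4 IS NULL), is exactly SQL's IS NOT DISTINCT FROM. A plain Python sketch of the intended EXISTS semi-join semantics, with None standing in for SQL NULL (illustration only, not Drill's rewrite):

```python
def is_not_distinct_from(a, b):
    """SQL IS NOT DISTINCT FROM: NULL matches NULL, otherwise plain equality."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

def exists_semi_join(t1, t2):
    # WHERE EXISTS (SELECT * FROM t2 WHERE t1.c4 = t2.c4 OR (both NULL))
    return [r1 for r1 in t1
            if any(is_not_distinct_from(r1["c4"], r2["c4"]) for r2 in t2)]

t1 = [{"c4": 1}, {"c4": None}, {"c4": 3}]
t2 = [{"c4": 1}, {"c4": None}]
kept = exists_semi_join(t1, t2)  # rows with c4=1 and c4=NULL survive
```

Any plan rewrite that silently turns the NULL-matching branch back into plain `=` drops the NULL-keyed rows, which is consistent with Drill returning fewer rows than Postgres here.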
[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts
[ https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951400#comment-15951400 ] Gautam Kumar Parai commented on DRILL-5394: --- [~ppenumarthy] is the code ready to go into Apache? If so, we should mark it with the ready-to-commit tag. > Optimize query planning for MapR-DB tables by caching row counts > > > Key: DRILL-5394 > URL: https://issues.apache.org/jira/browse/DRILL-5394 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - MapRDB >Affects Versions: 1.9.0, 1.10.0 >Reporter: Abhishek Girish >Assignee: Padma Penumarthy > Labels: MapR-DB-Binary, ready-to-commit > Fix For: 1.11.0 > > > On large MapR-DB tables, it was observed that the query planning time was > longer than expected. With DEBUG logs, it was understood that there were > multiple calls being made to get MapR-DB region locations and to fetch total > row count for tables. > {code} > 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Function > ... > 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... 
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms): > {code} > We should cache these stats and reuse them where all required during query > planning. This should help reduce query planning time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
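The fix the ticket asks for can be sketched as simple per-planning-session memoization (hypothetical names, not the MapR-DB plugin's actual classes): fetch region locations and row counts once, then reuse them across the repeated planner callbacks visible in the log above.

```python
from functools import lru_cache

rpc_calls = {"count": 0}

@lru_cache(maxsize=None)
def table_row_count(table_name):
    # Stands in for the expensive "Getting region locations" metadata RPC
    # that the log above shows firing once per planner callback.
    rpc_calls["count"] += 1
    return {"big_table": 1_000_000}.get(table_name, 0)

# The planner asks repeatedly during costing; only the first call pays.
counts = [table_row_count("big_table") for _ in range(4)]
```

In a real plugin the cache would be scoped to one planning session and invalidated afterwards, so stale row counts never leak across queries.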
[jira] [Commented] (DRILL-5319) Refactor FragmentContext and OptionManager for unit testing
[ https://issues.apache.org/jira/browse/DRILL-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959873#comment-15959873 ] Gautam Kumar Parai commented on DRILL-5319: --- [~paul-rogers] since this can be merged independently and the rest of the DRILL-5318 changes depend on this PR, I am marking it as ready-to-commit. > Refactor FragmentContext and OptionManager for unit testing > --- > > Key: DRILL-5319 > URL: https://issues.apache.org/jira/browse/DRILL-5319 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build & Test >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > Roll-up task for two refactorings; see the sub-tasks for details. This ticket > allows a single PR for the two different refactorings since the work heavily > overlaps. See DRILL-5320 and DRILL-5321 for details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5319) Refactor FragmentContext and OptionManager for unit testing
[ https://issues.apache.org/jira/browse/DRILL-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-5319: -- Labels: ready-to-commit (was: ) > Refactor FragmentContext and OptionManager for unit testing > --- > > Key: DRILL-5319 > URL: https://issues.apache.org/jira/browse/DRILL-5319 > Project: Apache Drill > Issue Type: Sub-task > Components: Tools, Build & Test >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > Roll-up task for two refactorings, see the sub-tasks for details. This ticket > allows a single PR for the two different refactorings since the work heavily > overlaps. See DRILL-5320 and DRILL-5321 for details. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061374#comment-16061374 ] Gautam Kumar Parai commented on DRILL-1328: --- Based on the last discussion with the reviewers and Drill community members, we will hold off on the PR because it also causes regressions in TPC-H and TPC-DS benchmark queries. We identified that we need histograms and other enhancements to fully address the regressions. I will post a new PR once these issues are addressed. > Support table statistics -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-4665) Partition pruning not working for hive partitioned table with 'LIKE' and '=' filter
[ https://issues.apache.org/jira/browse/DRILL-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336911#comment-15336911 ] Gautam Kumar Parai commented on DRILL-4665: --- I have created the pull request https://github.com/apache/drill/pull/526 [~amansinha100] could you please take a look? Thanks! > Partition pruning not working for hive partitioned table with 'LIKE' and '=' > filter > --- > > Key: DRILL-4665 > URL: https://issues.apache.org/jira/browse/DRILL-4665 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Hive >Affects Versions: 1.6.0 >Reporter: Chun Chang >Assignee: Gautam Kumar Parai > > This problem was initially reported by Shankar Mane > > > Problem: > > > > 1. In drill, we are using hive partition table. But explain plan (same > > query) for like and = operator differs and used all partitions in case of > > like operator. > > 2. If you see below drill explain plans: Like operator uses *all* > > partitions where > > = operator uses *only* partition filtered by log_date condition. > I reproduced the reported issue. 
I have a partitioned hive external table > with the following schema: > {noformat} > create external table if not exists lineitem_partitioned ( > l_orderkey bigint, > l_partkey bigint, > l_suppkey bigint, > l_linenumber bigint, > l_quantity double, > l_extendedprice double, > l_discount double, > l_tax double, > l_returnflag string, > l_linestatus string, > l_shipdate string, > l_commitdate string, > l_receiptdate string, > l_shipinstruct string, > l_shipmode string, > l_comment string > ) > partitioned by (year int, month int, day int) > STORED AS PARQUET > LOCATION '/drill/testdata/tpch100_dir_partitioned_5files/lineitem'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=1) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/1'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=2) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/2'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=3) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/3'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=4) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/4'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=5) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/5'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=6) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/6'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=7) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/7'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=8) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/8'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=9) > location > 
'/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/9'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=10) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/10'; > {noformat} > Without 'LIKE', drill plans right: > {noformat} > 0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select l_shipdate, > l_linestatus, l_shipinstruct, `day` from lineitem_partitioned_db2k_2_999 > where `day` = 2 limit 10; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(l_shipdate=[$0], l_linestatus=[$1], l_shipinstruct=[$2], > day=[$3]) > 00-02SelectionVectorRemover > 00-03 Limit(fetch=[10]) > 00-04Limit(fetch=[10]) > 00-05 Project(l_shipdate=[$1], l_linestatus=[$0], > l_shipinstruct=[$2], day=[$3]) > 00-06Scan(groupscan=[HiveScan [table=Table(dbName:md815_db2k, > tableName:lineitem_partitioned_db2k_2_999), columns=[`l_linestatus`, > `l_shipdate`, `l_shipinstruct`, `day`], numPartitions=1, partitions= > [Partition(values:[2015, 1, 2])], > inputDirectories=[maprfs:/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/2]]]) > {noformat} > With 'LIKE', pruning is not happening: > {noformat} > 0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select l_shipdate, > l_linestatus, l_shipinstruct, `day` from lineitem_partitioned_db2k_2_999 > where `day` = 2 and l_shipinstruct like '%BACK%' limit 10; > +--+--+ > | text | json
[jira] [Commented] (DRILL-2330) Add support for nested aggregate expressions for window aggregates
[ https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338566#comment-15338566 ] Gautam Kumar Parai commented on DRILL-2330: --- [~amansinha100] Please review the pull request above. It adds a testcase for nested aggregate support in Drill. Thanks! > Add support for nested aggregate expressions for window aggregates > -- > > Key: DRILL-2330 > URL: https://issues.apache.org/jira/browse/DRILL-2330 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 0.8.0 >Reporter: Abhishek Girish >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: drillbit.log > > > Aggregate expressions currently cannot be nested. > *The following query fails to validate:* > {code:sql} > select avg(sum(i_item_sk)) from item; > {code} > Error: > Query failed: SqlValidatorException: Aggregate expressions cannot be nested > Log attached. > Reference: TPCDS queries (20, 63, 98, ...) fail to execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
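The reason nesting is valid in the *windowed* form, e.g. `avg(sum(c1)) over (...)`, is evaluation order: the inner aggregate runs first as a regular GROUP BY, and the window function then operates over the already-grouped rows. A toy simulation of that order (illustrative only, not Drill's implementation, with hypothetical columns c1/c2):

```python
from collections import defaultdict

# rows of (c1, c2)
rows = [(1, "a"), (2, "a"), (3, "b")]

# Inner aggregate: SUM(c1) GROUP BY c2 - produces one row per group.
sums = defaultdict(int)
for c1, c2 in rows:
    sums[c2] += c1

# Outer window aggregate over the grouped rows: AVG(sum_c1) OVER ()
# averages the per-group sums across the whole (single) partition.
per_group = list(sums.values())
window_avg = sum(per_group) / len(per_group)
print(window_avg)  # -> 3.0  (groups "a" and "b" both sum to 3)
```

Without the OVER clause there is no second level of rows for the outer aggregate to consume, which is why the plain `avg(sum(...))` form fails validation.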
[jira] [Created] (DRILL-4743) HashJoin's not fully parallelized in query plan
Gautam Kumar Parai created DRILL-4743: - Summary: HashJoin's not fully parallelized in query plan Key: DRILL-4743 URL: https://issues.apache.org/jira/browse/DRILL-4743 Project: Apache Drill Issue Type: Bug Affects Versions: 1.5.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai The underlying problem is filter selectivity under-estimation for a query with complicated predicates, e.g. deeply nested AND/OR predicates. This leads to under-parallelization of the major fragment doing the join. To truly resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics, or when existing statistics are insufficient to get a correct estimate of the selectivity, this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
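As a toy model of the under-estimation described above (not Drill's actual planner code), selectivities of a nested predicate tree are typically combined under an independence assumption: AND multiplies child selectivities, OR combines them with inclusion-exclusion. Deep nesting compounds the estimate downward, which in turn shrinks the estimated row count that drives the parallelism decision:

```python
# Each node is a tuple: ("leaf", s) for a base comparison with selectivity s,
# or ("and", child, ...) / ("or", child, ...) for nested predicates.
def selectivity(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    children = [selectivity(c) for c in node[1:]]
    if kind == "and":
        s = 1.0
        for c in children:
            s *= c              # P(A and B) = P(A) * P(B) under independence
        return s
    if kind == "or":
        s = 0.0
        for c in children:
            s = s + c - s * c   # P(A or B) = P(A) + P(B) - P(A)P(B)
        return s
    raise ValueError(kind)

# ((a AND b) OR (c AND d)) with each leaf estimated at 0.15:
tree = ("or",
        ("and", ("leaf", 0.15), ("leaf", 0.15)),
        ("and", ("leaf", 0.15), ("leaf", 0.15)))
print(selectivity(tree))  # -> 0.04449375, far below any single leaf's 0.15
```

If the predicates are actually correlated, the true selectivity can be much higher than this product-based figure, hence the under-parallelized join fragments the issue describes.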
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342826#comment-15342826 ] Gautam Kumar Parai commented on DRILL-4743: --- I have created a pull request https://github.com/apache/drill/pull/534 [~amansinha100] can you please take a look and provide feedback? > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343061#comment-15343061 ] Gautam Kumar Parai commented on DRILL-4743: --- [~amansinha100] I have updated the pull request. Please take a look. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-1328: - Assignee: Gautam Kumar Parai > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
Gautam Kumar Parai created DRILL-4771: - Summary: Drill should avoid doing the same join twice if count(distinct) exists Key: DRILL-4771 URL: https://issues.apache.org/jira/browse/DRILL-4771 Project: Apache Drill Issue Type: Improvement Affects Versions: 1.6.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai When the query has one distinct aggregate and one or more non-distinct aggregates, the join instance need not produce the join-based plan. We can generate multi-phase aggregates. Another approach would be to use grouping sets. However, Drill is unable to support grouping sets and instead relies on the join-based plan (see the plan below) {code} select emp.empno, count(*), avg(distinct dept.deptno) from sales.emp emp inner join sales.dept dept on emp.deptno = dept.deptno group by emp.empno LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3]) LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner]) LogicalAggregate(group=[{0}], EXPR$1=[COUNT()]) LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) LogicalJoin(condition=[=($7, $9)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)]) LogicalAggregate(group=[{0, 1}]) LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) LogicalJoin(condition=[=($7, $9)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} The more efficient form should look like this {code} select emp.empno, count(*), avg(distinct dept.deptno) from sales.emp emp inner join sales.dept dept on emp.deptno = dept.deptno group by emp.empno LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)]) LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) LogicalJoin(condition=[=($7, $9)], joinType=[inner]) LogicalTableScan(table=[[CATALOG, SALES, EMP]]) LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332)
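The multi-phase aggregate rewrite above can be simulated in a few lines (an illustrative sketch over hypothetical (empno, deptno) rows, not Drill code). Phase one groups by both keys, which de-duplicates deptno within each empno; phase two sums the partial counts to recover COUNT(*) while averaging the surviving group keys for AVG(DISTINCT), so the join runs only once:

```python
from collections import defaultdict

# rows of (empno, deptno) produced by a single emp-dept join
rows = [(1, 10), (1, 10), (1, 20), (2, 20), (2, 20)]

# Phase 1: group by (empno, deptno), counting rows per pair.
phase1 = defaultdict(int)
for empno, deptno in rows:
    phase1[(empno, deptno)] += 1

# Phase 2: group by empno; SUM the partial counts to recover COUNT(*),
# and AVG the (now de-duplicated) deptno keys for AVG(DISTINCT deptno).
counts = defaultdict(int)
depts = defaultdict(list)
for (empno, deptno), n in phase1.items():
    counts[empno] += n
    depts[empno].append(deptno)

result = {e: (counts[e], sum(depts[e]) / len(depts[e])) for e in counts}
print(result)  # -> {1: (3, 15.0), 2: (2, 20.0)}
```

This matches what a single join followed by the two-level LogicalAggregate in the proposed plan would produce, without computing the join twice.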
[jira] [Updated] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
[ https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4771: -- Affects Version/s: (was: 1.6.0) 1.2.0 > Drill should avoid doing the same join twice if count(distinct) exists > -- > > Key: DRILL-4771 > URL: https://issues.apache.org/jira/browse/DRILL-4771 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > When the query has one distinct aggregate and one or more non-distinct > aggregates, the join instance need not produce the join-based plan. We can > generate multi-phase aggregates. Another approach would be to use grouping > sets. However, Drill is unable to support grouping sets and instead relies on > the join-based plan (see the plan below) > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3]) > LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner]) > LogicalAggregate(group=[{0}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > The more efficient form should look like this > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}], 
EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372060#comment-15372060 ] Gautam Kumar Parai commented on DRILL-4743: --- [~amansinha100] I have updated the pull request (https://github.com/apache/drill/pull/534) according to your comments. Please take a look. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4665) Partition pruning not working for hive partitioned table with 'LIKE' and '=' filter
[ https://issues.apache.org/jira/browse/DRILL-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372097#comment-15372097 ] Gautam Kumar Parai commented on DRILL-4665: --- [~amansinha100] I have updated the pull request(https://github.com/apache/drill/pull/526) as per your comments. Please take a look. > Partition pruning not working for hive partitioned table with 'LIKE' and '=' > filter > --- > > Key: DRILL-4665 > URL: https://issues.apache.org/jira/browse/DRILL-4665 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Hive >Affects Versions: 1.6.0 >Reporter: Chun Chang >Assignee: Gautam Kumar Parai > > This problem was initially reported by Shankar Mane > > > Problem: > > > > 1. In drill, we are using hive partition table. But explain plan (same > > query) for like and = operator differs and used all partitions in case of > > like operator. > > 2. If you see below drill explain plans: Like operator uses *all* > > partitions where > > = operator uses *only* partition filtered by log_date condition. > I reproduced the reported issue. 
I have a partitioned hive external table > with the following schema: > {noformat} > create external table if not exists lineitem_partitioned ( > l_orderkey bigint, > l_partkey bigint, > l_suppkey bigint, > l_linenumber bigint, > l_quantity double, > l_extendedprice double, > l_discount double, > l_tax double, > l_returnflag string, > l_linestatus string, > l_shipdate string, > l_commitdate string, > l_receiptdate string, > l_shipinstruct string, > l_shipmode string, > l_comment string > ) > partitioned by (year int, month int, day int) > STORED AS PARQUET > LOCATION '/drill/testdata/tpch100_dir_partitioned_5files/lineitem'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=1) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/1'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=2) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/2'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=3) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/3'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=4) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/4'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=5) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/5'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=6) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/6'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=7) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/7'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=8) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/8'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=9) > location > 
'/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/9'; > alter table lineitem_partitioned add partition(year=2015, month=1, day=10) > location > '/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/10'; > {noformat} > Without 'LIKE', drill plans right: > {noformat} > 0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select l_shipdate, > l_linestatus, l_shipinstruct, `day` from lineitem_partitioned_db2k_2_999 > where `day` = 2 limit 10; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(l_shipdate=[$0], l_linestatus=[$1], l_shipinstruct=[$2], > day=[$3]) > 00-02SelectionVectorRemover > 00-03 Limit(fetch=[10]) > 00-04Limit(fetch=[10]) > 00-05 Project(l_shipdate=[$1], l_linestatus=[$0], > l_shipinstruct=[$2], day=[$3]) > 00-06Scan(groupscan=[HiveScan [table=Table(dbName:md815_db2k, > tableName:lineitem_partitioned_db2k_2_999), columns=[`l_linestatus`, > `l_shipdate`, `l_shipinstruct`, `day`], numPartitions=1, partitions= > [Partition(values:[2015, 1, 2])], > inputDirectories=[maprfs:/drill/testdata/tpch100_dir_partitioned_5files/lineitem/2015/1/2]]]) > {noformat} > With 'LIKE', pruning is not happening: > {noformat} > 0: jdbc:drill:schema=dfs.drillTestDir> explain plan for select l_shipdate, > l_linestatus, l_shipinstruct, `day` from lineitem_partitioned_db2k_2_999 > where `day` = 2 and l_shipinstruct like '%BACK%' limit 10; > +--+--+ > | text |
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373539#comment-15373539 ] Gautam Kumar Parai commented on DRILL-4743: --- [~amansinha100] I have updated the pull request (https://github.com/apache/drill/pull/534). Please take a look. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2330) Add support for nested aggregate expressions for window aggregates
[ https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378326#comment-15378326 ] Gautam Kumar Parai commented on DRILL-2330: --- I have updated the pull request (https://github.com/apache/drill/pull/529). [~amansinha100] can you please take a look? > Add support for nested aggregate expressions for window aggregates > -- > > Key: DRILL-2330 > URL: https://issues.apache.org/jira/browse/DRILL-2330 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 0.8.0 >Reporter: Abhishek Girish >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: drillbit.log > > > Aggregate expressions currently cannot be nested. > *The following query fails to validate:* > {code:sql} > select avg(sum(i_item_sk)) from item; > {code} > Error: > Query failed: SqlValidatorException: Aggregate expressions cannot be nested > Log attached. > Reference: TPCDS queries (20, 63, 98, ...) fail to execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-1328) Support table statistics
[ https://issues.apache.org/jira/browse/DRILL-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383080#comment-15383080 ] Gautam Kumar Parai commented on DRILL-1328: --- I have uploaded a new design specification based on the original design. It aims to address some concerns in the original design. Please review and provide feedback. Thanks! > Support table statistics > > > Key: DRILL-1328 > URL: https://issues.apache.org/jira/browse/DRILL-1328 > Project: Apache Drill > Issue Type: Improvement >Reporter: Cliff Buchanan >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: 0001-PRE-Set-value-count-in-splitAndTransfer.patch > > > This consists of several subtasks > * implement operators to generate statistics > * add "analyze table" support to parser/planner > * create a metadata provider to allow statistics to be used by optiq in > planning optimization > * implement statistics functions > Right now, the bulk of this functionality is implemented, but it hasn't been > rigorously tested and needs to have some definite answers for some of the > parts "around the edges" (how analyze table figures out where the table > statistics are located, how a table "append" should work in a read only file > system) > Also, here are a few known caveats: > * table statistics are collected by creating a sql query based on the string > path of the table. This should probably be done with a Table reference. > * Case sensitivity for column statistics is probably iffy > * Math for combining two column NDVs into a joint NDV should be checked. > * Schema changes aren't really being considered yet. > * adding getDrillTable is probably unnecessary; it might be better to do > getTable().unwrap(DrillTable.class) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4789) In-list to join optimization should have configurable in-list size
Gautam Kumar Parai created DRILL-4789: - Summary: In-list to join optimization should have configurable in-list size Key: DRILL-4789 URL: https://issues.apache.org/jira/browse/DRILL-4789 Project: Apache Drill Issue Type: Bug Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai We currently have a default in-list size of 20. Instead of the magic number 20, we should make this configurable. {code} select count(*) from table where col in (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-4789) In-list to join optimization should have configurable in-list size
[ https://issues.apache.org/jira/browse/DRILL-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai closed DRILL-4789. - Resolution: Duplicate > In-list to join optimization should have configurable in-list size > --- > > Key: DRILL-4789 > URL: https://issues.apache.org/jira/browse/DRILL-4789 > Project: Apache Drill > Issue Type: Bug >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > We currently have a default in-list size of 20. Instead of the magic number 20, > we should make this configurable. > {code} > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-3710: - Assignee: Gautam Kumar Parai > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
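The planner decision described in DRILL-3710 can be sketched as follows (the threshold name and option mechanism are illustrative, not Drill's actual ones). Note that, as the sample query with 29 copies of `1` suggests, the raw list length is what crosses the threshold, which is why users pad lists with duplicates today:

```python
# Hypothetical session option replacing the hard-coded magic number 20.
IN_LIST_TO_JOIN_THRESHOLD = 20

def plan_in_list(values, threshold=IN_LIST_TO_JOIN_THRESHOLD):
    """Decide how to evaluate `col IN (values...)`.

    Long lists become a join against a small in-memory hash table built
    from the distinct values; short lists stay a chain of OR'd equalities.
    """
    distinct = set(values)
    if len(values) > threshold:      # raw length, duplicates included
        return ("hash-join", distinct)
    return ("or-chain", distinct)

print(plan_in_list(range(5))[0])     # -> or-chain
print(plan_in_list(range(25))[0])    # -> hash-join
print(plan_in_list([1] * 29)[0])     # -> hash-join (duplicate-padded list)
```

Making the threshold a session option removes the need for the duplicate-padding workaround entirely.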
[jira] [Commented] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388181#comment-15388181 ] Gautam Kumar Parai commented on DRILL-4795: --- Yes, we should group by the partitioning column. > Nested aggregate windowed query fails - IllegalStateException > -- > > Key: DRILL-4795 > URL: https://issues.apache.org/jira/browse/DRILL-4795 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Critical > Attachments: tblWnulls.parquet > > > The below two window function queries fail on MapR Drill 1.8.0 commit ID > 34ca63ba > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > From drillbit.log > {noformat} > 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by > c2) from `tblWnulls.parquet` > ... 
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: > This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_101] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101] > Caused by: java.lang.IllegalStateException: This generator does not support > mappings beyond > at > org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > 
org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47) > ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown(Eva
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389127#comment-15389127 ] Gautam Kumar Parai commented on DRILL-3710: --- I have created the pull request (https://github.com/apache/drill/pull/552). [~jni] [~amansinha100] can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389157#comment-15389157 ] Gautam Kumar Parai commented on DRILL-4743: --- I have updated the pull request (https://github.com/apache/drill/pull/534). [~amansinha100] [~sudheeshkatkam] can you please take a look? Thanks! > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390107#comment-15390107 ] Gautam Kumar Parai commented on DRILL-3710: --- I have updated the pull request (https://github.com/apache/drill/pull/552) based on your comments [~sudheeshkatkam][~amansinha100]. Can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4743: -- Description: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the options {code} planner.filter.min_selectivity_estimate_factor {code} was: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. 
However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the options > {code} > planner.filter.min_selectivity_estimate_factor > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4743: -- Description: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code} planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. was: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. 
The user can use the options {code} planner.filter.min_selectivity_estimate_factor {code} > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. > {code} > planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4743: -- Description: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code} planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. was: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. 
The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code} planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. > {code} planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4743: -- Description: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code}planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. was: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. 
The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code} planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. > {code}planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
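The two options described above bound the planner's filter selectivity estimate, which in turn caps the estimated ROWCOUNT of the FILTER operator. A hedged sketch of the clamping behavior as described in the issue (assumed semantics for illustration; the real option handling lives in Drill's Java planner):

```python
# Sketch of how min/max selectivity bounds would cap a filter's row-count
# estimate. The option names come from the issue; the clamping logic is an
# assumption based on the description, not Drill's actual implementation.

def clamp_selectivity(estimated, min_factor, max_factor):
    # Both factors must lie in [0, 1] with min <= max, per the issue text.
    if not (0.0 <= min_factor <= max_factor <= 1.0):
        raise ValueError("need 0 <= min_factor <= max_factor <= 1")
    return min(max(estimated, min_factor), max_factor)

def filter_row_count(input_rows, estimated_selectivity,
                     min_factor=0.0, max_factor=1.0):
    # The FILTER operator scales only its immediate upstream row count;
    # downstream operators see the capped estimate indirectly.
    return input_rows * clamp_selectivity(estimated_selectivity,
                                          min_factor, max_factor)
```

For example, raising `min_factor` to 0.5 prevents a deeply nested predicate whose selectivity was under-estimated at 0.001 from shrinking the estimated row count to near zero, which is what starves the downstream HashJoin of parallelism.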
[jira] [Assigned] (DRILL-4806) need a better error message
[ https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4806: - Assignee: Gautam Kumar Parai > need a better error message > > > Key: DRILL-4806 > URL: https://issues.apache.org/jira/browse/DRILL-4806 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Minor > Labels: window_function > > Need a better error message, column c2 is of type CHAR. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 > ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 0:0 > [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394980#comment-15394980 ] Gautam Kumar Parai commented on DRILL-3710: --- [~amansinha100] I have updated the pull request (https://github.com/apache/drill/pull/552) to account for the latest Calcite changes. Can you please take a look? Thanks! > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: Future > > > If Drill has more than 20 in-lists , Drill can do an optimization to convert > that in-lists into a small hash table in memory, and then do a table join > instead. > This can improve the performance of the query which has many in-lists. > Could we make "20" configurable? So that we do not need to add duplicate/junk > in-list to make it more than 20. > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4743: -- Description: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code}planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. The FILTER operator only operates on the input of its immediate upstream operator (e.g. SCAN, AGG). If two different filters are present in the same plan, they might have different selectivities based on their immediate upstream operators ROWCOUNT. was: The underlying problem is filter selectivity under-estimate for a query with complicated predicates e.g. deeply nested and/or predicates. This leads to under parallelization of the major fragment doing the join. To really resolve this problem we need table/column statistics to correctly estimate the selectivity. 
However, in the absence of statistics OR even when existing statistics are insufficient to get a correct estimate of selectivity this will serve as a workaround. For now, the fix is to provide options for controlling the lower and upper bounds for filter selectivity. The user can use the following options. The selectivity can be varied between 0 and 1 with min selectivity always less than or equal to max selectivity. {code}planner.filter.min_selectivity_estimate_factor planner.filter.max_selectivity_estimate_factor {code} When using 'explain plan including all attributes for ' it should cap the estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators downstream is not directly controlled by these options. However, they may change as a result of dependency between different operators. > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. > For now, the fix is to provide options for controlling the lower and upper > bounds for filter selectivity. The user can use the following options. The > selectivity can be varied between 0 and 1 with min selectivity always less > than or equal to max selectivity. 
> {code}planner.filter.min_selectivity_estimate_factor > planner.filter.max_selectivity_estimate_factor > {code} > When using 'explain plan including all attributes for ' it should cap the > estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators > downstream is not directly controlled by these options. However, they may > change as a result of dependency between different operators. The FILTER > operator only operates on the input of its immediate upstream operator (e.g. > SCAN, AGG). If two different filters are present in the same plan, they might > have different selectivities based on their immediate upstream operators > ROWCOUNT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4808) CTE query with window function results in AssertionError
[ https://issues.apache.org/jira/browse/DRILL-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4808: -- Assignee: (was: Gautam Kumar Parai) > CTE query with window function results in AssertionError > > > Key: DRILL-4808 > URL: https://issues.apache.org/jira/browse/DRILL-4808 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz > Labels: window_function > > Below query that uses CTE and window functions results in AssertionError > Same query over same data works on Postgres. > MapR Drill 1.8.0 commit ID : 34ca63ba > {noformat} > 0: jdbc:drill:schema=dfs.tmp> WITH v1 ( a, b, c, d ) AS > . . . . . . . . . . . . . . > ( > . . . . . . . . . . . . . . > SELECT col0, col8, MAX(MIN(col8)) over > (partition by col7 order by col8) as max_col8, col7 from > `allTypsUniq.parquet` GROUP BY col0,col7,col8 > . . . . . . . . . . . . . . > ) > . . . . . . . . . . . . . . 
> select * from ( select a, b, c, d from v1 where > c > 'IN' GROUP BY a,b,c,d order by a,b,c,d); > Error: SYSTEM ERROR: AssertionError: Internal error: Type 'RecordType(ANY > col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a' > [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > Stack trace from drillbit.log for above failing query > {noformat} > 2016-07-26 16:57:04,627 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 2868699e-ae56-66f4-9439-8db2132ef265: WITH v1 ( a, b, c, d ) AS > ( > SELECT col0, col8, MAX(MIN(col8)) over (partition by col7 order by col8) > as max_col8, col7 from `allTypsUniq.parquet` GROUP BY col0,col7,col8 > ) > select * from ( select a, b, c, d from v1 where c > 'IN' GROUP BY a,b,c,d > order by a,b,c,d) > 2016-07-26 16:57:04,666 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] ERROR > o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: Internal > error: Type 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' has no > field 'a' > [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > AssertionError: Internal error: Type 'RecordType(ANY col0, ANY col8, ANY > max_col8, ANY col7)' has no field 'a' > [Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > 
at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_101] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101] > Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected > exception during fragment initialization: Internal error: Type > 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a' > ... 4 common frames omitted > Caused by: java.lang.AssertionError: Internal error: Type 'RecordType(ANY > col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a' > at org.apache.calcite.util.Util.newInternal(Util.java:777) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:167) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3225) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.sql2rel.SqlToRelConverter.access$1500(SqlToRelConverter.java:185) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4181) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3603) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-dril
[jira] [Commented] (DRILL-4797) Partition by aggregate function in a window query results in IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396800#comment-15396800 ] Gautam Kumar Parai commented on DRILL-4797: --- [~khfaraaz] Is this Drill-2330 specific issue? I see the same for a window aggregates {code}select avg(l_quantity) over (partition by min(l_quantity)) from cp.`tpch/lineitem.parquet`; Error: SYSTEM ERROR: IllegalStateException: This generator does not support mappings beyond{code} Can you please confirm? > Partition by aggregate function in a window query results in > IllegalStateException > -- > > Key: DRILL-4797 > URL: https://issues.apache.org/jira/browse/DRILL-4797 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai > > Use of aggregate function in the partitioning column in a windowed query > results in an IllegalStateException > {noformat} > 0: jdbc:drill:zk=local> select avg(sum(l_quantity)) over (partition by > min(l_quantity)) from cp.`tpch/lineitem.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (DRILL-4797) Partition by aggregate function in a window query results in IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396800#comment-15396800 ] Gautam Kumar Parai edited comment on DRILL-4797 at 7/28/16 2:27 AM: [~khfaraaz] Is this a Drill-2330 specific issue? I see the same for window aggregates {code}select avg(l_quantity) over (partition by min(l_quantity)) from cp.`tpch/lineitem.parquet`; Error: SYSTEM ERROR: IllegalStateException: This generator does not support mappings beyond{code} Can you please confirm? was (Author: gparai): [~khfaraaz] Is this Drill-2330 specific issue? I see the same for a window aggregates {code}select avg(l_quantity) over (partition by min(l_quantity)) from cp.`tpch/lineitem.parquet`; Error: SYSTEM ERROR: IllegalStateException: This generator does not support mappings beyond{code} Can you please confirm? > Partition by aggregate function in a window query results in > IllegalStateException > -- > > Key: DRILL-4797 > URL: https://issues.apache.org/jira/browse/DRILL-4797 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai > > Use of aggregate function in the partitioning column in a windowed query > results in an IllegalStateException > {noformat} > 0: jdbc:drill:zk=local> select avg(sum(l_quantity)) over (partition by > min(l_quantity)) from cp.`tpch/lineitem.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4806) need a better error message
[ https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396815#comment-15396815 ] Gautam Kumar Parai commented on DRILL-4806: --- [~khfaraaz] This can be reduced to a simple testcase which seems unrelated to Drill-2330. Can you please confirm? {code}select avg(first_name) from cp.`employee.json`; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] (state=,code=0) {code} > need a better error message > > > Key: DRILL-4806 > URL: https://issues.apache.org/jira/browse/DRILL-4806 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Minor > Labels: window_function > > Need a better error message, column c2 is of type CHAR. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 > ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 0:0 > [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
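The comment above reduces the failure to aggregating a VARCHAR column with AVG. A sketch of the kind of planning-time type check that would surface a readable message instead of the opaque `castINT(BIT-OPTIONAL)` error (hypothetical validation logic; the names and type set here are assumptions, not Drill's or Calcite's actual API):

```python
# Hypothetical planning-time check: reject AVG over non-numeric input with
# a clear message, rather than failing later during schema materialization
# with a "Missing function implementation" error.

NUMERIC_TYPES = {"INT", "BIGINT", "FLOAT4", "FLOAT8", "DECIMAL"}

def validate_avg_input(column_name, column_type):
    if column_type.upper() not in NUMERIC_TYPES:
        raise TypeError(
            f"AVG does not support column '{column_name}' of type "
            f"{column_type}; expected a numeric type")
    return True
```

Whether this validation belongs in Calcite during planning or in Drill itself is exactly the question raised for [~amansinha100] in the edited comment below.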
[jira] [Comment Edited] (DRILL-4806) need a better error message
[ https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396815#comment-15396815 ] Gautam Kumar Parai edited comment on DRILL-4806 at 7/28/16 2:47 AM: [~khfaraaz] This can be reduced to a simple testcase which seems unrelated to Drill-2330. Can you please confirm? {code}select avg(first_name) from cp.`employee.json`; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] (state=,code=0) {code} [~amansinha100] Can you please suggest if an error should be thrown by Calcite during planning instead of Drill? was (Author: gparai): [~khfaraaz] This can be reduced to a simple testcase which seems unrelated to Drill-2330. Can you please confirm? {code}select avg(first_name) from cp.`employee.json`; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: c36a67a8-c45e-4844-b988-c767316ea834 on 10.250.50.33:31010] (state=,code=0) {code} > need a better error message > > > Key: DRILL-4806 > URL: https://issues.apache.org/jira/browse/DRILL-4806 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Minor > Labels: window_function > > Need a better error message, column c2 is of type CHAR. 
> {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 > ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 0:0 > [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4806) need a better error message
[ https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-4806: -- Assignee: (was: Gautam Kumar Parai) > need a better error message > > > Key: DRILL-4806 > URL: https://issues.apache.org/jira/browse/DRILL-4806 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Priority: Minor > Labels: window_function > > Need a better error message, column c2 is of type CHAR. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 > ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 0:0 > [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4806) need a better error message
[ https://issues.apache.org/jira/browse/DRILL-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398262#comment-15398262 ] Gautam Kumar Parai commented on DRILL-4806: --- This issue seems unrelated to nested aggregates. We will continue work on this in the future. > need a better error message > > > Key: DRILL-4806 > URL: https://issues.apache.org/jira/browse/DRILL-4806 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Minor > Labels: window_function > > Need a better error message, column c2 is of type CHAR. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT MAX(AVG(c2)) OVER ( PARTITION BY c2 > ORDER BY c2 ), c2 FROM `tblWnulls.parquet` GROUP BY c2; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castINT(BIT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 0:0 > [Error Id: 076464ff-7385-4bee-9704-38dec781af32 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4469) SUM window query returns incorrect results over integer data
[ https://issues.apache.org/jira/browse/DRILL-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412145#comment-15412145 ] Gautam Kumar Parai commented on DRILL-4469: --- [~zfong] No, I do not think so. Those address validity checks i.e. failing invalid queries during compilation. However, this looks like wrong results for a valid query. > SUM window query returns incorrect results over integer data > > > Key: DRILL-4469 > URL: https://issues.apache.org/jira/browse/DRILL-4469 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.6.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Priority: Critical > Labels: window_function > Attachments: t_alltype.csv, t_alltype.parquet > > > SUM window query returns incorrect results as compared to Postgres, with or > without the frame clause in the window definition. Note that there is a sub > query involved and data in column c1 is sorted integer data with no nulls. > Drill 1.6.0 commit ID: 6d5f4983 > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from > dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.257 seconds) > {noformat} > results from Postgres 9.3 > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND > UNBOUNDED FOLLOWING); > sum > -- > 4499 > 4499 > 4499 > 4499 > 4499 > 4499 > ... 
> 5613 > 5613 > 5613 > 473 > 473 > 473 > 473 > 473 > (145 rows) > {noformat} > Removing the frame clause from window definition, still results in completely > different results on Postgres vs Drill > Results from Drill 1.6.0 > {noformat} > 0: jdbc:drill:schema=dfs.tmp>SELECT SUM(c1) OVER w FROM (select * from > t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1); > +-+ > | EXPR$0 | > +-+ > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ... > | 10585 | > | 10585 | > | 10585 | > | 10585 | > | 10585 | > ++ > 145 rows selected (0.28 seconds) > {noformat} > Results from Postgres > {noformat} > postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW > w AS (PARTITION BY c8 ORDER BY c1); > sum > -- > 5 >12 >21 >33 >47 >62 >78 >96 > 115 > 135 > 158 > 182 > 207 > 233 > 260 > 289 > ... > 4914 > 5051 > 5189 > 5328 > 5470 > 5613 > 8 >70 > 198 > 332 > 473 > (145 rows) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
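As the Postgres output above illustrates, SUM over a window with an unbounded frame should return each partition's total, not one grand total for every row. A pure-Python sketch of the expected semantics (illustrative only; the function and field names are not from Drill):

```python
from collections import defaultdict

def window_sum_unbounded(rows, partition_key, value_key):
    # SUM(value) OVER (PARTITION BY key RANGE BETWEEN UNBOUNDED
    # PRECEDING AND UNBOUNDED FOLLOWING): every row in a partition
    # gets that partition's total, not the global total.
    totals = defaultdict(int)
    for r in rows:
        totals[r[partition_key]] += r[value_key]
    return [totals[r[partition_key]] for r in rows]

rows = [{"c8": "a", "c1": 5}, {"c8": "a", "c1": 7}, {"c8": "b", "c1": 3}]
print(window_sum_unbounded(rows, "c8", "c1"))  # [12, 12, 3]
```

Drill returning the same value (10585) for every row suggests the PARTITION BY boundaries are being lost when the input comes from a subquery.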
[jira] [Commented] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412798#comment-15412798 ] Gautam Kumar Parai commented on DRILL-4795: --- I have created a pull request (https://github.com/apache/drill/pull/563). [~jni] [~amansinha100] can you please review it? Thanks! > Nested aggregate windowed query fails - IllegalStateException > -- > > Key: DRILL-4795 > URL: https://issues.apache.org/jira/browse/DRILL-4795 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Critical > Attachments: tblWnulls.parquet > > > The below two window function queries fail on MapR Drill 1.8.0 commit ID > 34ca63ba > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > From drillbit.log > {noformat} > 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by > c2) from `tblWnulls.parquet` > ... 
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: > This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_101] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101] > Caused by: java.lang.IllegalStateException: This generator does not support > mappings beyond > at > org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > 
org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47) > ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] >
[jira] [Resolved] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai resolved DRILL-4795. --- Resolution: Fixed Closed with commit: 0bac42dec63a46ca787f6c5fe5a51b9a97e0d6cc > Nested aggregate windowed query fails - IllegalStateException > -- > > Key: DRILL-4795 > URL: https://issues.apache.org/jira/browse/DRILL-4795 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.8.0 > Environment: 4 node CentOS cluster >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Critical > Attachments: tblWnulls.parquet > > > The below two window function queries fail on MapR Drill 1.8.0 commit ID > 34ca63ba > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: IllegalStateException: This generator does not support > mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > From drillbit.log > {noformat} > 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by > c2) from `tblWnulls.parquet` > ... 
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: > This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IllegalStateException: This generator does not support mappings beyond > Fragment 0:0 > [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_101] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101] > Caused by: java.lang.IllegalStateException: This generator does not support > mappings beyond > at > org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > 
org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:795) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:47) > ~[drill-logical-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitValueVectorWriteExpression(EvaluationVisitor.java:359) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown(EvaluationVisitor.java:341
[jira] [Resolved] (DRILL-4796) AssertionError - Nested sum(avg(c1)) over window
[ https://issues.apache.org/jira/browse/DRILL-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai resolved DRILL-4796. --- Resolution: Fixed Closed with commit: 0bac42dec63a46ca787f6c5fe5a51b9a97e0d6cc > AssertionError - Nested sum(avg(c1)) over window > > > Key: DRILL-4796 > URL: https://issues.apache.org/jira/browse/DRILL-4796 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.8.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai > > Nested window function query fails on MapR Drill 1.8.0 commit ID 34ca63ba > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select sum(avg(c1)) over (partition by c2) from > `tblWnulls.parquet`; > Error: SYSTEM ERROR: AssertionError: todo: implement syntax > FUNCTION_STAR(COUNT($1)) > [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > stack trace from drillbit.log > {noformat} > 2016-07-21 11:25:40,023 [286f4ecc-59bd-113e-1edf-d93411b255aa:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 286f4ecc-59bd-113e-1edf-d93411b255aa: select sum(avg(c1)) over (partition by > c2) from `tblWnulls.parquet` > ... 
> 2016-07-21 11:25:40,183 [286f4ecc-59bd-113e-1edf-d93411b255aa:foreman] ERROR > o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: todo: > implement syntax FUNCTION_STAR(COUNT($1)) > [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > AssertionError: todo: implement syntax FUNCTION_STAR(COUNT($1)) > [Error Id: fa5e1751-87a2-4880-baf9-7e132253be7c on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) > [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_101] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101] > Caused by: org.apache.drill.exec.work.foreman.ForemanException: > Unexpected exception during fragment initialization: todo: implement syntax > FUNCTION_STAR(COUNT($1)) > ... 
4 common frames omitted > Caused by: java.lang.AssertionError: todo: implement syntax > FUNCTION_STAR(COUNT($1)) > at > org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:198) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.doFunction(DrillOptiq.java:205) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:105) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:80) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) > ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14] > at > org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:77) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:59) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.physical.SortPrel.getPhysicalOperator(SortPrel.java:81) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.planner.physical.SelectionVectorRemoverPrel.getPhysicalOperator(SelectionVectorRemoverPrel.java:48) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.
[jira] [Commented] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
[ https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497847#comment-15497847 ] Gautam Kumar Parai commented on DRILL-4771: --- I have created the pull request https://github.com/apache/drill/pull/588. [~amansinha100] Can you please review the PR? Thanks! > Drill should avoid doing the same join twice if count(distinct) exists > -- > > Key: DRILL-4771 > URL: https://issues.apache.org/jira/browse/DRILL-4771 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > When the query has one distinct aggregate and one or more non-distinct > aggregates, the join instance need not produce the join-based plan. We can > generate multi-phase aggregates. Another approach would be to use grouping > sets. However, Drill is unable to support grouping sets and instead relies on > the join-based plan (see the plan below) > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3]) > LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner]) > LogicalAggregate(group=[{0}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > The more efficient form should look like this > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = 
dept.deptno > group by emp.empno > LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
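The multi-phase rewrite shown in the plan above can be sketched in plain Python (illustrative only; `single_pass_plan` and the tuple layout are hypothetical): the inner aggregate groups by (empno, deptno) with a partial COUNT(*), and the outer aggregate SUMs those partial counts while averaging the now-distinct deptno values, so the join is evaluated only once.

```python
from collections import defaultdict

def single_pass_plan(rows):
    # rows: (empno, deptno) pairs produced by the single join.
    # Inner aggregate: group by (empno, deptno), partial COUNT(*).
    inner = defaultdict(int)
    for empno, deptno in rows:
        inner[(empno, deptno)] += 1
    # Outer aggregate: per empno, SUM the partial counts (restores
    # COUNT(*)) and AVG over the now-distinct deptno values.
    out = {}
    for (empno, deptno), cnt in inner.items():
        entry = out.setdefault(empno, [0, []])
        entry[0] += cnt
        entry[1].append(deptno)
    return {k: (c, sum(ds) / len(ds)) for k, (c, ds) in out.items()}

rows = [(1, 10), (1, 10), (1, 20), (2, 10)]
print(single_pass_plan(rows))  # {1: (3, 15.0), 2: (1, 10.0)}
```

For empno 1 there are three joined rows but only two distinct deptnos (10, 20), so COUNT(*) = 3 and AVG(DISTINCT deptno) = 15.0, matching what the join-based plan computes with two join instances.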
[jira] [Comment Edited] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
[ https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497847#comment-15497847 ] Gautam Kumar Parai edited comment on DRILL-4771 at 9/17/16 12:51 AM: - I have created the pull request https://github.com/apache/drill/pull/588. [~amansinha100] [~jni] Can you please review the PR? Thanks! was (Author: gparai): I have created the pull request https://github.com/apache/drill/pull/588. [~amansinha100] Can you please review the PR? Thanks! > Drill should avoid doing the same join twice if count(distinct) exists > -- > > Key: DRILL-4771 > URL: https://issues.apache.org/jira/browse/DRILL-4771 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > When the query has one distinct aggregate and one or more non-distinct > aggregates, the join instance need not produce the join-based plan. We can > generate multi-phase aggregates. Another approach would be to use grouping > sets. 
However, Drill is unable to support grouping sets and instead relies on > the join-based plan (see the plan below) > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3]) > LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner]) > LogicalAggregate(group=[{0}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > The more efficient form should look like this > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4895) StreamingAggBatch code generation issues
Gautam Kumar Parai created DRILL-4895: - Summary: StreamingAggBatch code generation issues Key: DRILL-4895 URL: https://issues.apache.org/jira/browse/DRILL-4895 Project: Apache Drill Issue Type: Bug Affects Versions: 1.7.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai We unnecessarily re-generate the code for the StreamingAggBatch even without schema changes. Also, we seem to generate more holder variables than may be required. This also affects sub-classes. HashAggBatch does not have the same issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4895) StreamingAggBatch code generation issues
[ https://issues.apache.org/jira/browse/DRILL-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497884#comment-15497884 ] Gautam Kumar Parai commented on DRILL-4895: --- This is a potential performance issue - hence not critical. > StreamingAggBatch code generation issues > > > Key: DRILL-4895 > URL: https://issues.apache.org/jira/browse/DRILL-4895 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.7.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > > We unnecessarily re-generate the code for the StreamingAggBatch even without > schema changes. Also, we seem to generate many holder variables than what > maybe required. This also affects sub-classes. HashAggBatch does not have the > same issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
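One way to avoid the regeneration described above is to memoize compiled code on the incoming batch schema, so a fresh compile happens only on a genuine schema change. A hypothetical sketch (the class and callback names are illustrative, not Drill's actual codegen API):

```python
class StreamingAggCodeCache:
    """Cache generated aggregator code keyed by batch schema
    (illustrative sketch; not Drill's actual implementation)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache = {}

    def get(self, schema):
        # schema must be hashable, e.g. a tuple of (name, type) pairs;
        # recompile only when an unseen schema arrives.
        if schema not in self._cache:
            self._cache[schema] = self._compile(schema)
        return self._cache[schema]

compiles = []
cache = StreamingAggCodeCache(lambda s: compiles.append(s) or f"code:{s}")
schema = (("c1", "INT"), ("c2", "VARCHAR"))
cache.get(schema)
cache.get(schema)          # same schema: no second compile
print(len(compiles))       # 1
```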
[jira] [Resolved] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
[ https://issues.apache.org/jira/browse/DRILL-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai resolved DRILL-4771. --- Resolution: Fixed Fix Version/s: 1.9.0 Closed with commit: 229571533bce1e37395d9675ea804ee97b1a2362 > Drill should avoid doing the same join twice if count(distinct) exists > -- > > Key: DRILL-4771 > URL: https://issues.apache.org/jira/browse/DRILL-4771 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Fix For: 1.9.0 > > > When the query has one distinct aggregate and one or more non-distinct > aggregates, the join instance need not produce the join-based plan. We can > generate multi-phase aggregates. Another approach would be to use grouping > sets. However, Drill is unable to support grouping sets and instead relies on > the join-based plan (see the plan below) > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3]) > LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner]) > LogicalAggregate(group=[{0}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} > The more efficient form should look like this > {code} > select emp.empno, count(*), avg(distinct dept.deptno) > from sales.emp emp inner join sales.dept dept > on emp.deptno = dept.deptno > group by emp.empno > 
LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)]) > LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) > LogicalProject(EMPNO=[$0], DEPTNO0=[$9]) > LogicalJoin(condition=[=($7, $9)], joinType=[inner]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4902) nested aggregate query does not complain about missing GROUP BY clause
[ https://issues.apache.org/jira/browse/DRILL-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4902: - Assignee: Gautam Kumar Parai > nested aggregate query does not complain about missing GROUP BY clause > -- > > Key: DRILL-4902 > URL: https://issues.apache.org/jira/browse/DRILL-4902 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai > Labels: window_function > > A nested aggregate windowed query does not report an error when the > partitioning column is not used in the GROUP BY clause. > Drill 1.9.0 > This is the correct expected behavior. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select count(max(c7)) over (partition by c8) > from `DRILL_4589`; > Error: VALIDATION ERROR: From line 1, column 42 to line 1, column 43: > Expression 'c8' is not being grouped > SQL Query null > [Error Id: 09c837b9-7a66-4a1f-9fbc-522160947274 on centos-01.qa.lab:31010] > (state=,code=0) > {noformat} > The below query too should report above error, as the GROUP BY on > partitioning column is missing. > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select count(max(c7)) over (partition by c8) > from (select * from `DRILL_4589`); > +-+ > | EXPR$0 | > +-+ > | 1 | > +-+ > 1 row selected (193.71 seconds) > {noformat} > Postgres 9.3 also reports an error for a similar query > {noformat} > postgres=# select count(max(c1)) over (partition by c2) from (select * from > t222) sub_query; > ERROR: column "sub_query.c2" must appear in the GROUP BY clause or be used > in an aggregate function > LINE 1: select count(max(c1)) over (partition by c2) from (select * ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-6093) Unneeded columns in Drill logical project
Gautam Kumar Parai created DRILL-6093: - Summary: Unneeded columns in Drill logical project Key: DRILL-6093 URL: https://issues.apache.org/jira/browse/DRILL-6093 Project: Apache Drill Issue Type: Bug Affects Versions: 1.12.0, 1.11.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Fix For: 1.12.0 Here is an example query with the corresponding logical plan. The project contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even when it is not required by subsequent operators e.g. DrillJoinRel. EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); *+--+--+* *|* *text* *|* *json* *|* *+--+--+* *|* DrillScreenRel DrillProjectRel(L_QUANTITY=[$1]) DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], $f2=[CAST($0):INTEGER]) DrillScanRel(table=[[cp, tpch/lineitem.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6093) Unneeded columns in Drill logical project
[ https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-6093: -- Labels: ready-to-commit (was: ) > Unneeded columns in Drill logical project > - > > Key: DRILL-6093 > URL: https://issues.apache.org/jira/browse/DRILL-6093 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.13.0 > > > Here is an example query with the corresponding logical plan. The project > contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even > when it is not required by subsequent operators e.g. DrillJoinRel. > EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM > cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE > cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); > *+--+--+* > *|* *text* *|* *json* *|* > *+--+--+* > *|* DrillScreenRel > DrillProjectRel(L_QUANTITY=[$1]) > DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) > DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], > $f2=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/lineitem.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) > DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/orders.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
Gautam Kumar Parai created DRILL-6099: - Summary: Drill does not push limit past project (flatten) if it cannot be pushed into scan Key: DRILL-6099 URL: https://issues.apache.org/jira/browse/DRILL-6099 Project: Apache Drill Issue Type: Bug Affects Versions: 1.12.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Fix For: 1.13.0 It would be useful to have pushdown occur past flatten(project). Here is an example to illustrate the issue: {{explain plan without implementation for }}{{select name, flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}} {{DrillScreenRel}}{{ }} {{ DrillLimitRel(fetch=[1])}}{{ }} {{ DrillProjectRel(name=[$0], category=[FLATTEN($1)])}} {{ DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}} = Content of 0_0_0.json = { "name" : "Eric Goldberg, MD", "categories" : [ "Doctors", "Health & Medical" ] } { "name" : "Pine Cone Restaurant", "categories" : [ "Restaurants" ] } { "name" : "Deforest Family Restaurant", "categories" : [ "American (Traditional)", "Restaurants" ] } { "name" : "Culver's", "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", "Restaurants" ] } { "name" : "Chang Jiang Chinese Kitchen", "categories" : [ "Chinese", "Restaurants" ] } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
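One subtlety behind this pushdown is that FLATTEN changes the row count, so the limit cannot simply be moved below the project; a copy of it must remain above the flatten for results to stay correct. A minimal Java sketch (illustrative only, not Drill planner code) of why the two orderings differ:

```java
import java.util.List;
import java.util.stream.Collectors;

public class LimitFlattenDemo {
    public static void main(String[] args) {
        // Two input rows, mirroring the categories arrays in 0_0_0.json
        List<List<String>> categories = List.of(
            List.of("Doctors", "Health & Medical"),
            List.of("Restaurants"));

        // LIMIT 1 applied above FLATTEN: flatten all rows, then keep one output row
        List<String> limitAboveFlatten = categories.stream()
            .flatMap(List::stream)
            .limit(1)
            .collect(Collectors.toList());

        // LIMIT 1 naively pushed below FLATTEN: keep one input row, then flatten it;
        // the output row count differs, so the pushed-down limit only bounds the scan
        // and an outer limit must still cap the final result
        List<String> limitBelowFlatten = categories.stream()
            .limit(1)
            .flatMap(List::stream)
            .collect(Collectors.toList());

        System.out.println(limitAboveFlatten); // [Doctors]
        System.out.println(limitBelowFlatten); // [Doctors, Health & Medical]
    }
}
```

This is why the pushdown is still worthwhile (it bounds how many rows the scan must produce) even though the original limit stays on top of the flatten.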
[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project
[ https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334634#comment-16334634 ] Gautam Kumar Parai commented on DRILL-6093: --- [~amansinha100] I see that you removed the ready-to-commit label. Do I need to address something in the PR? Please let me know. Thanks! > Unneeded columns in Drill logical project > - > > Key: DRILL-6093 > URL: https://issues.apache.org/jira/browse/DRILL-6093 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Fix For: 1.13.0 > > > Here is an example query with the corresponding logical plan. The project > contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even > when it is not required by subsequent operators e.g. DrillJoinRel. > EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM > cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE > cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); > *+--+--+* > *|* *text* *|* *json* *|* > *+--+--+* > *|* DrillScreenRel > DrillProjectRel(L_QUANTITY=[$1]) > DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) > DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], > $f2=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/lineitem.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) > DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/orders.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6093) Unneeded columns in Drill logical project
[ https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-6093: -- Labels: ready-to-commit (was: ) > Unneeded columns in Drill logical project > - > > Key: DRILL-6093 > URL: https://issues.apache.org/jira/browse/DRILL-6093 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.13.0 > > > Here is an example query with the corresponding logical plan. The project > contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even > when it is not required by subsequent operators e.g. DrillJoinRel. > EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM > cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE > cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); > *+--+--+* > *|* *text* *|* *json* *|* > *+--+--+* > *|* DrillScreenRel > DrillProjectRel(L_QUANTITY=[$1]) > DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) > DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], > $f2=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/lineitem.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) > DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/orders.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project
[ https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335187#comment-16335187 ] Gautam Kumar Parai commented on DRILL-6093: --- [~amansinha100] I needed to update the testcase. The project only contains the required cols but was renamed to L_ORDERKEY0=CAST I was checking that L_ORDERKEY should be absent by the regex L_ORDERKEY.* I have modified it to L_ORDERKEY=.* Maybe something changed after the latest rebase since it passed earlier if I remember correctly. > Unneeded columns in Drill logical project > - > > Key: DRILL-6093 > URL: https://issues.apache.org/jira/browse/DRILL-6093 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.13.0 > > > Here is an example query with the corresponding logical plan. The project > contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even > when it is not required by subsequent operators e.g. DrillJoinRel. 
> EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM > cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE > cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); > *+--+--+* > *|* *text* *|* *json* *|* > *+--+--+* > *|* DrillScreenRel > DrillProjectRel(L_QUANTITY=[$1]) > DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) > DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], > $f2=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/lineitem.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) > DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/orders.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
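The regex change described in the comment above matters because the rebased plan kept the cast column under a renamed label (L_ORDERKEY0), which the unanchored pattern still matched. A small standalone sketch (the plan line below is a simplified stand-in, not actual Drill test output):

```java
import java.util.regex.Pattern;

public class PlanRegexCheck {
    public static void main(String[] args) {
        // Simplified plan line after the rebase: the projected cast column
        // was renamed to L_ORDERKEY0, while plain L_ORDERKEY is truly absent
        String planLine = "DrillProjectRel(L_QUANTITY=[$1], L_ORDERKEY0=[CAST($0):INTEGER])";

        // Loose pattern: also matches the renamed column, so a test asserting
        // the absence of L_ORDERKEY fails spuriously
        boolean loose = Pattern.compile("L_ORDERKEY.*").matcher(planLine).find();

        // Anchored pattern: only matches an actual L_ORDERKEY=... projection
        boolean anchored = Pattern.compile("L_ORDERKEY=.*").matcher(planLine).find();

        System.out.println(loose);    // true  (false hit on L_ORDERKEY0)
        System.out.println(anchored); // false (the column really is gone)
    }
}
```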
[jira] [Commented] (DRILL-6093) Unneeded columns in Drill logical project
[ https://issues.apache.org/jira/browse/DRILL-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335251#comment-16335251 ] Gautam Kumar Parai commented on DRILL-6093: --- [~arina] please consider this PR during the batch commit. Thanks! > Unneeded columns in Drill logical project > - > > Key: DRILL-6093 > URL: https://issues.apache.org/jira/browse/DRILL-6093 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.13.0 > > > Here is an example query with the corresponding logical plan. The project > contains unnecessary columns L_ORDERKEY, O_ORDERKEY in the projection even > when it is not required by subsequent operators e.g. DrillJoinRel. > EXPLAIN PLAN without implementation FOR SELECT L.L_QUANTITY FROM > cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O WHERE > cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int); > *+--+--+* > *|* *text* *|* *json* *|* > *+--+--+* > *|* DrillScreenRel > DrillProjectRel(L_QUANTITY=[$1]) > DrillJoinRel(condition=[=($2, $4)], joinType=[inner]) > DrillProjectRel(L_ORDERKEY=[$0], L_QUANTITY=[$1], > $f2=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/lineitem.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/lineitem.parquet]], > selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`L_ORDERKEY`, `L_QUANTITY`]]]) > DrillProjectRel(O_ORDERKEY=[$0], $f1=[CAST($0):INTEGER]) > DrillScanRel(table=[[cp, tpch/orders.parquet]], > groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/orders.parquet]], > selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, > usedMetadataFile=false, columns=[`O_ORDERKEY`]]]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai updated DRILL-6099: -- Labels: ready-to-commit (was: ) [~ben-zvi] please consider this PR (https://github.com/apache/drill/pull/1096) for the batch commit. > Drill does not push limit past project (flatten) if it cannot be pushed into > scan > - > > Key: DRILL-6099 > URL: https://issues.apache.org/jira/browse/DRILL-6099 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Labels: ready-to-commit > Fix For: 1.13.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > It would be useful to have pushdown occur past flatten(project). Here is an > example to illustrate the issue: > {{explain plan without implementation for }}{{select name, > flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}} > {{DrillScreenRel}}{{ }} > {{ DrillLimitRel(fetch=[1])}}{{ }} > {{ DrillProjectRel(name=[$0], category=[FLATTEN($1)])}} > {{ DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan > [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, > `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}} > = > Content of 0_0_0.json > = > { > "name" : "Eric Goldberg, MD", > "categories" : [ "Doctors", "Health & Medical" ] > } { > "name" : "Pine Cone Restaurant", > "categories" : [ "Restaurants" ] > } { > "name" : "Deforest Family Restaurant", > "categories" : [ "American (Traditional)", "Restaurants" ] > } { > "name" : "Culver's", > "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", > "Restaurants" ] > } { > "name" : "Chang Jiang Chinese Kitchen", > "categories" : [ "Chinese", "Restaurants" ] > } > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan
[ https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379154#comment-16379154 ] Gautam Kumar Parai commented on DRILL-6099: --- [~priteshm] no I did not get a chance to address them yet. I will take a look. > Drill does not push limit past project (flatten) if it cannot be pushed into > scan > - > > Key: DRILL-6099 > URL: https://issues.apache.org/jira/browse/DRILL-6099 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai >Priority: Major > Fix For: 1.13.0 > > > It would be useful to have pushdown occur past flatten(project). Here is an > example to illustrate the issue: > {{explain plan without implementation for }}{{select name, > flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}} > {{DrillScreenRel}}{{ }} > {{ DrillLimitRel(fetch=[1])}}{{ }} > {{ DrillProjectRel(name=[$0], category=[FLATTEN($1)])}} > {{ DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan > [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, > `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}} > = > Content of 0_0_0.json > = > { > "name" : "Eric Goldberg, MD", > "categories" : [ "Doctors", "Health & Medical" ] > } { > "name" : "Pine Cone Restaurant", > "categories" : [ "Restaurants" ] > } { > "name" : "Deforest Family Restaurant", > "categories" : [ "American (Traditional)", "Restaurants" ] > } { > "name" : "Culver's", > "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", > "Restaurants" ] > } { > "name" : "Chang Jiang Chinese Kitchen", > "categories" : [ "Chinese", "Restaurants" ] > } > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade
Gautam Kumar Parai created DRILL-6226: - Summary: Projection pushdown does not occur with Calcite upgrade Key: DRILL-6226 URL: https://issues.apache.org/jira/browse/DRILL-6226 Project: Apache Drill Issue Type: Bug Affects Versions: 1.13.0 Reporter: Gautam Kumar Parai Assignee: Arina Ielchiieva Fix For: 1.13.0 I am seeing plan issues where projection pushdown does not occur due to costing. The root cause is in DrillScanRel we are relying on utility functions to determine STAR columns which fails with the Calcite upgrade - because GroupScan is using GroupScan.ALL_COLUMNS which is different from STAR_COLUMN/DYNAMIC_STAR. While looking at references of SchemaPath.STAR_COLUMN, we found comparisons involving hardcoded '*'. These might also need to change to accommodate for DYNAMIC STAR. [~arina] [~amansinha100] please let me know your thoughts on this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
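The hardcoded-'*' comparisons mentioned above can be centralized so that one predicate recognizes both star forms. A hypothetical sketch (not Drill's actual SchemaPath code; it assumes Calcite's dynamic-star token is the "**" prefix used by DynamicRecordType):

```java
public class StarColumnUtil {
    private static final String STAR = "*";
    // Assumed dynamic-star token from the Calcite upgrade ("**" prefix)
    private static final String DYNAMIC_STAR_PREFIX = "**";

    // One shared check, so call sites stop comparing against a hardcoded "*"
    // and silently missing the dynamic-star form
    public static boolean isStarColumn(String name) {
        return STAR.equals(name) || name.startsWith(DYNAMIC_STAR_PREFIX);
    }
}
```

With a helper like this, the costing path in DrillScanRel and any GroupScan.ALL_COLUMNS comparisons would consult a single definition instead of scattered string literals.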
[jira] [Commented] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade
[ https://issues.apache.org/jira/browse/DRILL-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393259#comment-16393259 ] Gautam Kumar Parai commented on DRILL-6226: --- [~arina] I verified this is present in master. Thanks! I will close this Jira as not a bug. > Projection pushdown does not occur with Calcite upgrade > --- > > Key: DRILL-6226 > URL: https://issues.apache.org/jira/browse/DRILL-6226 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Gautam Kumar Parai >Priority: Major > Fix For: 1.13.0 > > > I am seeing plan issues where projection pushdown does not occur due to > costing. The root cause is in DrillScanRel we are relying on utility > functions to determine STAR columns which fails with the Calcite upgrade - > because GroupScan is using GroupScan.ALL_COLUMNS which is different from > STAR_COLUMN/DYNAMIC_STAR. > While looking at references of SchemaPath.STAR_COLUMN, we found comparisons > involving hardcoded '*'. These might also need to change to accommodate for > DYNAMIC STAR. > [~arina] [~amansinha100] please let me know your thoughts on this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6226) Projection pushdown does not occur with Calcite upgrade
[ https://issues.apache.org/jira/browse/DRILL-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai closed DRILL-6226. - Resolution: Not A Bug > Projection pushdown does not occur with Calcite upgrade > --- > > Key: DRILL-6226 > URL: https://issues.apache.org/jira/browse/DRILL-6226 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Gautam Kumar Parai >Priority: Major > Fix For: 1.13.0 > > > I am seeing plan issues where projection pushdown does not occur due to > costing. The root cause is in DrillScanRel we are relying on utility > functions to determine STAR columns which fails with the Calcite upgrade - > because GroupScan is using GroupScan.ALL_COLUMNS which is different from > STAR_COLUMN/DYNAMIC_STAR. > While looking at references of SchemaPath.STAR_COLUMN, we found comparisons > involving hardcoded '*'. These might also need to change to accommodate for > DYNAMIC STAR. > [~arina] [~amansinha100] please let me know your thoughts on this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6260) Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query
[ https://issues.apache.org/jira/browse/DRILL-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402565#comment-16402565 ] Gautam Kumar Parai commented on DRILL-6260: --- [~hanu.ncr] did not realize you had reassigned it to yourself. I analyzed it a bit - Drill throws the error when it `visits` the Calcite logical op tree and finds a LogicalAggregate which has SqlSingleValueAggFunction. However, we may need to change the visitor to keep going down the tree to find one which is not. The code is in PreProcessLogicalRel.java:visit(LogicalAggregate aggregate). Hope this helps! > Query fails with "ERROR: Non-scalar sub-query used in an expression" when it > contains a cast expression around a scalar sub-query > -- > > Key: DRILL-6260 > URL: https://issues.apache.org/jira/browse/DRILL-6260 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.13.0, 1.14.0 > Environment: git Commit ID: dd4a46a6c57425284a2b8c68676357f947e01988 > git Commit Message: Update version to 1.14.0-SNAPSHOT >Reporter: Abhishek Girish >Assignee: Hanumath Rao Maduri >Priority: Major > > {code} > > explain plan for SELECT T1.b FROM `t1.json` T1 WHERE T1.a = (SELECT > > cast(max(T2.a) as varchar) FROM `t2.json` T2); > Error: UNSUPPORTED_OPERATION ERROR: Non-scalar sub-query used in an expression > See Apache Drill JIRA: DRILL-1937 > {code} > Slightly different variants of the query work fine. 
> {code} > > explain plan for SELECT T1.b FROM `t1.json` T1 WHERE T1.a = (SELECT > > max(cast(T2.a as varchar)) FROM `t2.json` T2); > 00-00 Screen > 00-01 Project(b=[$0]) > 00-02 Project(b=[$1]) > 00-03 SelectionVectorRemover > 00-04 Filter(condition=[=($0, $2)]) > 00-05 NestedLoopJoin(condition=[true], joinType=[left]) > 00-07 Scan(table=[[si, tmp, t1.json]], > groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, > columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]]) > 00-06 StreamAgg(group=[{}], EXPR$0=[MAX($0)]) > 00-08 Project($f0=[CAST($0):VARCHAR(65535) CHARACTER SET > "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"]) > 00-09 Scan(table=[[si, tmp, t2.json]], > groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, > columns=[`a`], files=[maprfs:///tmp/t2.json]]]){code} > {code} > > explain plan for SELECT T1.b FROM `t1.json` T1 WHERE T1.a = (SELECT > > max(T2.a) FROM `t2.json` T2); > 00-00Screen > 00-01 Project(b=[$0]) > 00-02Project(b=[$1]) > 00-03 SelectionVectorRemover > 00-04Filter(condition=[=($0, $2)]) > 00-05 NestedLoopJoin(condition=[true], joinType=[left]) > 00-07Scan(table=[[si, tmp, t1.json]], > groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t1.json, numFiles=1, > columns=[`a`, `b`], files=[maprfs:///tmp/t1.json]]]) > 00-06StreamAgg(group=[{}], EXPR$0=[MAX($0)]) > 00-08 Scan(table=[[si, tmp, t2.json]], > groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/t2.json, numFiles=1, > columns=[`a`], files=[maprfs:///tmp/t2.json]]]) > {code} > File contents: > {code} > # cat t1.json > {"a":1, "b":"V"} > {"a":2, "b":"W"} > {"a":3, "b":"X"} > {"a":4, "b":"Y"} > {"a":5, "b":"Z"} > # cat t2.json > {"a":1, "b":"A"} > {"a":2, "b":"B"} > {"a":3, "b":"C"} > {"a":4, "b":"D"} > {"a":5, "b":"E"} > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
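The visitor change suggested in the comment above (keep descending instead of failing at the first SINGLE_VALUE aggregate) can be sketched with a toy operator tree. This is a hypothetical stand-in, not Drill's PreProcessLogicalRel or Calcite's LogicalAggregate:

```java
import java.util.List;

// Toy logical-operator node: flags whether it is an aggregate and whether
// that aggregate is the scalar-subquery SINGLE_VALUE wrapper
class RelNodeSketch {
    final boolean isAggregate;
    final boolean usesSingleValue;
    final List<RelNodeSketch> inputs;

    RelNodeSketch(boolean isAggregate, boolean usesSingleValue, List<RelNodeSketch> inputs) {
        this.isAggregate = isAggregate;
        this.usesSingleValue = usesSingleValue;
        this.inputs = inputs;
    }

    // Instead of erroring as soon as a SINGLE_VALUE aggregate is seen, keep
    // walking the inputs to check whether a plain aggregate (e.g. MAX) sits
    // underneath, which would make the subquery scalar after all
    static boolean hasPlainAggregateBelow(RelNodeSketch node) {
        if (node.isAggregate && !node.usesSingleValue) {
            return true;
        }
        for (RelNodeSketch input : node.inputs) {
            if (hasPlainAggregateBelow(input)) {
                return true;
            }
        }
        return false;
    }
}
```

For the failing query, the tree is roughly SINGLE_VALUE over MAX over a scan; a check like this would find the MAX underneath and avoid raising the "Non-scalar sub-query" error.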
[jira] [Assigned] (DRILL-4326) JDBC Storage Plugin for PostgreSQL does not work
[ https://issues.apache.org/jira/browse/DRILL-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4326: - Assignee: Gautam Kumar Parai > JDBC Storage Plugin for PostgreSQL does not work > > > Key: DRILL-4326 > URL: https://issues.apache.org/jira/browse/DRILL-4326 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.3.0, 1.4.0, 1.5.0 > Environment: Mac OS X JDK 1.8 PostgreSQL 9.4.4 PostgreSQL JDBC jars > (postgresql-9.2-1004-jdbc4.jar, postgresql-9.1-901-1.jdbc4.jar, ) >Reporter: Akon Dey >Assignee: Gautam Kumar Parai >Priority: Major > Labels: doc-impacting > Fix For: 1.5.0, 1.14.0 > > > Queries with the JDBC Storage Plugin for PostgreSQL fail with DATA_READ ERROR. > The JDBC Storage Plugin settings in use are: > {code} > { > "type": "jdbc", > "driver": "org.postgresql.Driver", > "url": "jdbc:postgresql://127.0.0.1/test", > "username": "akon", > "password": null, > "enabled": false > } > {code} > Please refer to the following stack for further details: > {noformat} > Akons-MacBook-Pro:drill akon$ > ./distribution/target/apache-drill-1.5.0-SNAPSHOT/apache-drill-1.5.0-SNAPSHOT/bin/drill-embedded > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; > support was removed in 8.0 > Jan 29, 2016 9:17:18 AM org.glassfish.jersey.server.ApplicationHandler > initialize > INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 > 01:25:26... > apache drill 1.5.0-SNAPSHOT > "a little sql for your nosql" > 0: jdbc:drill:zk=local> !verbose > verbose: on > 0: jdbc:drill:zk=local> use pgdb; > +---+---+ > | ok | summary | > +---+---+ > | true | Default schema changed to [pgdb] | > +---+---+ > 1 row selected (0.753 seconds) > 0: jdbc:drill:zk=local> select * from ips; > Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the > SQL query. 
> sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > (state=,code=0) > java.sql.SQLException: DATA_READ ERROR: The JDBC storage plugin failed while > trying setup the SQL query. > sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73) > at > net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404) > at > net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351) > at > net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338) > at > net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69) > at > org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101) > at sqlline.Commands.execute(Commands.java:841) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.begin(SqlLine.java:621) > at sqlline.SqlLine.start(SqlLine.java:375) > at sqlline.SqlLine.main(SqlLine.java:268) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ > ERROR: The JDBC storage plugin failed while trying setup the SQL query. 
> sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) > at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) > at > org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) > at > org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) > at > org.apache.drill.common.SerializedExecutor.execute(Se
[jira] [Assigned] (DRILL-4326) JDBC Storage Plugin for PostgreSQL does not work
[ https://issues.apache.org/jira/browse/DRILL-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4326: - Assignee: (was: Gautam Kumar Parai) > JDBC Storage Plugin for PostgreSQL does not work > > > Key: DRILL-4326 > URL: https://issues.apache.org/jira/browse/DRILL-4326 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.3.0, 1.4.0, 1.5.0 > Environment: Mac OS X JDK 1.8 PostgreSQL 9.4.4 PostgreSQL JDBC jars > (postgresql-9.2-1004-jdbc4.jar, postgresql-9.1-901-1.jdbc4.jar, ) >Reporter: Akon Dey >Priority: Major > Labels: doc-impacting > Fix For: 1.5.0, 1.14.0 > > > Queries with the JDBC Storage Plugin for PostgreSQL fail with DATA_READ ERROR. > The JDBC Storage Plugin settings in use are: > {code} > { > "type": "jdbc", > "driver": "org.postgresql.Driver", > "url": "jdbc:postgresql://127.0.0.1/test", > "username": "akon", > "password": null, > "enabled": false > } > {code} > Please refer to the following stack for further details: > {noformat} > Akons-MacBook-Pro:drill akon$ > ./distribution/target/apache-drill-1.5.0-SNAPSHOT/apache-drill-1.5.0-SNAPSHOT/bin/drill-embedded > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; > support was removed in 8.0 > Jan 29, 2016 9:17:18 AM org.glassfish.jersey.server.ApplicationHandler > initialize > INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 > 01:25:26... > apache drill 1.5.0-SNAPSHOT > "a little sql for your nosql" > 0: jdbc:drill:zk=local> !verbose > verbose: on > 0: jdbc:drill:zk=local> use pgdb; > +---+---+ > | ok | summary | > +---+---+ > | true | Default schema changed to [pgdb] | > +---+---+ > 1 row selected (0.753 seconds) > 0: jdbc:drill:zk=local> select * from ips; > Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the > SQL query. 
> sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > (state=,code=0) > java.sql.SQLException: DATA_READ ERROR: The JDBC storage plugin failed while > trying setup the SQL query. > sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73) > at > net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404) > at > net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351) > at > net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338) > at > net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69) > at > org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101) > at sqlline.Commands.execute(Commands.java:841) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.begin(SqlLine.java:621) > at sqlline.SqlLine.start(SqlLine.java:375) > at sqlline.SqlLine.main(SqlLine.java:268) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: DATA_READ > ERROR: The JDBC storage plugin failed while trying setup the SQL query. 
> sql SELECT * > FROM "test"."ips" > plugin pgdb > Fragment 0:0 > [Error Id: 26ada06d-e08d-456a-9289-0dec2089b018 on 10.200.104.128:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) > at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) > at > org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) > at > org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) > at > org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) >
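[Editor's note] The plugin definition quoted in DRILL-4326 has "enabled": false, which may simply be how the settings were captured rather than the live state. For comparison, a PostgreSQL plugin definition of the shape one would typically expect to work (a sketch, not taken from the ticket; it still requires a compatible PostgreSQL JDBC driver jar on the Drill classpath) is:

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://127.0.0.1/test",
  "username": "akon",
  "password": null,
  "enabled": true
}
```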
[jira] [Commented] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637491#comment-16637491 ] Gautam Kumar Parai commented on DRILL-786: -- Yes, option 3 makes the most sense. The default value (TRUE) of the option serves as a defensive check. When the user sets it to FALSE they know what they are getting into rather than Drill surprising them. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative 
cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 
io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Sets: > Set#22, type: (DrillRecordRow[*, age, name, studentnum]) > rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, > importance=0.59049001 > rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), > rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0
[jira] [Comment Edited] (DRILL-786) Implement CROSS JOIN
[ https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637491#comment-16637491 ] Gautam Kumar Parai edited comment on DRILL-786 at 10/3/18 8:51 PM: --- Yes, option 3 makes the most sense given the alternatives. The default value (TRUE) of the option serves as a defensive check. When the user sets it to FALSE they know what they are getting into rather than Drill surprising them. was (Author: gparai): Yes, option 3 makes the most sense. The default value (TRUE) of the option serves as a defensive check. When the user sets it to FALSE they know what they are getting into rather than Drill surprising them. > Implement CROSS JOIN > > > Key: DRILL-786 > URL: https://issues.apache.org/jira/browse/DRILL-786 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Reporter: Krystal >Assignee: Igor Guzenko >Priority: Major > Fix For: 1.15.0 > > > git.commit.id.abbrev=5d7e3d3 > 0: jdbc:drill:schema=dfs> select student.name, student.age, > student.studentnum from student cross join voter where student.age = 20 and > voter.age = 20; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2" > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > 
DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0.0 io, 0.0 network}, id = 140 > Stack trace: > org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node > [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; > planner state: > Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[] > Original rel: > AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], > convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): > rowcount = 22500.0, cumulative cost = {inf}, id = 320 > DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = > 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id > = 316 > DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], > age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 > rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314 > DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], > condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = > {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312 > 
DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307 > DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], > table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 4000.0 cpu, 0.0 io, 0.0 network}, id = 129 > DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], > condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = > {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310 > DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], > table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, > 2000.0 cpu, 0
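[Editor's note] The CannotPlanException above arises because the planner cannot implement a join whose condition is simply [true] (a cartesian product). A common workaround pattern, sketched here with the ticket's own tables (the dummy-key rewrite is illustrative, not from the ticket), is to turn the cross join into an equi-join on a constant:

```sql
-- Emulate CROSS JOIN by joining on a constant dummy key,
-- so the planner sees an ordinary equi-join it can implement.
SELECT s.name, s.age, s.studentnum
FROM (SELECT *, 1 AS dummy FROM student) s
JOIN (SELECT *, 1 AS dummy FROM voter) v
  ON s.dummy = v.dummy
WHERE s.age = 20 AND v.age = 20;
```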
[jira] [Created] (DRILL-6375) ANY_VALUE aggregate function
Gautam Kumar Parai created DRILL-6375: - Summary: ANY_VALUE aggregate function Key: DRILL-6375 URL: https://issues.apache.org/jira/browse/DRILL-6375 Project: Apache Drill Issue Type: New Feature Components: Functions - Drill Affects Versions: 1.13.0 Reporter: Gautam Kumar Parai Assignee: Gautam Kumar Parai Fix For: 1.14.0 We had discussions on the Apache Calcite [1] and Apache Drill [2] mailing lists regarding an equivalent for DISTINCT ON. The community seems to prefer ANY_VALUE. This Jira is a placeholder for implementing the ANY_VALUE aggregate function in Apache Drill. We should also eventually contribute it to Apache Calcite. [1]https://lists.apache.org/thread.html/f2007a489d3a5741875bcc8a1edd8d5c3715e5114ac45058c3b3a42d@%3Cdev.calcite.apache.org%3E [2]https://lists.apache.org/thread.html/2517eef7410aed4e88b9515f7e4256335215c1ad39a2676a08d21cb9@%3Cdev.drill.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
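[Editor's note] As a sketch of the intended usage (table and column names here are hypothetical, not from the Jira): ANY_VALUE picks one arbitrary value per group, which covers the common DISTINCT ON use case of "one representative row per key":

```sql
-- One arbitrary address per name, analogous to
-- PostgreSQL's SELECT DISTINCT ON (name) name, address ...
SELECT name, ANY_VALUE(address) AS address
FROM employee
GROUP BY name;
```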
[jira] [Assigned] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error
[ https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4525: - Assignee: Gautam Kumar Parai (was: Bohdan Kazydub) > Query with BETWEEN clause on Date and Timestamp values fails with Validation > Error > -- > > Key: DRILL-4525 > URL: https://issues.apache.org/jira/browse/DRILL-4525 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Abhishek Girish >Assignee: Gautam Kumar Parai >Priority: Critical > Fix For: 1.14.0 > > > Query: (simplified variant of TPC-DS Query37) > {code} > SELECT >* > FROM >date_dim > WHERE >d_date BETWEEN Cast('1999-03-06' AS DATE) AND ( > Cast('1999-03-06' AS DATE) + INTERVAL '60' day) > LIMIT 10; > {code} > Error: > {code} > Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot > apply 'BETWEEN ASYMMETRIC' to arguments of type ' BETWEEN ASYMMETRIC > AND '. Supported form(s): ' BETWEEN > AND ' > SQL Query null > [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] > (state=,code=0) > {code} > This is a regression from 1.6.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-4525) Query with BETWEEN clause on Date and Timestamp values fails with Validation Error
[ https://issues.apache.org/jira/browse/DRILL-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-4525: - Assignee: Bohdan Kazydub (was: Gautam Kumar Parai) > Query with BETWEEN clause on Date and Timestamp values fails with Validation > Error > -- > > Key: DRILL-4525 > URL: https://issues.apache.org/jira/browse/DRILL-4525 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Abhishek Girish >Assignee: Bohdan Kazydub >Priority: Critical > Fix For: 1.14.0 > > > Query: (simplified variant of TPC-DS Query37) > {code} > SELECT >* > FROM >date_dim > WHERE >d_date BETWEEN Cast('1999-03-06' AS DATE) AND ( > Cast('1999-03-06' AS DATE) + INTERVAL '60' day) > LIMIT 10; > {code} > Error: > {code} > Error: VALIDATION ERROR: From line 6, column 8 to line 7, column 64: Cannot > apply 'BETWEEN ASYMMETRIC' to arguments of type ' BETWEEN ASYMMETRIC > AND '. Supported form(s): ' BETWEEN > AND ' > SQL Query null > [Error Id: 223fb37c-f561-4a37-9283-871dc6f4d6d0 on abhi2:31010] > (state=,code=0) > {code} > This is a regression from 1.6.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
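[Editor's note] The validation error in DRILL-4525 occurs because DATE + INTERVAL yields a TIMESTAMP, so the two BETWEEN bounds have different types. A workaround of the usual shape (a sketch, not taken from the ticket) is to cast the computed bound back to DATE so both bounds match:

```sql
SELECT *
FROM date_dim
WHERE d_date BETWEEN CAST('1999-03-06' AS DATE)
                 AND CAST(CAST('1999-03-06' AS DATE) + INTERVAL '60' DAY AS DATE)
LIMIT 10;
```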
[jira] [Assigned] (DRILL-3964) CTAS fails with NPE when source JSON file is empty
[ https://issues.apache.org/jira/browse/DRILL-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-3964: - Assignee: Gautam Kumar Parai > CTAS fails with NPE when source JSON file is empty > -- > > Key: DRILL-3964 > URL: https://issues.apache.org/jira/browse/DRILL-3964 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.2.0 >Reporter: Abhishek Girish >Assignee: Gautam Kumar Parai >Priority: Major > > {code:sql} > CREATE TABLE `complex.json` AS > SELECT id, > gbyi, > gbyt, > fl, > nul, > bool, > str, > sia, > sfa, > soa, > ooa, > oooi, > ooof, > ooos, > oooa > FROM dfs.`/drill/testdata/complex/json/complex.json`; > Error: SYSTEM ERROR: NullPointerException > Fragment 0:0 > [Error Id: 97679667-412a-475f-aebf-e935405c7330 on drill-democ1:31010] > (state=,code=0) > {code} > {code:sql} > > select * from dfs.`/drill/testdata/complex/json/complex.json` limit 1; > +--+ > | | > +--+ > +--+ > No rows selected (0.295 seconds) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-3539) CTAS over empty json file throws NPE
[ https://issues.apache.org/jira/browse/DRILL-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gautam Kumar Parai reassigned DRILL-3539: - Assignee: Gautam Kumar Parai > CTAS over empty json file throws NPE > > > Key: DRILL-3539 > URL: https://issues.apache.org/jira/browse/DRILL-3539 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.2.0 >Reporter: Khurram Faraaz >Assignee: Gautam Kumar Parai >Priority: Major > Fix For: Future > > > CTAS over empty JSON file results in NPE. > {code} > 0: jdbc:drill:schema=dfs.tmp> create table t45645 as select * from > `empty.json`; > Error: SYSTEM ERROR: NullPointerException > Fragment 0:0 > [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010] > (state=,code=0) > {code} > Stack trace from drillbit.log > {code} > 2015-07-22 00:34:03,788 [2a511b03-90b3-1d39-f4e3-cfd754aa085f:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException > Fragment 0:0 > [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > NullPointerException > Fragment 0:0 > [Error Id: 79039288-5402-4b0a-b32d-5bf5024f3b71 on centos-02.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > 
[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.physical.impl.WriterRecordBatch.addOutputContainerData(WriterRecordBatch.java:133) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:126) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) > 
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) > ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at java.security.AccessController.doPrivileged(Native Method) > ~[na:1.7.0_45] > at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > ~[hadoop-common-2.5.1-mapr-1503.