[jira] [Closed] (DRILL-4652) C++ client build breaks when trying to include commit messages with quotes
[ https://issues.apache.org/jira/browse/DRILL-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua closed DRILL-4652. --- Build issue that has been verified with a successful build of C++ client. > C++ client build breaks when trying to include commit messages with quotes > -- > > Key: DRILL-4652 > URL: https://issues.apache.org/jira/browse/DRILL-4652 > Project: Apache Drill > Issue Type: Bug >Reporter: Parth Chandra > > The C++ client build generates a string based on git commit info to print to > the log at startup time. This breaks if the commit message has quotes since > the embedded quotes are not escaped. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
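The fix amounts to escaping embedded quotes before splicing the commit message into generated source. A minimal Python sketch of the idea (the function and variable names here are hypothetical, not Drill's actual build code):

```python
def escape_for_cpp_literal(message: str) -> str:
    """Escape backslashes and double quotes so arbitrary text can be
    embedded safely inside a generated C++ string literal."""
    return message.replace("\\", "\\\\").replace('"', '\\"')

def make_commit_message_line(commit_message: str) -> str:
    # Hypothetical generated line, mirroring the kind of version string
    # the build emits for logging at client startup.
    return 'static const char* COMMIT_MESSAGE = "%s";' % escape_for_cpp_literal(commit_message)
```

Escaping backslashes first matters: doing it second would double the backslashes just inserted for the quotes.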
[jira] [Commented] (DRILL-5399) Random Error : Flatten does not support inputs of non-list values.
[ https://issues.apache.org/jira/browse/DRILL-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951849#comment-15951849 ] Rahul Challapalli commented on DRILL-5399: -- One more instance Query : {code} select id, flatten(kvgen(m)) from `json_kvgenflatten/missing-map.json` {code} Data: {code} { "id": 1, "m": {"a":1,"b":2} } { "id": 2 } { "id": 3, "m": {"c":3,"d":4} } {code} Plan : {code} 00-00 Screen : rowType = RecordType(ANY id, ANY EXPR$1): rowcount = 1.0, cumulative cost = {2.1 rows, 5.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 761 00-01 Project(id=[$0], EXPR$1=[$3]) : rowType = RecordType(ANY id, ANY EXPR$1): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 760 00-02 Flatten(flattenField=[$3]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3, ANY EXPR$4): rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 759 00-03 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$3=[$2], EXPR$4=[$2]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3, ANY EXPR$4): rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 758 00-04 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$3=[KVGEN($1)]) : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, ANY EXPR$3): rowcount = 1.0, cumulative cost = {1.0 rows, 4.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 757 00-05 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/json_kvgenflatten/missing-map.json, numFiles=1, columns=[`id`, `m`], files=[maprfs:///drill/testdata/json_kvgenflatten/missing-map.json]]]) : rowType = RecordType(ANY id, ANY m): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 756 {code} And in the logs, I see this warning. Looks like we are failing while setting up the new schema in the project operator. Could the json reader possibly be messing it up?
{code} 2017-03-31 15:27:55,863 [27212813-c2fa-204a-2971-015ea610ad67:frag:0:0] WARN o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `EXPR$3`, returning null instance. {code} > Random Error : Flatten does not support inputs of non-list values. > -- > > Key: DRILL-5399 > URL: https://issues.apache.org/jira/browse/DRILL-5399 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types, Storage - JSON >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli > > git.commit.id.abbrev=38ef562 > The below query did not fail when I ran it in isolation. However when I ran > the test suite at [1], which also contains the below query, by using 50 > threads submitting queries concurrently, I hit the below error. > {code} > select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from > `json_kvgenflatten/nested3.json`) sub > Failed with exception > java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Flatten does not support > inputs of non-list values. 
> Fragment 0:0 > [Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61) > at > oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100) > at > oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130) > at > org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112) > at > org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177) > at > org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at
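For context, the failure mode can be reproduced in miniature: kvgen over a record whose map column is missing produces a non-list value, which flatten rejects. A small Python sketch of the semantics (a simplified model, not Drill's implementation):

```python
def kvgen(m):
    # kvgen turns a map into a list of {"key": ..., "value": ...} entries;
    # for a record whose map column is missing there is no list to produce.
    if m is None:
        return None
    return [{"key": k, "value": v} for k, v in m.items()]

def flatten(rows, col):
    out = []
    for row in rows:
        values = row[col]
        if not isinstance(values, list):
            # Mirrors the reported error message.
            raise TypeError("Flatten does not support inputs of non-list values.")
        out.extend(dict(row, **{col: v}) for v in values)
    return out
```

In Drill the error is intermittent because it depends on which record batch (with or without the map column) sets up the schema first; the model above only shows why a non-list input trips flatten.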
[jira] [Commented] (DRILL-3474) Add implicit file columns support
[ https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951778#comment-15951778 ] Bridget Bevens commented on DRILL-3474: --- Link to doc: http://drill.apache.org/docs/querying-a-file-system-introduction/#implicit-columns > Add implicit file columns support > - > > Key: DRILL-3474 > URL: https://issues.apache.org/jira/browse/DRILL-3474 > Project: Apache Drill > Issue Type: New Feature > Components: Metadata >Affects Versions: 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: 1.7.0 > > > I could not find another ticket which talks about this ... > The file name should be a column which can be selected or filtered when > querying a directory just like dir0, dir1 are available.
[jira] [Commented] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected
[ https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951774#comment-15951774 ] Bridget Bevens commented on DRILL-4604: --- Link to doc: http://drill.apache.org/docs/identifying-multiple-drill-versions-in-a-cluster/ > Generate warning on Web UI if drillbits version mismatch is detected > > > Key: DRILL-4604 > URL: https://issues.apache.org/jira/browse/DRILL-4604 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: doc-impacting, ready-to-commit > Fix For: 1.10.0 > > Attachments: index_page.JPG, index_page_mismatch.JPG, > NEW_matching_drillbits.JPG, NEW_mismatching_drillbits.JPG, > screenshots_with_different_states.docx > > > Display the drillbit version on the Web UI. If any drillbit's version doesn't > match the current drillbit's, generate a warning. > Screenshots - NEW_matching_drillbits.JPG, NEW_mismatching_drillbits.JPG
[jira] [Commented] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.
[ https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951773#comment-15951773 ] Bridget Bevens commented on DRILL-5098: --- Link to doc: http://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection > Improving fault tolerance for connection between client and foreman node. > - > > Key: DRILL-5098 > URL: https://issues.apache.org/jira/browse/DRILL-5098 > Project: Apache Drill > Issue Type: Improvement > Components: Client - JDBC >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia > Labels: doc-impacting, ready-to-commit > Fix For: 1.10.0 > > > With DRILL-5015 we added support for specifying multiple Drillbits in the > connection string and randomly choosing one of them. Over time some of the > Drillbits specified in the connection string may die, and the client can fail > to connect to the Foreman node if the random selection happens to be a dead Drillbit. > Even if ZooKeeper is used to select a random Drillbit from the registered > ones, there is a small window in which the client selects a Drillbit and then that > Drillbit goes down. The client will fail to connect to this Drillbit and > error out. > Instead, if we try multiple Drillbits (with a configurable tries count in the > connection string), the probability of hitting this error window is reduced > in both cases, improving fault tolerance. During further > investigation it was also found that an Authentication failure is thrown > as a generic RpcException. We need to improve that as well > to capture this case explicitly, since on an Auth failure we don't want > to try multiple Drillbits. > Connection string example with the new parameter: > jdbc:drill:drillbit=[:][,[:]...;tries=5
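The retry behavior described in the ticket can be sketched as follows; this is a hypothetical Python model of the client-side logic (names invented), showing both the bounded retries and the auth-failure short-circuit:

```python
import random

class AuthError(Exception):
    """Authentication rejected by the drillbit."""

class ConnectError(Exception):
    """Drillbit unreachable (e.g. it died after registration)."""

def connect_with_retries(drillbits, tries, connect):
    """Try up to `tries` randomly chosen drillbits from the list.
    Connection failures move on to the next candidate; an authentication
    failure is raised immediately, since retrying elsewhere won't help.
    Assumes `drillbits` is non-empty."""
    candidates = random.sample(drillbits, min(tries, len(drillbits)))
    last_error = None
    for endpoint in candidates:
        try:
            return connect(endpoint)
        except AuthError:
            raise            # bad credentials: do not try other drillbits
        except ConnectError as e:
            last_error = e   # dead drillbit: fall through to the next one
    raise last_error
```

Distinguishing the auth case is the second half of the ticket: without a dedicated exception type, an auth failure would be retried like a dead node.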
[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
[ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951770#comment-15951770 ] Bridget Bevens commented on DRILL-4203: --- Link to doc: http://drill.apache.org/docs/parquet-format/#date-value-auto-correction > Parquet File : Date is stored wrongly > - > > Key: DRILL-4203 > URL: https://issues.apache.org/jira/browse/DRILL-4203 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Stéphane Trou >Assignee: Vitalii Diravka >Priority: Critical > Labels: doc-impacting > Fix For: 1.9.0 > > > Hello, > I have some problems when I try to read parquet files produced by drill with > Spark; all dates are corrupted. > I think the problem comes from drill :) > {code} > cat /tmp/date_parquet.csv > Epoch,1970-01-01 > {code} > {code} > 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) > as epoch_date from dfs.tmp.`date_parquet.csv`; > +--------+-------------+ > | name | epoch_date | > +--------+-------------+ > | Epoch | 1970-01-01 | > +--------+-------------+ > {code} > {code} > 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet` as select > columns[0] as name, cast(columns[1] as date) as epoch_date from > dfs.tmp.`date_parquet.csv`; > +-----------+----------------------------+ > | Fragment | Number of records written | > +-----------+----------------------------+ > | 0_0 | 1 | > +-----------+----------------------------+ > {code} > When I read the file with parquet tools, I found > {code} > java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/ > name = Epoch > epoch_date = 4881176 > {code} > According to > [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], > epoch_date should be equal to 0.
> Meta : > {code} > java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/ > file: file:/tmp/buggy_parquet/0_0_0.parquet > creator: parquet-mr version 1.8.1-drill-r0 (build > 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) > extra: drill.version = 1.4.0 > file schema: root > > name: OPTIONAL BINARY O:UTF8 R:0 D:1 > epoch_date: OPTIONAL INT32 O:DATE R:0 D:1 > row group 1: RC:1 TS:93 OFFSET:4 > > name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 > ENC:RLE,BIT_PACKED,PLAIN > epoch_date: INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 > ENC:RLE,BIT_PACKED,PLAIN > {code} > Implementation: > After the fix, Drill can automatically detect date corruption in parquet files > and convert the values to correct ones. > For that reason, when the user wants to work with dates beyond the year 5000, > an option is included to turn off the auto-correction. > Use of this option is assumed to be extremely unlikely, but it is included for > completeness. > To disable auto-correction, use the parquet config in the plugin > settings. Something like this: > {code} > "formats": { > "parquet": { > "type": "parquet", > "autoCorrectCorruptDates": false > } > } > {code} > Or you can use a query like this: > {code} > select l_shipdate, l_commitdate from > table(dfs.`/drill/testdata/parquet_date/dates_nodrillversion/drillgen2_lineitem` > (type => 'parquet', autoCorrectCorruptDates => false)) limit 1; > {code}
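The corrupt value in the example is consistent with a fixed shift of 2 * 2440588 = 4881176 days, i.e. twice the Julian day number of the Unix epoch, which is exactly what the corrupted epoch date shows. A sketch of the correction, assuming that shift:

```python
from datetime import date, timedelta

# The observed corrupt value (4881176 for 1970-01-01) equals twice the
# Julian day number of the Unix epoch (2 * 2440588), so the correction
# is a plain subtraction of that constant.
CORRUPT_DATE_SHIFT = 2 * 2440588

def correct_corrupt_date(stored_days: int) -> date:
    """Convert a corrupt INT32 DATE value back to the intended date."""
    return date(1970, 1, 1) + timedelta(days=stored_days - CORRUPT_DATE_SHIFT)
```

This also explains the "year 5000" caveat above: a legitimate date far enough in the future is indistinguishable from a shifted one, hence the opt-out flag.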
[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet
[ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951764#comment-15951764 ] Bridget Bevens commented on DRILL-4373: --- Link to doc: http://drill.apache.org/docs/parquet-format/#about-int96-support > Drill and Hive have incompatible timestamp representations in parquet > - > > Key: DRILL-4373 > URL: https://issues.apache.org/jira/browse/DRILL-4373 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Parquet >Affects Versions: 1.8.0 >Reporter: Rahul Challapalli >Assignee: Vitalii Diravka > Labels: doc-impacting > Fix For: 1.10.0 > > > git.commit.id.abbrev=83d460c > I created a parquet file with a timestamp type using Drill. Now if I define a > hive table on top of the parquet file and use "timestamp" as the column type, > drill fails to read the hive table through the hive storage plugin. > Implementation: > Added an int96-to-timestamp converter for both parquet readers, controlled > by the system/session option "store.parquet.int96_as_timestamp". > The option is false by default so that old query scripts that use the > "convert_from TIMESTAMP_IMPALA" function keep working. > When the option is true, using that function is unnecessary and can cause > the query to fail.
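For reference, the Impala-style INT96 timestamp is commonly described as 8 little-endian bytes of nanoseconds-of-day followed by 4 bytes of Julian day. A small Python sketch of such a converter (illustrative only, not Drill's code):

```python
import struct
from datetime import datetime, timedelta

JULIAN_EPOCH_DAY = 2440588  # Julian day number of 1970-01-01

def int96_to_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte Impala-style INT96 timestamp: 8 little-endian
    bytes of nanoseconds-of-day followed by 4 bytes of Julian day."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_EPOCH_DAY
    return datetime(1970, 1, 1) + timedelta(days=days_since_epoch,
                                            microseconds=nanos_of_day // 1000)
```

The incompatibility in the ticket is precisely that Drill wrote plain INT64-millisecond timestamps while Hive/Impala expected this INT96 encoding.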
[jira] [Commented] (DRILL-5031) Documentation for HTTPD Parser
[ https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951703#comment-15951703 ] Bridget Bevens commented on DRILL-5031: --- Moved this content based on feedback from Abhishek Girish. New home is here: http://drill.apache.org/docs/httpd-storage-plugin/ > Documentation for HTTPD Parser > -- > > Key: DRILL-5031 > URL: https://issues.apache.org/jira/browse/DRILL-5031 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Charles Givre >Assignee: Bridget Bevens >Priority: Minor > Labels: doc-impacting > Fix For: 1.10.0 > > > https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c
[jira] [Closed] (DRILL-4974) NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions
[ https://issues.apache.org/jira/browse/DRILL-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal closed DRILL-4974. -- Verified and test cases added to automation. > NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions > --- > > Key: DRILL-4974 > URL: https://issues.apache.org/jira/browse/DRILL-4974 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.6.0, 1.7.0, 1.8.0 >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan > Fix For: 1.9.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > The following query can cause an NPE in FindPartitionConditions.analyzeCall() > if the fileSize column is a partitioned column. > SELECT fileSize FROM dfs.`/drill-data/data/` WHERE compoundId LIKE > 'FOO-1234567%' > This is because the LIKE is treated as a holistic expression in > FindPartitionConditions.analyzeCall(), causing opStack to be empty, thus > causing opStack.peek() to return a NULL value.
[jira] [Updated] (DRILL-4974) NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions
[ https://issues.apache.org/jira/browse/DRILL-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal updated DRILL-4974: --- Reviewer: Krystal (was: Chun Chang) > NPE in FindPartitionConditions.analyzeCall() for 'holistic' expressions > --- > > Key: DRILL-4974 > URL: https://issues.apache.org/jira/browse/DRILL-4974 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.6.0, 1.7.0, 1.8.0 >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan > Fix For: 1.9.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > The following query can cause an NPE in FindPartitionConditions.analyzeCall() > if the fileSize column is a partitioned column. > SELECT fileSize FROM dfs.`/drill-data/data/` WHERE compoundId LIKE > 'FOO-1234567%' > This is because the LIKE is treated as a holistic expression in > FindPartitionConditions.analyzeCall(), causing opStack to be empty, thus > causing opStack.peek() to return a NULL value.
[jira] [Commented] (DRILL-4280) Kerberos Authentication
[ https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951665#comment-15951665 ] Chun Chang commented on DRILL-4280: --- This bug should cover the following: "Drill should support Kerberos based authentication from clients. This means that both the ODBC and JDBC drivers as well as the web/REST interfaces should support inbound Kerberos. For Web this would most likely be SPNEGO while for ODBC and JDBC this will be more generic Kerberos." Testing in all areas (web/REST, SPNEGO, ODBC, and JDBC) is ongoing. > Kerberos Authentication > --- > > Key: DRILL-4280 > URL: https://issues.apache.org/jira/browse/DRILL-4280 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: security > Fix For: 1.10.0 > > > Drill should support Kerberos based authentication from clients. This means > that both the ODBC and JDBC drivers as well as the web/REST interfaces should > support inbound Kerberos. For Web this would most likely be SPNEGO while for > ODBC and JDBC this will be more generic Kerberos. > Since Hive and much of Hadoop supports Kerberos there is a potential for a > lot of reuse of ideas if not implementation. > Note that this is related to but not the same as > https://issues.apache.org/jira/browse/DRILL-3584
[jira] [Closed] (DRILL-4987) Use ImpersonationUtil in RemoteFunctionRegistry
[ https://issues.apache.org/jira/browse/DRILL-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chang closed DRILL-4987. - Covered by impersonation testing. > Use ImpersonationUtil in RemoteFunctionRegistry > --- > > Key: DRILL-4987 > URL: https://issues.apache.org/jira/browse/DRILL-4987 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam >Priority: Minor > Fix For: 1.10.0 > > > + Use ImpersonationUtil#getProcessUserName rather than > UserGroupInformation#getCurrentUser#getUserName in RemoteFunctionRegistry > + Expose process users' group info in ImpersonationUtil and use that in > RemoteFunctionRegistry, rather than > UserGroupInformation#getCurrentUser#getGroupNames
[jira] [Closed] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.
[ https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chang closed DRILL-5098. - Verified with manual testing. Automation framework is not suited for this type of test. And we have extensive unit test coverage for this feature. > Improving fault tolerance for connection between client and foreman node. > - > > Key: DRILL-5098 > URL: https://issues.apache.org/jira/browse/DRILL-5098 > Project: Apache Drill > Issue Type: Improvement > Components: Client - JDBC >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia > Labels: doc-impacting, ready-to-commit > Fix For: 1.10.0 > > > With DRILL-5015 we added support for specifying multiple Drillbits in the > connection string and randomly choosing one of them. Over time some of the > Drillbits specified in the connection string may die, and the client can fail > to connect to the Foreman node if the random selection happens to be a dead Drillbit. > Even if ZooKeeper is used to select a random Drillbit from the registered > ones, there is a small window in which the client selects a Drillbit and then that > Drillbit goes down. The client will fail to connect to this Drillbit and > error out. > Instead, if we try multiple Drillbits (with a configurable tries count in the > connection string), the probability of hitting this error window is reduced > in both cases, improving fault tolerance. During further > investigation it was also found that an Authentication failure is thrown > as a generic RpcException. We need to improve that as well > to capture this case explicitly, since on an Auth failure we don't want > to try multiple Drillbits. > Connection string example with the new parameter: > jdbc:drill:drillbit=[:][,[:]...;tries=5
[jira] [Closed] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition
[ https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal closed DRILL-5121. -- Verified and test cases added to automation. > A memory leak is observed when exact case is not specified for a column in a > filter condition > - > > Key: DRILL-5121 > URL: https://issues.apache.org/jira/browse/DRILL-5121 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.6.0, 1.8.0 >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan > Labels: ready-to-commit > Fix For: 1.10.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > When the query SELECT XYZ from dfs.`/tmp/foo' where xYZ like "abc" is > executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and > 2.parquet, where 1.parquet has the column XYZ but 2.parquet does not, > there is a memory leak. > This seems to happen because xYZ seems to be treated as a new column.
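The expected behavior can be modeled in a few lines: column lookup should be case-insensitive, so `xYZ` resolves to the existing `XYZ` column instead of being treated as a new one. A hypothetical Python sketch (not Drill's actual resolution code):

```python
def resolve_column(schema_columns, requested):
    """Resolve a requested column name case-insensitively, so `xYZ`
    matches an existing `XYZ` column instead of being treated as a
    brand-new (and, per this ticket, leak-prone) column."""
    for name in schema_columns:
        if name.lower() == requested.lower():
            return name
    return None  # genuinely missing column
```
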
[jira] [Updated] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition
[ https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal updated DRILL-5121: --- Reviewer: Krystal (was: Chun Chang) > A memory leak is observed when exact case is not specified for a column in a > filter condition > - > > Key: DRILL-5121 > URL: https://issues.apache.org/jira/browse/DRILL-5121 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.6.0, 1.8.0 >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan > Labels: ready-to-commit > Fix For: 1.10.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > When the query SELECT XYZ from dfs.`/tmp/foo' where xYZ like "abc" is > executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and > 2.parquet, where 1.parquet has the column XYZ but 2.parquet does not, > there is a memory leak. > This seems to happen because xYZ seems to be treated as a new column.
[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts
[ https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951555#comment-15951555 ] Padma Penumarthy commented on DRILL-5394: - Yes, I did. Thanks for the review [~gparai] > Optimize query planning for MapR-DB tables by caching row counts > > > Key: DRILL-5394 > URL: https://issues.apache.org/jira/browse/DRILL-5394 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - MapRDB >Affects Versions: 1.9.0, 1.10.0 >Reporter: Abhishek Girish >Assignee: Padma Penumarthy > Labels: MapR-DB-Binary, ready-to-commit > Fix For: 1.11.0 > > > On large MapR-DB tables, it was observed that the query planning time was > longer than expected. With DEBUG logs, it was understood that there were > multiple calls being made to get MapR-DB region locations and to fetch total > row count for tables. > {code} > 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Function > ... > 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... 
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms): > {code} > We should cache these stats and reuse them wherever required during query > planning. This should help reduce query planning time.
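The proposed fix is plain memoization: fetch region locations and row counts once and reuse the result for the rest of planning. A minimal Python sketch (class and method names are hypothetical, not Drill's BinaryTableGroupScan API):

```python
class BinaryTableStats:
    """Memoize expensive table metadata: fetch region locations and the
    row count once, then reuse the cached result for the rest of query
    planning instead of re-querying the table on every planner callback."""

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn    # e.g. a call that hits the MapR-DB table
        self._cached = None

    def region_stats(self):
        if self._cached is None:  # only the first call pays the cost
            self._cached = self._fetch()
        return self._cached
```

The cache lives for a single planning session, so staleness is bounded by the lifetime of one query's plan.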
[jira] [Assigned] (DRILL-5405) Add missing operator types
[ https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong reassigned DRILL-5405: --- Assignee: Arina Ielchiieva (was: Zelaine Fong) > Add missing operator types > -- > > Key: DRILL-5405 > URL: https://issues.apache.org/jira/browse/DRILL-5405 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG > > > Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they > won't be displayed on Web UI as UNKNOWN_OPERATOR. > Example: > before the fix -> unknown_operator.JPG > after the fix -> maprdb_sub_scan.JPG
[jira] [Assigned] (DRILL-5405) Add missing operator types
[ https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong reassigned DRILL-5405: --- Assignee: Zelaine Fong (was: Arina Ielchiieva) > Add missing operator types > -- > > Key: DRILL-5405 > URL: https://issues.apache.org/jira/browse/DRILL-5405 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Zelaine Fong >Priority: Minor > Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG > > > Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they > won't be displayed on Web UI as UNKNOWN_OPERATOR. > Example: > before the fix -> unknown_operator.JPG > after the fix -> maprdb_sub_scan.JPG
[jira] [Resolved] (DRILL-5031) Documentation for HTTPD Parser
[ https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens resolved DRILL-5031. --- Resolution: Fixed Fix Version/s: (was: 1.9.0) 1.10.0 Added minor edits and moved the content into Apache Drill: http://drill.apache.org/docs/configuring-drill-to-read-web-server-logs/ Please let me know if you see any issues. Thanks, Bridget > Documentation for HTTPD Parser > -- > > Key: DRILL-5031 > URL: https://issues.apache.org/jira/browse/DRILL-5031 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Charles Givre >Assignee: Bridget Bevens >Priority: Minor > Labels: doc-impacting > Fix For: 1.10.0 > > > https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c
[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts
[ https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951400#comment-15951400 ] Gautam Kumar Parai commented on DRILL-5394: --- [~ppenumarthy] is the code ready to go in Apache? If so, then we should mark it with the ready-to-commit tag. > Optimize query planning for MapR-DB tables by caching row counts > > > Key: DRILL-5394 > URL: https://issues.apache.org/jira/browse/DRILL-5394 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - MapRDB >Affects Versions: 1.9.0, 1.10.0 >Reporter: Abhishek Girish >Assignee: Padma Penumarthy > Labels: MapR-DB-Binary, ready-to-commit > Fix For: 1.11.0 > > > On large MapR-DB tables, it was observed that the query planning time was > longer than expected. With DEBUG logs, it was understood that there were > multiple calls being made to get MapR-DB region locations and to fetch total > row count for tables. > {code} > 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Function > ... > 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms): > {code} > We should cache these stats and reuse them wherever required during query > planning. This should help reduce query planning time.
[jira] [Commented] (DRILL-5406) Flatten produces a random ClassCastException
[ https://issues.apache.org/jira/browse/DRILL-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951316#comment-15951316 ] Rahul Challapalli commented on DRILL-5406: -- Another instance of the below query failing. This time the stacktrace shows the issue happened in the JDBC code {code} java.sql.SQLException: SYSTEM ERROR: ClassCastException Fragment 0:0 [Error Id: 3ef91b70-debf-4e32-a3a0-39010fb42460 on qa-node183.qa.lab:31010] (java.lang.ClassCastException) null at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232) at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76) at oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473) at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465) at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169) at oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) at oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130) at org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112) at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177) at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at 
java.lang.Thread.run(Thread.java:745) Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: ClassCastException Fragment 0:0 [Error Id: 3ef91b70-debf-4e32-a3a0-39010fb42460 on qa-node183.qa.lab:31010] (java.lang.ClassCastException) null at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) at oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65) at oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363) at oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) at oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240) at oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at
[jira] [Commented] (DRILL-5404) kvgen function only supports Simple maps as input
[ https://issues.apache.org/jira/browse/DRILL-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951301#comment-15951301 ] Rahul Challapalli commented on DRILL-5404: -- This is reproducible on drill 1.9.0 as well with the below query on the same data set {code} select kvgen(bigintegercol), kvgen(float8col) from `json_kvgenflatten/kvgen1.json` {code} > kvgen function only supports Simple maps as input > - > > Key: DRILL-5404 > URL: https://issues.apache.org/jira/browse/DRILL-5404 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.10.0 >Reporter: Rahul Challapalli > > git.commit.id.abbrev=38ef562 > The below query did not fail when I ran it in isolation. However when I ran > the test suite at [1], which also contains the below query, by using 50 > threads submitting queries concurrently, I hit the below error. > {code} > select boolcol, bigintegercol, varcharcol, kvgen(bigintegercol), > kvgen(boolcol), kvgen(varcharcol) from `json_kvgenflatten/kvgen1.json` > Failed with exception > java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: kvgen function > only supports Simple maps as input > Fragment 0:0 > [Error Id: 953541c2-cf67-4d29-8d1c-ac3ff3c18f1f on qa-node182.qa.lab:31010] > (org.apache.drill.common.exceptions.DrillRuntimeException) kvgen function > only supports Simple maps as input > org.apache.drill.exec.expr.fn.impl.MappifyUtility.mappify():46 > org.apache.drill.exec.test.generated.ProjectorGen10361.doEval():45 > org.apache.drill.exec.test.generated.ProjectorGen10361.projectRecords():67 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():199 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():226 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61) > at > oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100) > at > oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180) > at > 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109) > at > oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130) > at > org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112) > at > org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177) > at > org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at >
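For context on the error text above: kvgen logically turns a map into a repeated list of key/value pairs, and it rejects anything that is not a simple map. A minimal sketch of that semantics (this is an illustration, not Drill's implementation, and the data is hypothetical):

```python
def kvgen(value):
    # Sketch of kvgen semantics: a map becomes a repeated list of
    # {"key": ..., "value": ...} pairs; non-map input raises the error
    # quoted in this issue.
    if not isinstance(value, dict):
        raise TypeError("kvgen function only supports Simple maps as input")
    return [{"key": k, "value": v} for k, v in value.items()]

pairs = kvgen({"a": 1, "b": 2})
# pairs == [{"key": "a", "value": 1}, {"key": "b", "value": 2}]
```

The intermittent nature of the failure under 50 concurrent threads suggests the input column sometimes materializes as a non-map type, which this sketch would also reject.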
[jira] [Commented] (DRILL-5406) Flatten produces a random ClassCastException
[ https://issues.apache.org/jira/browse/DRILL-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951292#comment-15951292 ] Rahul Challapalli commented on DRILL-5406: -- Data set used in the query : {code} {"map":{"rm": [ {"rptd": [{ "a": "foo"}]}]}}|10 {code} > Flatten produces a random ClassCastException > > > Key: DRILL-5406 > URL: https://issues.apache.org/jira/browse/DRILL-5406 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.9.0 >Reporter: Rahul Challapalli > > I hit a random error on drill 1.9.0. I will try to reproduce the issue on the > latest master. > The below query did not fail when I ran it in isolation. However when I ran > the test suite at [1], which also contains the below query, by using 50 > threads submitting queries concurrently, I hit the below error. > {code} > select flatten(convert_from(columns[0], 'JSON')) from > `json_kvgenflatten/convert4783_2.tbl` where 1=2 > [Error Id: 1b5f4aef-ae34-4af4-9f2f-8349f8dd97c2 on qa-node183.qa.lab:31010] > (java.lang.ClassCastException) > org.apache.drill.common.expression.TypedNullConstant cannot be cast to > org.apache.drill.exec.expr.ValueVectorReadExpression > > org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema():307 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78 > > org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():120 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226 > java.security.AccessController.doPrivileged():-2 > 
javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():226 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > at > oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > at > oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144) > at > oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65) > at > oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363) > at > oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) > at > oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240) > at > oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > at >
[jira] [Created] (DRILL-5406) Flatten produces a random ClassCastException
Rahul Challapalli created DRILL-5406: Summary: Flatten produces a random ClassCastException Key: DRILL-5406 URL: https://issues.apache.org/jira/browse/DRILL-5406 Project: Apache Drill Issue Type: Bug Components: Storage - JSON Affects Versions: 1.9.0 Reporter: Rahul Challapalli I hit a random error on drill 1.9.0. I will try to reproduce the issue on the latest master. The below query did not fail when I ran it in isolation. However when I ran the test suite at [1], which also contains the below query, by using 50 threads submitting queries concurrently, I hit the below error. {code} select flatten(convert_from(columns[0], 'JSON')) from `json_kvgenflatten/convert4783_2.tbl` where 1=2 [Error Id: 1b5f4aef-ae34-4af4-9f2f-8349f8dd97c2 on qa-node183.qa.lab:31010] (java.lang.ClassCastException) org.apache.drill.common.expression.TypedNullConstant cannot be cast to org.apache.drill.exec.expr.ValueVectorReadExpression org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema():307 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78 org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext():120 org.apache.drill.exec.record.AbstractRecordBatch.next():162 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215 org.apache.drill.exec.physical.impl.BaseRootExec.next():104 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81 org.apache.drill.exec.physical.impl.BaseRootExec.next():94 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():226 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 
java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) at oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65) at oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363) at oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) at oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240) at oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at
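The trace points at FlattenRecordBatch.setupNewSchema() casting the materialized flatten expression to ValueVectorReadExpression; with `where 1=2` the input can materialize as a TypedNullConstant instead, producing the ClassCastException. A sketch of the failure mode and a defensive alternative (class names mirror the trace but the code is an illustration, not Drill's):

```python
class TypedNullConstant:
    """Stand-in for an expression materialized over an empty/pruned input."""

class ValueVectorReadExpression:
    """Stand-in for a materialized read of a real value vector."""
    def __init__(self, field_id):
        self.field_id = field_id

def setup_flatten_field(expr):
    # The failing path effectively assumes expr is always a
    # ValueVectorReadExpression; a guarded setup checks first and treats
    # a typed null as "no column to flatten".
    if isinstance(expr, ValueVectorReadExpression):
        return expr.field_id
    return None
```

Under this sketch, the guarded setup returns None for the typed-null case instead of failing the fragment.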
[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951210#comment-15951210 ] ASF GitHub Bot commented on DRILL-5375: --- Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/794 A few follow-up comments, which are non-blockers. Overall LGTM. +1
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala query shows the correct result:
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results (the unmatched 2016-01-01 row is dropped):
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> 3 rows selected (2.523 seconds)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
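The expected behavior can be illustrated with a toy nested-loop LEFT join over the repro data: every left (t2) row must appear at least once, with NULL padding when no t1 row matches. A sketch (not Drill's implementation; data abbreviated from the repro):

```python
def nested_loop_left_join(left, right, on):
    # LEFT join via nested loops: each left row is emitted at least once;
    # rows with no right-side match are padded with NULLs (None).
    null_right = {k: None for k in (right[0] if right else {})}
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if on(l, r)]
        out.extend(matches if matches else [{**l, **null_right}])
    return out

t1 = [{"fyq": "2016-Q3", "dts": "2017-01-01", "dte": "2017-03-31"}]
t2 = [{"dt": "2016-01-01"}, {"dt": "2017-01-06"}]
rows = nested_loop_left_join(t2, t1, lambda l, r: r["dts"] <= l["dt"] <= r["dte"])
```

Here the 2016-01-01 row survives with fyq = None, matching the Impala output; the bug is that Drill dropped it.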
[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951207#comment-15951207 ] ASF GitHub Bot commented on DRILL-5375: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/794#discussion_r109197541
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
@@ -105,6 +103,29 @@
   public static final PositiveLongValidator PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY, Long.MAX_VALUE, 1);
+  /*
+   Enables rules that rewrite query joins in the most optimal way.
+   Though it is turned on by default and its value in query optimization is undeniable, a user may want to turn off
+   such optimization to leave the join order indicated in the SQL query unchanged.
+
+   For example:
+   Currently only nested loop join allows non-equi join conditions.
+   During the planning stage, nested loop join will be chosen when a non-equi join is detected
+   and {@link #NLJOIN_FOR_SCALAR} is set to false. Though query performance may not be optimal in this case,
+   a user may use this workaround to execute queries with non-equi joins.
+
+   Nested loop join allows only INNER and LEFT joins and assumes that the right input is smaller than the left input.
+   For a LEFT join, when join optimization is enabled and the right input is detected to be larger than the left,
+   the join will be optimized: left and right inputs will be flipped and the LEFT join type will be changed to RIGHT.
+   If the query contains non-equi joins, it will fail after such optimization, since nested loop join does not allow
+   RIGHT joins. In this case, if the user accepts the possibility of non-optimal performance, they may turn off
+   join optimization. Turning off join optimization makes sense only if the user is not sure that the right input
+   is smaller than or equal to the left; otherwise join optimization can be left turned on.
+
+   Note: once hash and merge joins allow non-equi join conditions,
+   the need to turn off join optimization may go away.
+  */
+  public static final BooleanValidator JOIN_OPTIMIZATION = new BooleanValidator("planner.enable_join_optimization", true);
--- End diff --
Ah, you added this option to enable/disable the *logical* join rules. Since NestedLoopJoin is a physical join implementation, from the comments I interpreted that this was intended for the swapping of left and right inputs of the (physical) NL join, which is why I mentioned the hashjoin_swap option. It seems to me that if there is a LEFT OUTER JOIN and the condition is a non-equality, then we should not allow changing to a RIGHT OUTER JOIN by flipping the left and right sides, since that would make the query fail. What do you think? I suppose we could keep your boolean option for this PR and address the left outer join issue separately.
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
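The flip-safety point in the review above can be reduced to a one-line planner check: swapping the inputs of a LEFT join turns it into a RIGHT join, which is only legal if the chosen physical operator supports RIGHT joins. A sketch (the supported-type sets are assumptions based on the comment under review, not Drill's actual capability tables):

```python
def can_flip_left_join(physical_join_types):
    # Flipping LEFT -> RIGHT (by swapping inputs when the right side looks
    # larger) is only safe when the physical operator can execute RIGHT joins.
    return "RIGHT" in physical_join_types

NESTED_LOOP_JOIN = {"INNER", "LEFT"}            # per the comment under review
HASH_JOIN = {"INNER", "LEFT", "RIGHT", "FULL"}  # typical hash join support
```

This is exactly why the reviewer suggests never flipping a LEFT OUTER JOIN with a non-equality condition: the only operator that can run it is nested loop join, and it cannot run the flipped RIGHT variant.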
[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951205#comment-15951205 ] ASF GitHub Bot commented on DRILL-5375: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/794#discussion_r109193083
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
@@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
   /**
    * Method generates the runtime code needed for NLJ. Other than the setup method to set the input and output value
-   * vector references we implement two more methods
-   * 1. emitLeft() -> Project record from the left side
-   * 2. emitRight() -> Project record from the right side (which is a hyper container)
+   * vector references we implement three more methods
+   * 1. doEval() -> Evaluates if record from left side matches record from the right side
+   * 2. emitLeft() -> Project record from the left side
+   * 3. emitRight() -> Project record from the right side (which is a hyper container)
    * @return the runtime generated class that implements the NestedLoopJoin interface
-   * @throws IOException
-   * @throws ClassTransformationException
    */
-  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException {
-    final CodeGenerator nLJCodeGenerator = CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
+  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException, SchemaChangeException {
+    final CodeGenerator nLJCodeGenerator = CodeGenerator.get(
+        NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
     nLJCodeGenerator.plainJavaCapable(true);
     // Uncomment out this line to debug the generated code.
     //nLJCodeGenerator.saveCodeForDebugging(true);
     final ClassGenerator nLJClassGenerator = nLJCodeGenerator.getRoot();
+    // generate doEval
+    final ErrorCollector collector = new ErrorCollectorImpl();
+
+    /*
+    A logical expression may contain fields from both the left and right batches. During code generation
+    (materialization) we need to indicate from which input a field should be taken. Mapping sets can work with only
+    one input at a time. But non-equality expressions can be complex:
+      select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
+    or even contain a self join, which cannot be transformed into a filter since an OR clause is present:
+      select * from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
+
+    In this case the logical expression cannot be split according to input presence (like during equality joins
--- End diff --
To avoid confusion you could list a couple of example categories:
1. Join on non-equijoin predicates: t1 inner join t2 on (t1.c1 between t2.c1 AND t2.c2) AND (...)
2. Join with an OR predicate: t1 inner join t2 on t1.c1 = t2.c1 OR t1.c2 = t2.c2
The other category, where a join predicate includes a self-join, could probably be left out since there are quite a few variations there: if there are 2 tables but the join condition only specifies 1 table, then it would be a cartesian join with the second table. If the self join occurs in combination with an AND it would be treated differently compared with OR, etc.
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
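The quoted code comment argues that a non-equi or OR condition cannot be decomposed into per-input filters because it references fields from both batches at once. A sketch of that classification, using a hypothetical tree representation where leaves record which join input a field comes from:

```python
def referenced_inputs(expr):
    # Expression trees are (op, payload): leaves are ("REF", input_index),
    # interior nodes are (op, [children]). Returns the set of join inputs
    # the expression touches.
    op, payload = expr
    if op == "REF":
        return {payload}
    return set().union(*(referenced_inputs(child) for child in payload))

# t1.c1 >= t2.c1 OR t1.c3 <> t1.c4 : the OR mixes inputs 0 and 1, so the
# condition cannot be split into per-input pieces and must be evaluated
# with both batches in scope -- which is what the generated doEval() does.
cond = ("OR", [(">=", [("REF", 0), ("REF", 1)]),
               ("<>", [("REF", 0), ("REF", 0)])])
```

An expression whose referenced-input set has size one could be pushed down as a filter; the mixed set here is what forces evaluation inside the join.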
[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951206#comment-15951206 ] ASF GitHub Bot commented on DRILL-5375: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/794#discussion_r109186694
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java ---
@@ -70,27 +70,65 @@
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillOptiq.class);
   /**
-   * Converts a tree of {@link RexNode} operators into a scalar expression in Drill syntax.
+   * Converts a tree of {@link RexNode} operators into a scalar expression in Drill syntax using one input.
+   *
+   * @param context parse context which contains planner settings
+   * @param input data input
+   * @param expr expression to be converted
+   * @return converted expression
    */
   public static LogicalExpression toDrill(DrillParseContext context, RelNode input, RexNode expr) {
-    final RexToDrill visitor = new RexToDrill(context, input);
+    return toDrill(context, Lists.newArrayList(input), expr);
+  }
+
+  /**
+   * Converts a tree of {@link RexNode} operators into a scalar expression in Drill syntax using multiple inputs.
+   *
+   * @param context parse context which contains planner settings
+   * @param inputs multiple data inputs
+   * @param expr expression to be converted
+   * @return converted expression
+   */
+  public static LogicalExpression toDrill(DrillParseContext context, List inputs, RexNode expr) {
+    final RexToDrill visitor = new RexToDrill(context, inputs);
     return expr.accept(visitor);
   }
   private static class RexToDrill extends RexVisitorImpl {
-    private final RelNode input;
+    private final List inputs;
     private final DrillParseContext context;
+    private final List fieldList;
-    RexToDrill(DrillParseContext context, RelNode input) {
+    RexToDrill(DrillParseContext context, List inputs) {
       super(true);
       this.context = context;
-      this.input = input;
+      this.inputs = inputs;
+      this.fieldList = Lists.newArrayList();
+      /*
+      Fields are enumerated by their presence order in the input. For details see {@link org.apache.calcite.rex.RexInputRef}.
+      Thus we can merge field lists from several inputs by adding them into the list in order of appearance.
+      Each field index in the list will match the field index in the RexInputRef instance, which allows us
+      to retrieve a field from the field list by index in the {@link #visitInputRef(RexInputRef)} method. Example:
+
+      Query: select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
+
+      Input 1: $0
+      Input 2: $1, $2
+
+      Result: $0, $1, $2
+      */
+      for (RelNode input : inputs) {
--- End diff --
Ok, I see. Performance-wise it is a minor thing, but it is more about working with the existing visitInputRef() which takes one input.
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
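The field-merging scheme described in the quoted comment can be sketched directly: field lists from each input are concatenated in order of appearance, so a RexInputRef-style index $i resolves by plain list position (an illustration, not Drill/Calcite code):

```python
def merge_field_lists(inputs):
    # Append each input's fields in order of appearance; index $i in the
    # merged list then matches the RexInputRef index.
    merged = []
    for fields in inputs:
        merged.extend(fields)
    return merged

# From the comment's example: input 1 contributes $0; input 2 contributes $1, $2.
field_list = merge_field_lists([["t1.c1"], ["t2.c1", "t2.c2"]])
# field_list == ["t1.c1", "t2.c1", "t2.c2"]
```

A visitInputRef($2) lookup then becomes a constant-time list index into the merged list.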
[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951204#comment-15951204 ] ASF GitHub Bot commented on DRILL-5375: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/794#discussion_r109193949 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java --- @@ -40,132 +41,133 @@ // Record count of the left batch currently being processed private int leftRecordCount = 0; - // List of record counts per batch in the hyper container + // List of record counts per batch in the hyper container private List rightCounts = null; // Output batch private NestedLoopJoinBatch outgoing = null; - // Next right batch to process - private int nextRightBatchToProcess = 0; - - // Next record in the current right batch to process - private int nextRightRecordToProcess = 0; - - // Next record in the left batch to process - private int nextLeftRecordToProcess = 0; + // Iteration status tracker + private IterationStatusTracker tracker = new IterationStatusTracker(); /** * Method initializes necessary state and invokes the doSetup() to set the - * input and output value vector references + * input and output value vector references. 
+ * * @param context Fragment context * @param left Current left input batch being processed * @param rightContainer Hyper container + * @param rightCounts Counts for each right container * @param outgoing Output batch */ - public void setupNestedLoopJoin(FragmentContext context, RecordBatch left, + public void setupNestedLoopJoin(FragmentContext context, + RecordBatch left, ExpandableHyperContainer rightContainer, LinkedList rightCounts, NestedLoopJoinBatch outgoing) { this.left = left; -leftRecordCount = left.getRecordCount(); +this.leftRecordCount = left.getRecordCount(); this.rightCounts = rightCounts; this.outgoing = outgoing; doSetup(context, rightContainer, left, outgoing); } /** - * This method is the core of the nested loop join. For every record on the right we go over - * the left batch and produce the cross product output + * Main entry point for producing the output records. Thin wrapper around populateOutgoingBatch(), this method + * controls which left batch we are processing and fetches the next left input batch once we exhaust the current one. + * + * @param joinType join type (INNER ot LEFT) + * @return the number of records produced in the output batch + */ + public int outputRecords(JoinRelType joinType) { +int outputIndex = 0; +while (leftRecordCount != 0) { + outputIndex = populateOutgoingBatch(joinType, outputIndex); + if (outputIndex >= NestedLoopJoinBatch.MAX_BATCH_SIZE) { +break; + } + // reset state and get next left batch + resetAndGetNextLeft(); +} +return outputIndex; + } + + /** + * This method is the core of the nested loop join.For each left batch record looks for matching record + * from the list of right batches. Match is checked by calling {@link #doEval(int, int, int)} method. + * If matching record is found both left and right records are written into output batch, + * otherwise if join type is LEFT, than only left record is written, right batch record values will be null. 
+ * + * @param joinType join type (INNER or LEFT) * @param outputIndex index to start emitting records at * @return final outputIndex after producing records in the output batch */ - private int populateOutgoingBatch(int outputIndex) { - -// Total number of batches on the right side -int totalRightBatches = rightCounts.size(); - -// Total number of records on the left -int localLeftRecordCount = leftRecordCount; - -/* - * The below logic is the core of the NLJ. To have better performance we copy the instance members into local - * method variables, once we are done with the loop we need to update the instance variables to reflect the new - * state. To avoid code duplication of resetting the instance members at every exit point in the loop we are using - * 'goto' - */ -int localNextRightBatchToProcess = nextRightBatchToProcess; -int localNextRightRecordToProcess = nextRightRecordToProcess; -
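The javadoc in the diff above describes the core loop: for each left record, scan every right batch for matches, emitting the pair on a match and, for a LEFT join only, the left record with right-side nulls when nothing matches, stopping when the output batch fills. A minimal sketch of that control flow (illustrative Python; `match` stands in for the generated `doEval`, and the constant mirrors `NestedLoopJoinBatch.MAX_BATCH_SIZE` in spirit only):

```python
MAX_BATCH_SIZE = 4096  # stand-in for NestedLoopJoinBatch.MAX_BATCH_SIZE

def nested_loop_join(left_batch, right_batches, join_type, match):
    """Sketch of populateOutgoingBatch: for each left record, scan every
    right batch; emit pairs on match, or (left, None) for LEFT-join misses."""
    out = []
    for l in left_batch:
        found = False
        for batch in right_batches:
            for r in batch:
                if match(l, r):          # corresponds to doEval(...)
                    out.append((l, r))
                    found = True
                    if len(out) >= MAX_BATCH_SIZE:
                        return out       # real code saves iteration state here
        if join_type == "LEFT" and not found:
            out.append((l, None))        # right side projected as nulls
    return out
```

The real template additionally tracks where it stopped (the `IterationStatusTracker` added in the diff) so the next call can resume mid-scan; the sketch simply returns early.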
[jira] [Updated] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts
[ https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Padma Penumarthy updated DRILL-5394: Labels: MapR-DB-Binary ready-to-commit (was: MapR-DB-Binary) > Optimize query planning for MapR-DB tables by caching row counts > > > Key: DRILL-5394 > URL: https://issues.apache.org/jira/browse/DRILL-5394 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - MapRDB >Affects Versions: 1.9.0, 1.10.0 >Reporter: Abhishek Girish >Assignee: Padma Penumarthy > Labels: MapR-DB-Binary, ready-to-commit > Fix For: 1.11.0 > > > On large MapR-DB tables, it was observed that the query planning time was > longer than expected. With DEBUG logs, it was understood that there were > multiple calls being made to get MapR-DB region locations and to fetch total > row count for tables. > {code} > 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Function > ... > 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... > 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations > 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG > o.a.d.e.planner.logical.DrillOptiq - Special > ... 
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query
> planning. This should help reduce query planning time.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
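The repeated "Getting region locations" calls in the log above are exactly the pattern a per-query stats cache removes. As a generic memoization sketch (hypothetical function name; not the actual MapR-DB plugin API):

```python
import functools

call_count = 0  # instrumentation to show the expensive call runs once

@functools.lru_cache(maxsize=None)
def fetch_region_locations(table):
    """Hypothetical stand-in for the expensive MapR-DB metadata call."""
    global call_count
    call_count += 1
    return f"regions-of-{table}"

# Planning touches the same table many times; only the first call pays.
for _ in range(5):
    fetch_region_locations("t")
```

In the actual fix the cached values would need a lifetime bounded by the planning session, since region layouts and row counts change over time.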
[jira] [Updated] (DRILL-5405) Add missing operator types
[ https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-5405: Reviewer: Karthikeyan Manivannan Assigned Reviewer to [~karthikm] > Add missing operator types > -- > > Key: DRILL-5405 > URL: https://issues.apache.org/jira/browse/DRILL-5405 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG > > > Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they > won't be displayed on Web UI as UNKNOWN_OPERATOR. > Example: > before the fix -> unknown_operator.JPG > after the fix -> maprdb_sub_scan.JPG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5405) Add missing operator types
[ https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950693#comment-15950693 ] ASF GitHub Bot commented on DRILL-5405: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/804 DRILL-5405: Add missing operator types You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-5405 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #804 commit 91ccb9c539f1c20d73b5cae9cb101c18b8f0cb73 Author: Arina IelchiievaDate: 2017-03-30T16:55:31Z DRILL-5405: Add missing operator types > Add missing operator types > -- > > Key: DRILL-5405 > URL: https://issues.apache.org/jira/browse/DRILL-5405 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG > > > Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they > won't be displayed on Web UI as UNKNOWN_OPERATOR. > Example: > before the fix -> unknown_operator.JPG > after the fix -> maprdb_sub_scan.JPG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz closed DRILL-3562. - > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
[ https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950643#comment-15950643 ] Khurram Faraaz commented on DRILL-5401: --- The SQL was incorrect in the above example, fixing the SQL results in SchemaChangeException {noformat} 0: jdbc:drill:schema=dfs.tmp> select t.a.b.c from `empty_array.json` t where t.a.b.c is not null; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [isnotnull(MAP-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: e1b65f30-6f40-43f4-8162-9cb54d6f5a81 on centos-01.qa.lab:31010] (state=,code=0) 0: jdbc:drill:schema=dfs.tmp> select t.a.b.c from `empty_array.json` t where t.a.b.c is null; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [isnull(MAP-REPEATED)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 [Error Id: c964c93d-0573-4598-a3ee-6d8abc3abff0 on centos-01.qa.lab:31010] (state=,code=0) {noformat} > isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON > --- > > Key: DRILL-5401 > URL: https://issues.apache.org/jira/browse/DRILL-5401 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.11.0 >Reporter: Khurram Faraaz > > Checking if a list is null or if it is not null, results in > SchemaChangeException. > Drill 1.11.0 commit id: adbf363d > Data used in test > {noformat} > [root@centos-01 ~]# cat empty_array.json > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > { "a": { "b": { "c": [] } } } > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> alter session set > `store.json.all_text_mode`=true; > +---++ > | ok | summary | > +---++ > | true | store.json.all_text_mode updated. 
| > +---++ > 1 row selected (0.189 seconds) > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.138 seconds) > /* wrong results */ > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c > IS NULL; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.152 seconds) > /* wrong results */ > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c > IS NOT NULL; > ++ > | a | > ++ > ++ > No rows selected (0.154 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
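The error above says Drill has no `isnull`/`isnotnull` implementation for a REPEATED (list) type. For reference, the semantics a user likely expects, where an empty array is a present value rather than SQL NULL, can be sketched as follows (illustrative Python over the sample JSON; not Drill behavior, which currently raises SchemaChangeException):

```python
# Rows mirror empty_array.json from the report above.
rows = [
    {"a": {"b": {"c": [{"d": {"e": "f"}}]}}},  # non-empty list
    {"a": {"b": {"c": []}}},                   # empty list: present, NOT NULL
]

def field_is_null(row, path=("a", "b", "c")):
    """A missing field or explicit null reads as SQL NULL; [] does not."""
    cur = row
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return True
        cur = cur[key]
    return cur is None

not_null = [r for r in rows if not field_is_null(r)]
```

Under these semantics both sample rows satisfy `t.a.b.c IS NOT NULL`, since `[]` is an empty-but-present list.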
[jira] [Updated] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
[ https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-5401: -- Summary: isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON (was: wrong results - IS NULL / IS NOT NULL over a list in JSON) > isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON > --- > > Key: DRILL-5401 > URL: https://issues.apache.org/jira/browse/DRILL-5401 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.11.0 >Reporter: Khurram Faraaz > > Checking if a list is null or if it is not null, results in incorrect results. > Drill 1.11.0 commit id: adbf363d > Data used in test > {noformat} > [root@centos-01 ~]# cat empty_array.json > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > { "a": { "b": { "c": [] } } } > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> alter session set > `store.json.all_text_mode`=true; > +---++ > | ok | summary | > +---++ > | true | store.json.all_text_mode updated. | > +---++ > 1 row selected (0.189 seconds) > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.138 seconds) > /* wrong results */ > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c > IS NULL; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.152 seconds) > /* wrong results */ > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c > IS NOT NULL; > ++ > | a | > ++ > ++ > No rows selected (0.154 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5401) isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON
[ https://issues.apache.org/jira/browse/DRILL-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khurram Faraaz updated DRILL-5401: -- Description: Checking if a list is null or if it is not null, results in SchemaChangeException. Drill 1.11.0 commit id: adbf363d Data used in test {noformat} [root@centos-01 ~]# cat empty_array.json { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } { "a": { "b": { "c": [] } } } {noformat} {noformat} 0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true; +---++ | ok | summary | +---++ | true | store.json.all_text_mode updated. | +---++ 1 row selected (0.189 seconds) 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`; ++ | a| ++ | {"b":{"c":[{"d":{"e":"f"}}]}} | | {"b":{"c":[]}} | ++ 2 rows selected (0.138 seconds) /* wrong results */ 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS NULL; ++ | a| ++ | {"b":{"c":[{"d":{"e":"f"}}]}} | | {"b":{"c":[]}} | ++ 2 rows selected (0.152 seconds) /* wrong results */ 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS NOT NULL; ++ | a | ++ ++ No rows selected (0.154 seconds) {noformat} was: Checking if a list is null or if it is not null, results in incorrect results. Drill 1.11.0 commit id: adbf363d Data used in test {noformat} [root@centos-01 ~]# cat empty_array.json { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } { "a": { "b": { "c": [] } } } {noformat} {noformat} 0: jdbc:drill:schema=dfs.tmp> alter session set `store.json.all_text_mode`=true; +---++ | ok | summary | +---++ | true | store.json.all_text_mode updated. 
| +---++ 1 row selected (0.189 seconds) 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`; ++ | a| ++ | {"b":{"c":[{"d":{"e":"f"}}]}} | | {"b":{"c":[]}} | ++ 2 rows selected (0.138 seconds) /* wrong results */ 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS NULL; ++ | a| ++ | {"b":{"c":[{"d":{"e":"f"}}]}} | | {"b":{"c":[]}} | ++ 2 rows selected (0.152 seconds) /* wrong results */ 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c IS NOT NULL; ++ | a | ++ ++ No rows selected (0.154 seconds) {noformat} > isnotnull(MAP-REPEATED) - IS NULL / IS NOT NULL over a list in JSON > --- > > Key: DRILL-5401 > URL: https://issues.apache.org/jira/browse/DRILL-5401 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.11.0 >Reporter: Khurram Faraaz > > Checking if a list is null or if it is not null, results in > SchemaChangeException. > Drill 1.11.0 commit id: adbf363d > Data used in test > {noformat} > [root@centos-01 ~]# cat empty_array.json > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > { "a": { "b": { "c": [] } } } > {noformat} > {noformat} > 0: jdbc:drill:schema=dfs.tmp> alter session set > `store.json.all_text_mode`=true; > +---++ > | ok | summary | > +---++ > | true | store.json.all_text_mode updated. | > +---++ > 1 row selected (0.189 seconds) > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json`; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.138 seconds) > /* wrong results */ > 0: jdbc:drill:schema=dfs.tmp> select * from `empty_array.json` t where t.b.c > IS NULL; > ++ > | a| > ++ > | {"b":{"c":[{"d":{"e":"f"}}]}} | > | {"b":{"c":[]}} | > ++ > 2 rows selected (0.152 seconds) > /* wrong results */
[jira] [Comment Edited] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950596#comment-15950596 ] Khurram Faraaz edited comment on DRILL-3562 at 3/31/17 9:32 AM: [~arina] thanks for confirming. Verified that SQL reported in this JIRA returns correct results on Drill 1.11.0 Test added here framework/resources/Functional/json/json_storage/drill_3562.q {noformat} 0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ; +-+ | EXPR$0 | +-+ | 1 | +-+ 1 row selected (0.241 seconds) {noformat} was (Author: khfaraaz): [~arina] thanks for confirming. Verified that SQL reported in this JIRA returns correct results on Drill 1.10.0 Test added here framework/resources/Functional/json/json_storage/drill_3562.q {noformat} 0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ; +-+ | EXPR$0 | +-+ | 1 | +-+ 1 row selected (0.241 seconds) {noformat} > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be 
populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950596#comment-15950596 ] Khurram Faraaz commented on DRILL-3562: --- [~arina] thanks for confirming. Verified that SQL reported in this JIRA returns correct results on Drill 1.10.0 Test added here framework/resources/Functional/json/json_storage/drill_3562.q {noformat} 0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ; +-+ | EXPR$0 | +-+ | 1 | +-+ 1 row selected (0.241 seconds) {noformat} > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950582#comment-15950582 ] Arina Ielchiieva commented on DRILL-3562: - Yes, it does. This behavior is expected in unit test TestJsonReader.testFlattenEmptyArrayWithAllTextMode. > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (DRILL-5405) Add missing operator types
Arina Ielchiieva created DRILL-5405: --- Summary: Add missing operator types Key: DRILL-5405 URL: https://issues.apache.org/jira/browse/DRILL-5405 Project: Apache Drill Issue Type: Bug Affects Versions: 1.10.0 Reporter: Arina Ielchiieva Assignee: Arina Ielchiieva Priority: Minor Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they won't be displayed on Web UI as UNKNOWN_OPERATOR. Example: before the fix -> unknown_operator.JPG after the fix -> maprdb_sub_scan.JPG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-5405) Add missing operator types
[ https://issues.apache.org/jira/browse/DRILL-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-5405: Attachment: maprdb_sub_scan.JPG unknown_operator.JPG > Add missing operator types > -- > > Key: DRILL-5405 > URL: https://issues.apache.org/jira/browse/DRILL-5405 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Attachments: maprdb_sub_scan.JPG, unknown_operator.JPG > > > Add missing operator types: FLATTEN, MONGO_SUB_SCAN, MAPRDB_SUB_SCAN so they > won't be displayed on Web UI as UNKNOWN_OPERATOR. > Example: > before the fix -> unknown_operator.JPG > after the fix -> maprdb_sub_scan.JPG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950452#comment-15950452 ] Khurram Faraaz commented on DRILL-3562: --- [~arina] Is this the expected result for the second SQL below ? {noformat} 0: jdbc:drill:schema=dfs.tmp> select * from `drill_3562.json`; +-+ |a| +-+ | {"b":{"c":[]}} | +-+ 1 row selected (0.138 seconds) 0: jdbc:drill:schema=dfs.tmp> select FLATTEN(t.a.b.c) AS c from `drill_3562.json` t; ++ | c | ++ ++ No rows selected (0.181 seconds) {noformat} > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
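The empty result for the second query above is the confirmed expected behavior: FLATTEN emits one output row per array element, so a row whose array is empty contributes nothing. A minimal sketch of that semantics (illustrative Python over the sample documents; not the Drill operator):

```python
def flatten(rows, path=("a", "b", "c")):
    """Sketch of FLATTEN semantics: one output row per array element;
    rows whose array is empty or missing contribute no output rows."""
    out = []
    for row in rows:
        cur = row
        for key in path:
            cur = cur.get(key, {}) if isinstance(cur, dict) else {}
        if isinstance(cur, list):
            out.extend(cur)
    return out

data = [
    {"a": {"b": {"c": [{"d": {"e": "f"}}]}}},
    {"a": {"b": {"c": []}}},   # empty array -> no flattened rows
]
```

Applied to the two-document file from DRILL-3562, only the single `{"d": {"e": "f"}}` element survives, matching the `count(*) = 1` verified in the comments above.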