[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates
[ https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144172#comment-15144172 ]

Mehant Baid commented on DRILL-2282:
------------------------------------

[~vitalii] You don't need HBase to reproduce this; you just need a plan that does not execute in a single fragment. You can probably use a test case similar to the one in the patch for DRILL-1496 to verify whether this is still a problem.

> Eliminate spaces, special characters from names in function templates
> ---------------------------------------------------------------------
>
>                 Key: DRILL-2282
>                 URL: https://issues.apache.org/jira/browse/DRILL-2282
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>            Reporter: Mehant Baid
>            Assignee: Vitalii Diravka
>             Fix For: 1.6.0
>
>         Attachments: DRILL-2282.patch
>
>
> Having spaces in the name of the functions causes issues while deserializing
> such expressions when we try to read the plan fragment. As part of this JIRA
> we would like to clean up all the templates to not include special characters
> in their names.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
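The cleanup this JIRA asks for can be sketched with a hypothetical helper (this is an illustration, not Drill's actual code): map anything outside the identifier-safe character set in a function template name to an underscore, so the serialized expression survives the plan-fragment round trip.

```java
// Hypothetical helper illustrating the cleanup described above: function
// template names with spaces or special characters break deserialization
// of the plan fragment, so normalize them to identifier-safe names.
public class FunctionNameSanitizer {
    public static String sanitize(String name) {
        // Keep letters, digits and underscores; map everything else to '_'.
        return name.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("similar to"));         // similar_to
        System.out.println(sanitize("date_add(interval)")); // date_add_interval_
    }
}
```

A name sanitized this way round-trips cleanly because it is a plain identifier in the serialized expression text.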
[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates
[ https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140298#comment-15140298 ]

Mehant Baid commented on DRILL-2282:
------------------------------------

[~parthc] There was a specific issue with the 'similar' function, as noted in [DRILL-1496|https://issues.apache.org/jira/browse/DRILL-1496], that was fixed, but this is a more generic JIRA to make sure we don't run into a similar issue. If I recall correctly, there was a problem deserializing the plan fragment if we had a space while serializing the expression.
[jira] [Resolved] (DRILL-3739) NPE on select from Hive for HBase table
[ https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid resolved DRILL-3739.
--------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.4.0)
                   1.5.0

Fixed in 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a

> NPE on select from Hive for HBase table
> ---------------------------------------
>
>                 Key: DRILL-3739
>                 URL: https://issues.apache.org/jira/browse/DRILL-3739
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: ckran
>            Assignee: Mehant Baid
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> For a table in HBase or MapR-DB with metadata created in Hive, so that it can
> be accessed through beeline or Hue, queries from Drill fail with:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb]
[jira] [Updated] (DRILL-4192) Dir0 and Dir1 from drill-1.4 are messed up
[ https://issues.apache.org/jira/browse/DRILL-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4192:
-------------------------------
    Assignee: Aman Sinha  (was: Mehant Baid)

> Dir0 and Dir1 from drill-1.4 are messed up
> ------------------------------------------
>
>                 Key: DRILL-4192
>                 URL: https://issues.apache.org/jira/browse/DRILL-4192
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.4.0
>            Reporter: Krystal
>            Assignee: Aman Sinha
>            Priority: Blocker
>
>
> I have the following directories:
> /drill/testdata/temp1/abc/dt=2014-12-30/lineitem.parquet
> /drill/testdata/temp1/abc/dt=2014-12-31/lineitem.parquet
> The following queries returned incorrect data.
> select dir0,dir1 from dfs.`/drill/testdata/temp1` limit 2;
> +----------------+-------+
> |      dir0      | dir1  |
> +----------------+-------+
> | dt=2014-12-30  | null  |
> | dt=2014-12-30  | null  |
> +----------------+-------+
> select dir0 from dfs.`/drill/testdata/temp1` limit 2;
> +----------------+
> |      dir0      |
> +----------------+
> | dt=2014-12-31  |
> | dt=2014-12-31  |
> +----------------+
[jira] [Resolved] (DRILL-2419) UDF that returns string representation of expression type
[ https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid resolved DRILL-2419.
--------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: Future)
                   1.3.0

Fixed in eb6325dc9b59291582cd7d3c3e5d02efd5d15906.

> UDF that returns string representation of expression type
> ---------------------------------------------------------
>
>                 Key: DRILL-2419
>                 URL: https://issues.apache.org/jira/browse/DRILL-2419
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>            Reporter: Victoria Markman
>            Assignee: Steven Phillips
>             Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)
[jira] [Assigned] (DRILL-2419) UDF that returns string representation of expression type
[ https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid reassigned DRILL-2419:
----------------------------------
    Assignee: Mehant Baid
[jira] [Commented] (DRILL-3739) NPE on select from Hive for HBase table
[ https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034297#comment-15034297 ]

Mehant Baid commented on DRILL-3739:
------------------------------------

+1.
[jira] [Commented] (DRILL-3893) Issue with Drill after Hive Alters the Table
[ https://issues.apache.org/jira/browse/DRILL-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034327#comment-15034327 ]

Mehant Baid commented on DRILL-3893:
------------------------------------

lgtm. +1

> Issue with Drill after Hive Alters the Table
> --------------------------------------------
>
>                 Key: DRILL-3893
>                 URL: https://issues.apache.org/jira/browse/DRILL-3893
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Hive, Storage - Hive
>    Affects Versions: 1.0.0, 1.1.0
>         Environment: DEV
>            Reporter: arnab chatterjee
>
>
> I reproduced this again on another partitioned table with existing data.
> Providing some more details. I have enabled the verbose mode for errors.
> Drill is unable to fetch the new column name that was introduced. This most
> likely seems to me to be Drill still picking up the stale metadata of Hive.
>
> if (!tableColumns.contains(columnName)) {
>   if (partitionNames.contains(columnName)) {
>     selectedPartitionNames.add(columnName);
>   } else {
>     throw new ExecutionSetupException(String.format("Column %s does not exist", columnName));
>   }
> }
>
> select testdata from testtable;
> Error: SYSTEM ERROR: ExecutionSetupException: Column testdata does not exist
> Fragment 0:0
> [Error Id: be5cccba-97f6-4cc4-94e8-c11a4c53c8f4 on x.x.com:]
> (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while
> initializing HiveRecordReader: Column testdata does not exist
>     org.apache.drill.exec.store.hive.HiveRecordReader.init():241
>     org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.ExecutionSetupException)
>   Column testdata does not exist
>     org.apache.drill.exec.store.hive.HiveRecordReader.init():206
>     org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> #
> Please note that this is a partitioned table with existing data.
> Does Drill cache the metadata somewhere, and hence it's not getting reflected
> immediately?
> DRILL CLI
> select x from xx;
> Error: SYSTEM ERROR: ExecutionSetupException: Column x does not exist
> Fragment 0:0
> [Error Id: 62086e22-1341-459e-87ce-430a24cc5119 on x.x.com:999]
> (state=,code=0)
> HIVE CLI
> hive> describe formatted x;
> OK
> # col_name data_type comment
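The column-validation logic quoted in the report above can be modeled standalone. This is a simplified illustration (renamed types, not Drill's HiveRecordReader) of why stale cached Hive metadata produces "Column X does not exist":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of the quoted check in HiveRecordReader.init():
// a selected column must be either a table column or a partition column,
// otherwise setup fails. A column added by Hive ALTER TABLE appears in
// neither set until the cached metadata is refreshed, so it fails here.
public class ColumnCheck {
    public static List<String> selectPartitions(Set<String> tableColumns,
                                                Set<String> partitionNames,
                                                List<String> selectedColumns) {
        List<String> selectedPartitionNames = new ArrayList<>();
        for (String columnName : selectedColumns) {
            if (!tableColumns.contains(columnName)) {
                if (partitionNames.contains(columnName)) {
                    selectedPartitionNames.add(columnName);
                } else {
                    // IllegalStateException stands in for ExecutionSetupException.
                    throw new IllegalStateException(
                        String.format("Column %s does not exist", columnName));
                }
            }
        }
        return selectedPartitionNames;
    }
}
```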
[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data
[ https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032229#comment-15032229 ]

Mehant Baid commented on DRILL-4119:
------------------------------------

I think it makes sense to address that as a separate issue. Patch looks good otherwise. +1.

> Skew in hash distribution for varchar (and possibly other) types of data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4119
>                 URL: https://issues.apache.org/jira/browse/DRILL-4119
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.3.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of
> length 32. It is easily reproducible by a group-by query:
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02        HashAgg(group=[{0}])
> 01-03          Project(SomeId=[$0])
> 01-04            HashToRandomExchange(dist0=[[$0]])
> 02-01              UnorderedMuxExchange
> 03-01                Project(SomeId=[$0],
>                        E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02                  HashAgg(group=[{0}])
> 03-03                    Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following form:
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}
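Skew of this kind can be observed directly by bucketing keys the way a hash exchange would. A rough sketch, using String.hashCode as a stand-in for Drill's hash64 family of functions:

```java
import java.util.List;

// Sketch: count how many keys land in each of N buckets. With a good
// hash the counts are roughly equal; substantial skew means a few
// receiving fragments get most of the rows while others sit idle.
public class SkewCheck {
    public static int[] bucketCounts(List<String> keys, int buckets) {
        int[] counts = new int[buckets];
        for (String k : keys) {
            // floorMod keeps the bucket index non-negative for negative hashes.
            counts[Math.floorMod(k.hashCode(), buckets)]++;
        }
        return counts;
    }
}
```

Feeding a sample of the 32-character hex ids through the production hash and inspecting the counts per fragment is enough to confirm or rule out distribution skew.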
[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data
[ https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024951#comment-15024951 ]

Mehant Baid commented on DRILL-4119:
------------------------------------

If we are returning different values from the original implementation, then I feel we should fix that issue. I can help out with identifying the differences.
[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter
[ https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4071:
-------------------------------
    Attachment:     (was: DRILL-4071.patch)

> Partition pruning fails when a Coalesce() function appears with partition
> filter
> -------------------------------------------------------------------------
>
>                 Key: DRILL-4071
>                 URL: https://issues.apache.org/jira/browse/DRILL-4071
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
>
> Pruning fails for this query:
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from
> dfs.`/Users/asinha/data/multilevel/parquet` where dir0 = 1994 and
> coalesce(o_clerk, 'Clerk') = '';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03          Project($f0=[0])
> 00-04            SelectionVectorRemover
> 00-05              Filter(condition=[AND(=($0, 1994), =(CASE(IS NOT NULL($1),
> $1, 'Clerk'), ''))])
> 00-06                Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q2/orders_94_q2.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q3/orders_94_q3.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q4/orders_94_q4.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q1/orders_96_q1.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q2/orders_96_q2.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q3/orders_96_q3.parquet],
> ReadEntryWithPath
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q4/orders_96_q4.parquet]],
> selectionRoot=file:/Users/asinha/data/multilevel/parquet, numFiles=12,
> usedMetadataFile=false, columns=[`dir0`, `o_clerk`]]])
> {code}
> The log indicates no partition filters were found:
> {code}
> ...
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for
> partition pruning. Total pruning elapsed time: 0 ms
> {code}
> A preliminary analysis indicates that since the Coalesce gets converted to a
> CASE(IS NOT NULL) expression, the filter analysis does not correctly
> process the full expression tree. At one point in
> {{FindPartitionConditions.analyzeCall()}} I saw the operandStack had 3
> elements in it: [NO_PUSH, NO_PUSH, PUSH], which seemed strange since I would
> expect an even number of elements.
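The push/no-push classification the analysis performs can be modeled as a recursive walk over the expression tree: a call is eligible for pruning only if every operand is. This is a toy model of the idea, not FindPartitionConditions itself; the bug above is that the real stack-based traversal mishandles the extra operands the COALESCE-to-CASE rewrite introduces.

```java
import java.util.Collections;
import java.util.List;

// Toy model of partition-filter analysis: leaves are pushable when they
// are partition columns or literals; an interior call is pushable only
// if all of its operands are. CASE(IS NOT NULL($1), $1, 'Clerk') contains
// the non-partition column $1, so that conjunct is NO_PUSH, while
// dir0 = 1994 remains PUSH.
public class PushAnalysis {
    static class Expr {
        final boolean leafPushable;
        final List<Expr> operands;
        Expr(boolean leafPushable) {           // leaf node
            this.leafPushable = leafPushable;
            this.operands = Collections.emptyList();
        }
        Expr(List<Expr> operands) {            // call node
            this.leafPushable = false;
            this.operands = operands;
        }
    }

    public static boolean pushable(Expr e) {
        if (e.operands.isEmpty()) {
            return e.leafPushable;
        }
        for (Expr op : e.operands) {
            if (!pushable(op)) {
                return false;
            }
        }
        return true;
    }
}
```

An AND at the top can still be split conjunct by conjunct, which is why `dir0 = 1994` should survive pruning even when the COALESCE conjunct does not.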
[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter
[ https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4071:
-------------------------------
    Assignee: Aman Sinha  (was: Mehant Baid)
[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter
[ https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4071:
-------------------------------
    Attachment: DRILL-4071.patch

[~amansinha100] can you please review?
[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter
[ https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4071:
-------------------------------
    Attachment: DRILL-4071.patch

Thanks for catching that; I forgot to clean it up.
[jira] [Updated] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available
[ https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4025:
-------------------------------
    Assignee: Aman Sinha  (was: Mehant Baid)

> Don't invoke getFileStatus() when metadata cache is available
> -------------------------------------------------------------
>
>                 Key: DRILL-4025
>                 URL: https://issues.apache.org/jira/browse/DRILL-4025
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Mehant Baid
>            Assignee: Aman Sinha
>
>         Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory
> even when we have the metadata cache file. The information is already present
> in the cache, so we don't need to perform this operation.
[jira] [Updated] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available
[ https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4025:
-------------------------------
    Attachment: DRILL-4025.patch
[jira] [Commented] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available
[ https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992829#comment-14992829 ]

Mehant Baid commented on DRILL-4025:
------------------------------------

[~jnadeau] We aren't changing the behavior of checking whether the cache file is in sync with the actual data. That check is done a couple of lines earlier in the code: [ParquetFormatPlugin.readBlockMeta()|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java#L229]. What we are avoiding in my patch is the additional ls in [FileSelection.init()|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java#L141] to populate the FileStatus in the case it is null. However, I will run a small test to confirm this and report the result.
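The idea described in the comment above can be sketched as lazy listing. Names here are hypothetical (the real change lives in FileSelection/ParquetGroupScan): when the metadata cache already supplied the file list, skip the expensive listing; fall back to it only when no cached statuses exist.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the optimization: if cachedStatuses arrived pre-populated
// from the metadata cache, getStatuses() never performs the expensive
// getFileStatus()/ls call; otherwise it lists the directory once.
public class FileSelectionSketch {
    private List<String> statuses;   // may come pre-populated from the cache
    private int listCalls = 0;       // counts expensive directory listings

    public FileSelectionSketch(List<String> cachedStatuses) {
        this.statuses = cachedStatuses;
    }

    private List<String> listDirectory() {
        listCalls++;   // stand-in for FileSystem.listStatus() on a real FS
        return Arrays.asList("f1.parquet", "f2.parquet");
    }

    public List<String> getStatuses() {
        if (statuses == null) {
            statuses = listDirectory();   // fallback only when the cache missed
        }
        return statuses;
    }

    public int listCalls() {
        return listCalls;
    }
}
```

Note the staleness check is unaffected: it happens before this point, when the cache file itself is read and compared against the data.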
[jira] [Updated] (DRILL-4025) Reduce getFileStatus() invocation for Parquet by 1
[ https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-4025:
-------------------------------
    Summary: Reduce getFileStatus() invocation for Parquet by 1  (was: Don't invoke getFileStatus() when metadata cache is available)
[jira] [Commented] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available
[ https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992855#comment-14992855 ]

Mehant Baid commented on DRILL-4025:
------------------------------------

Agreed, the title of the JIRA is a bit misleading; I'll change it. I ran a quick test and made sure the metadata cache and the data sync logic work as expected with my patch.
[jira] [Created] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available
Mehant Baid created DRILL-4025:
----------------------------------

             Summary: Don't invoke getFileStatus() when metadata cache is available
                 Key: DRILL-4025
                 URL: https://issues.apache.org/jira/browse/DRILL-4025
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.3.0
            Reporter: Mehant Baid
            Assignee: Mehant Baid


Currently we invoke getFileStatus() to list all the files under a directory
even when we have the metadata cache file. The information is already present
in the cache, so we don't need to perform this operation.
[jira] [Updated] (DRILL-3941) Add timing instrumentation around Partition Pruning
[ https://issues.apache.org/jira/browse/DRILL-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid updated DRILL-3941:
-------------------------------
    Assignee: Aman Sinha  (was: Mehant Baid)

> Add timing instrumentation around Partition Pruning
> ---------------------------------------------------
>
>                 Key: DRILL-3941
>                 URL: https://issues.apache.org/jira/browse/DRILL-3941
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Mehant Baid
>            Assignee: Aman Sinha
>
>
> We seem to be spending a chunk of time doing partition pruning; it would be
> good to log timing information to indicate the amount of time we spend doing
> pruning. A little more granularity to indicate the time taken to build the
> filter tree and in the interpreter would also be good.
[jira] [Updated] (DRILL-3634) Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned to the text plan
[ https://issues.apache.org/jira/browse/DRILL-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3634: --- Assignee: Aman Sinha (was: Mehant Baid) > Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned > to the text plan > > > Key: DRILL-3634 > URL: https://issues.apache.org/jira/browse/DRILL-3634 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Aman Sinha > Fix For: Future > > > The hive scan portion of the text plan only lists the files scanned. It would > be helpful if the text plan also had a fileCount value or the number of > partitions scanned. > Reason : Currently as part of our tests we are verifying plans using a regex > based verification and the expected regex is matching more than it should. > Fixing this might be hard. So if we have the fileCount/partitionCount as part > of the plan, the plan comparison will be more accurate -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3634) Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned to the text plan
[ https://issues.apache.org/jira/browse/DRILL-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid reassigned DRILL-3634: -- Assignee: Mehant Baid > Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned > to the text plan > > > Key: DRILL-3634 > URL: https://issues.apache.org/jira/browse/DRILL-3634 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid > Fix For: Future > > > The hive scan portion of the text plan only lists the files scanned. It would > be helpful if the text plan also had a fileCount value or the number of > partitions scanned. > Reason : Currently as part of our tests we are verifying plans using a regex > based verification and the expected regex is matching more than it should. > Fixing this might be hard. So if we have the fileCount/partitionCount as part > of the plan, the plan comparison will be more accurate -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS
[ https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972998#comment-14972998 ] Mehant Baid commented on DRILL-3975: This particular bug was happening when ParquetPruneScanRule was hitting an IOOB in the logic you pointed out when there was no need to perform any splitting (since for the auto partitioning scheme we get the partitioning column value from the file and not from the location). However, while debugging this I found that "selectionRoot" contained the scheme and "file" did not, potentially causing the IOOB you might be seeing. Stripping out the scheme makes sense; we cannot check for -1 [Here|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java#L30] as it would cause the partitioning columns to be incorrectly empty. > Partition Planning rule causes query failure due to IndexOutOfBoundsException > on HDFS > - > > Key: DRILL-3975 > URL: https://issues.apache.org/jira/browse/DRILL-3975 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Jacques Nadeau > > In attempting to run the extended test suite provided by MapR, there are a > large number of queries that fail due to issues in the PruneScanRule and > specifically the DFSPartitionLocation constructor line 31. It is likely due > to issues with the code that are related to running on HDFS where this code > path has apparently not been tested. 
> An example test query this type of failure occurred: > /src/drill-test-framework/resources/Functional/ctas/ctas_auto_partition/tpch0.01_multiple_partitions/data/q11.q > Example stack trace below: > {code} > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > StringIndexOutOfBoundsException: String index out of range: -12 > [Error Id: f2941267-49b1-4f67-a17f-610ffb13fcb7 on > ip-172-31-30-32.us-west-2.compute.internal:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) > ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) > [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) > [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_85] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_85] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] > Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected > exception during fragment initialization: Internal error: Error while > applying rule 
PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan > [entries=[ReadEntryWithPath > [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]], > > selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2, > numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])] > ... 4 common frames omitted > Caused by: java.lang.AssertionError: Internal error: Error while applying > rule PruneScanRule:Filter_On_Scan_Parquet, args > [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0, > 1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, > ctasAutoPartition, > tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan >
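The scheme mismatch described in the comments above can be demonstrated in isolation: when selectionRoot carries the `hdfs://host:port` prefix but the file path does not, `file.substring(selectionRoot.length())` computes a negative sub-length, which is exactly the "String index out of range: -N" failure in the stack traces. A sketch of the fix is to strip the scheme and authority from both sides before taking the suffix. The class and method names here are illustrative, not Drill's actual code.

```java
import java.net.URI;

// Sketch of why DFSPartitionLocation could see a negative substring index:
// selectionRoot carries the "hdfs://host:port" scheme while the file path
// does not, so selectionRoot.length() overshoots. Normalizing both sides
// through URI.getPath() keeps the prefixes comparable.
// Names are illustrative, not Drill's actual code.
public class SchemeStripSketch {
    public static String relativePartitionPath(String selectionRoot, String file) {
        String root = URI.create(selectionRoot).getPath(); // drops "hdfs://host:port"
        String path = URI.create(file).getPath();
        return path.substring(root.length());              // safe: shared prefix now
    }
}
```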
[jira] [Created] (DRILL-3965) Index out of bounds exception in partition pruning
Mehant Baid created DRILL-3965: -- Summary: Index out of bounds exception in partition pruning Key: DRILL-3965 URL: https://issues.apache.org/jira/browse/DRILL-3965 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Hit IOOB while trying to perform partition pruning on a table that was created using CTAS auto partitioning with the below stack trace. Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -8 at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] at org.apache.drill.exec.planner.DFSPartitionLocation.<init>(DFSPartitionLocation.java:31) ~[drill-java-exec-1.2.0.jar:1.2.0] at org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) ~[drill-java-exec-1.2.0.jar:1.2.0] at org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) ~[drill-java-exec-1.2.0.jar:1.2.0] at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) ~[drill-java-exec-1.2.0.jar:1.2.0] at org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) ~[drill-java-exec-1.2.0.jar:1.2.0] at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3965) Index out of bounds exception in partition pruning
[ https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969986#comment-14969986 ] Mehant Baid commented on DRILL-3965: I don't think so, looking at the stack trace in the DRILL-3376 it seems like a separate issue. > Index out of bounds exception in partition pruning > -- > > Key: DRILL-3965 > URL: https://issues.apache.org/jira/browse/DRILL-3965 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Aman Sinha > Attachments: DRILL-3965.patch > > > Hit IOOB while trying to perform partition pruning on a table that was > created using CTAS auto partitioning with the below stack trace. > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -8 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] > at > org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning
[ https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3965: --- Attachment: DRILL-3965.patch [~amansinha100] can you please review. > Index out of bounds exception in partition pruning > -- > > Key: DRILL-3965 > URL: https://issues.apache.org/jira/browse/DRILL-3965 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Attachments: DRILL-3965.patch > > > Hit IOOB while trying to perform partition pruning on a table that was > created using CTAS auto partitioning with the below stack trace. > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -8 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] > at > org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning
[ https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3965: --- Assignee: Aman Sinha (was: Mehant Baid) > Index out of bounds exception in partition pruning > -- > > Key: DRILL-3965 > URL: https://issues.apache.org/jira/browse/DRILL-3965 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Aman Sinha > Attachments: DRILL-3965.patch > > > Hit IOOB while trying to perform partition pruning on a table that was > created using CTAS auto partitioning with the below stack trace. > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -8 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] > at > org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning
[ https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3965: --- Attachment: (was: DRILL-3965.patch) > Index out of bounds exception in partition pruning > -- > > Key: DRILL-3965 > URL: https://issues.apache.org/jira/browse/DRILL-3965 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Attachments: DRILL-3965.patch > > > Hit IOOB while trying to perform partition pruning on a table that was > created using CTAS auto partitioning with the below stack trace. > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -8 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] > at > org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning
[ https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3965: --- Attachment: DRILL-3965.patch Updated patch with minor changes > Index out of bounds exception in partition pruning > -- > > Key: DRILL-3965 > URL: https://issues.apache.org/jira/browse/DRILL-3965 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Attachments: DRILL-3965.patch > > > Hit IOOB while trying to perform partition pruning on a table that was > created using CTAS auto partitioning with the below stack trace. > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: -8 > at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79] > at > org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) > ~[drill-java-exec-1.2.0.jar:1.2.0] > at > org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) > ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance
[ https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3429: --- Attachment: (was: DRILL-3429.patch) > DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, > variance > - > > Key: DRILL-3429 > URL: https://issues.apache.org/jira/browse/DRILL-3429 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.3.0 > > > DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, > stddev, variance to simple computations. > Eg: > Stddev( x ) => power( > (sum(x * x) - sum( x ) * sum( x ) / count( x )) > / count( x ), > .5) > Consider the case when the input is an integer. Now the rewrite contains > multiplication and division, which will bind to functions that operate on > integers however the expected result should be a double and since double has > more precision than integer we should be operating on double during the > multiplication and division. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance
[ https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3429: --- Attachment: DRILL-3429.patch Addressed review comment. > DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, > variance > - > > Key: DRILL-3429 > URL: https://issues.apache.org/jira/browse/DRILL-3429 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.3.0 > > Attachments: DRILL-3429.patch > > > DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, > stddev, variance to simple computations. > Eg: > Stddev( x ) => power( > (sum(x * x) - sum( x ) * sum( x ) / count( x )) > / count( x ), > .5) > Consider the case when the input is an integer. Now the rewrite contains > multiplication and division, which will bind to functions that operate on > integers however the expected result should be a double and since double has > more precision than integer we should be operating on double during the > multiplication and division. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
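The precision hazard described in DRILL-3429 is easy to reproduce outside Drill. Evaluating the convertlet's rewrite, stddev(x) = power((sum(x*x) - sum(x)*sum(x)/count(x)) / count(x), .5), with integer operations truncates the inner divisions; promoting to double first gives the correct result. This is a standalone demonstration of the arithmetic, not Drill's actual convertlet code.

```java
// Demonstrates why DrillAvgVarianceConvertlet's rewrite must bind the
// multiplication and division to double, not integer, functions.
public class StddevRewriteSketch {
    // The rewrite evaluated with integer ops: both divisions truncate.
    public static double stddevInt(int[] x) {
        int sumSq = 0, sum = 0, n = x.length;
        for (int v : x) { sumSq += v * v; sum += v; }
        return Math.pow((sumSq - sum * sum / n) / n, 0.5);   // int division truncates twice
    }

    // The same rewrite with the operands promoted to double first.
    public static double stddevDouble(int[] x) {
        double sumSq = 0, sum = 0, n = x.length;
        for (int v : x) { sumSq += (double) v * v; sum += v; }
        return Math.pow((sumSq - sum * sum / n) / n, 0.5);
    }
}
```

For x = {1, 2} the population stddev is 0.5, but the integer-typed rewrite evaluates (5 - 9/2)/2 = (5 - 4)/2 = 0 and reports 0.0.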
[jira] [Created] (DRILL-3941) Add timing instrumentation around Partition Pruning
Mehant Baid created DRILL-3941: -- Summary: Add timing instrumentation around Partition Pruning Key: DRILL-3941 URL: https://issues.apache.org/jira/browse/DRILL-3941 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid We seem to be spending a chunk of time doing partition pruning; it would be good to log timing information to indicate the amount of time we spend doing pruning. A little more granularity to indicate the time taken to build the filter tree and in the interpreter would also be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance
[ https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3429: --- Assignee: Aman Sinha (was: Mehant Baid) > DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, > variance > - > > Key: DRILL-3429 > URL: https://issues.apache.org/jira/browse/DRILL-3429 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Aman Sinha >Priority: Critical > Fix For: 1.3.0 > > Attachments: DRILL-3429.patch > > > DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, > stddev, variance to simple computations. > Eg: > Stddev( x ) => power( > (sum(x * x) - sum( x ) * sum( x ) / count( x )) > / count( x ), > .5) > Consider the case when the input is an integer. Now the rewrite contains > multiplication and division, which will bind to functions that operate on > integers however the expected result should be a double and since double has > more precision than integer we should be operating on double during the > multiplication and division. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3936) We don't handle out of memory condition during build phase of hash join
[ https://issues.apache.org/jira/browse/DRILL-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid reassigned DRILL-3936: -- Assignee: Mehant Baid > We don't handle out of memory condition during build phase of hash join > --- > > Key: DRILL-3936 > URL: https://issues.apache.org/jira/browse/DRILL-3936 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Victoria Markman >Assignee: Mehant Baid > > It looks like we just fall through ( see excerpt from HashJoinBatch.java > below ) > {code:java} > public void executeBuildPhase() throws SchemaChangeException, > ClassTransformationException, IOException { > //Setup the underlying hash table > // skip first batch if count is zero, as it may be an empty schema batch > if (right.getRecordCount() == 0) { > for (final VectorWrapper w : right) { > w.clear(); > } > rightUpstream = next(right); > } > boolean moreData = true; > while (moreData) { > switch (rightUpstream) { > case OUT_OF_MEMORY: > case NONE: > case NOT_YET: > case STOP: > moreData = false; > continue; > ... > {code} > We don't handle it later either: > {code:java} > public IterOutcome innerNext() { > try { > /* If we are here for the first time, execute the build phase of the >* hash join and setup the run time generated class for the probe side >*/ > if (state == BatchState.FIRST) { > // Build the hash table, using the build side record batches. > executeBuildPhase(); > //IterOutcome next = next(HashJoinHelper.LEFT_INPUT, > left); > hashJoinProbe.setupHashJoinProbe(context, hyperContainer, left, > left.getRecordCount(), this, hashTable, > hjHelper, joinType); > // Update the hash table related stats for the operator > updateStats(this.hashTable); > } > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
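The gap the excerpt above points at is that OUT_OF_MEMORY falls into the same `moreData = false` arm as NONE/NOT_YET/STOP, so the build phase ends as if the data were simply exhausted. A hedged sketch of the distinction the fix would need to draw (the enum and handler below are illustrative, not Drill's actual classes):

```java
// Sketch of the missing case split in HashJoinBatch's build loop:
// OUT_OF_MEMORY must be surfaced to the caller, not treated as a normal
// end-of-data outcome. Names are illustrative, not Drill's actual code.
public class BuildPhaseSketch {
    public enum IterOutcome { OK, NONE, NOT_YET, STOP, OUT_OF_MEMORY }

    // Returns true when the caller must abort and report the memory
    // failure rather than silently finishing the build phase.
    public static boolean mustAbort(IterOutcome upstream) {
        switch (upstream) {
            case OUT_OF_MEMORY:
                return true;    // propagate; don't fall through with NONE/STOP
            default:
                return false;   // normal termination or retry outcomes
        }
    }
}
```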
[jira] [Commented] (DRILL-3764) Support the ability to identify and/or skip records when a function evaluation fails
[ https://issues.apache.org/jira/browse/DRILL-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950005#comment-14950005 ] Mehant Baid commented on DRILL-3764: I had worked with [~jnadeau] on providing similar functionality: a framework (annotations for errors in the function template and the necessary additions to the runtime code gen to handle errors) to be able to deal with errors in function evaluation. Here is the branch, https://github.com/mehant/drill/commit/3e81a776d1c1bb0ce7f64d8c5a905c87d71e42e0 (this is old, most likely won't rebase cleanly, I can work on rebasing if deemed useful). The basic idea was to provide a way to specify different types of errors within the UDF and in case of an error use null for that row. > Support the ability to identify and/or skip records when a function > evaluation fails > > > Key: DRILL-3764 > URL: https://issues.apache.org/jira/browse/DRILL-3764 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.1.0 >Reporter: Aman Sinha > Fix For: Future > > > Drill can point out the filename and location of corrupted records in a file > but it does not have a good mechanism to deal with the following scenario: > Consider a text file with 2 records: > {code} > $ cat t4.csv > 10,2001 > 11,http://www.cnn.com > {code} > {code} > 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true; > 0: jdbc:drill:zk=local> select cast(columns[0] as int), cast(columns[1] as > bigint) from dfs.`t4.csv`; > Error: SYSTEM ERROR: NumberFormatException: http://www.cnn.com > Fragment 0:0 > [Error Id: 72aad22c-a345-4100-9a57-dcd8436105f7 on 10.250.56.140:31010] > (java.lang.NumberFormatException) http://www.cnn.com > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeL():91 > > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToLong():62 > org.apache.drill.exec.test.generated.ProjectorGen1.doEval():62 > 
org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords():62 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172 > {code} > The problem is user does not have the context of where the error occurred > -either the file name or the record number. This becomes a pain point > especially when CTAS is being used to do data conversion from (say) text > format to Parquet format. The CTAS may be accessing thousands of files and 1 > such casting (or another function) failure aborts the query. > It would substantially improve the user experience if we provided: > 1) the filename and record number where this failure occurred > 2) the ability to skip such records depending on a session option > 3) the ability to write such records to a staging table for future ingestion > Please see discussion on dev list: > http://mail-archives.apache.org/mod_mbox/drill-dev/201509.mbox/%3cCAFyDVvLuPLgTNZ56S6=J=9Vb=aBs=pdw7nrhkkdupbdxgfa...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
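The behavior requested above (record the file and row where a function evaluation fails, then emit null or skip rather than aborting the query) can be sketched in miniature. The class, method, and parameter names below are hypothetical, not Drill's UDF API.

```java
import java.util.List;

// Illustrative sketch of DRILL-3764's "skip bad records" idea: when a cast
// fails on one row, capture the context (file, row number) and null out the
// cell instead of failing the whole query. Names are hypothetical.
public class SkipBadRecordSketch {
    public static Long tryCastToLong(String value, String file, long rowNumber,
                                     List<String> errorLog) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            // Record the context the user currently never sees...
            errorLog.add("bad record in " + file + " at row " + rowNumber + ": " + value);
            // ...and null out the cell rather than aborting the query.
            return null;
        }
    }
}
```

Applied to the t4.csv example above, row 1 casts cleanly while row 2 (`http://www.cnn.com`) would be logged with its file name and row number instead of surfacing as an opaque NumberFormatException.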
[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files
[ https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947124#comment-14947124 ] Mehant Baid commented on DRILL-3901: +1. The change looks good to me. > Performance regression with doing Explain of COUNT(*) over 100K files > - > > Key: DRILL-3901 > URL: https://issues.apache.org/jira/browse/DRILL-3901 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Aman Sinha >Assignee: Mehant Baid > Attachments: > 0001-DRILL-3901-Don-t-do-early-expansion-of-directory-in-.patch > > > We are seeing a performance regression when doing an Explain of SELECT > COUNT(*) over 100K files in a flat directory (no subdirectories) on latest > master branch compared to a run that was done on Sept 26. Some initial > details (I will have more later): > {code} > master branch on Sept 26 >No metadata cache: 71.452 secs >With metadata cache: 15.804 secs > Latest master branch >No metadata cache: 110 secs >With metadata cache: 32 secs > {code} > So, both cases show regression. > [~mehant] and I took an initial look at this and it appears we might be doing > the directory expansion twice. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files
[ https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945960#comment-14945960 ] Mehant Baid commented on DRILL-3901: Wanted to quickly update the status on this, I have a patch for avoiding listing files in a directory twice (will post patch soon). I am waiting for some performance feedback, will post the findings once I have them. [~sphillips] can you please file a separate JIRA for the issue you mentioned. > Performance regression with doing Explain of COUNT(*) over 100K files > - > > Key: DRILL-3901 > URL: https://issues.apache.org/jira/browse/DRILL-3901 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Aman Sinha >Assignee: Mehant Baid > > We are seeing a performance regression when doing an Explain of SELECT > COUNT(*) over 100K files in a flat directory (no subdirectories) on latest > master branch compared to a run that was done on Sept 26. Some initial > details (I will have more later): > {code} > master branch on Sept 26 >No metadata cache: 71.452 secs >With metadata cache: 15.804 secs > Latest master branch >No metadata cache: 110 secs >With metadata cache: 32 secs > {code} > So, both cases show regression. > [~mehant] and I took an initial look at this and it appears we might be doing > the directory expansion twice. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3788) Directory based partition pruning not taking effect with metadata caching
[ https://issues.apache.org/jira/browse/DRILL-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3788: --- Attachment: DRILL-3788.patch [~sphillips] can you please review. > Directory based partition pruning not taking effect with metadata caching > - > > Key: DRILL-3788 > URL: https://issues.apache.org/jira/browse/DRILL-3788 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 >Reporter: Rahul Challapalli >Assignee: Mehant Baid >Priority: Critical > Fix For: 1.2.0 > > Attachments: DRILL-3788.patch, lineitem.tgz, plan.txt > > > git.commit.id.abbrev=240a455 > Partition Pruning did not take place for the below query after I executed the > "refresh table metadata command" > {code} > explain plan for > select > l_returnflag, > l_linestatus > from > `lineitem/2006/1` > where > dir0=1 or dir0=2 > {code} > The logs did not indicate that "pruning did not take place" > Before executing the refresh table metadata command, partition pruning did > take effect > I am not attaching the data set as it is larger than 10MB. Reach out to me if > you need more information -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results
[ https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3577: --- Fix Version/s: (was: 1.2.0) 1.3.0 > Counting nested fields on CTAS-created-parquet file/s reports inaccurate > results > > > Key: DRILL-3577 > URL: https://issues.apache.org/jira/browse/DRILL-3577 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.1.0 >Reporter: Hanifi Gunes >Assignee: Mehant Baid >Priority: Critical > Fix For: 1.3.0 > > > I have not tried this at a smaller scale nor on JSON file directly but the > following seems to re-prod the issue > 1. Create an input file as follows > 20K rows with the following - > {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}} > 200 rows with the following - > {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last > entries only"}} > 2. CTAS as follows > {code:sql} > CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t > {code} > This should read > {code} > Fragment Number of records written > 0_0 20200 > {code} > 3. Count on nested fields via > {code:sql} > select count(t.others.additional) from dfs.`tmp`.`tp` t > OR > select count(t.others.other) from dfs.`tmp`.`tp` t > {code} > reports no rows as follows > {code} > EXPR$0 > 0 > {code} > While > {code:sql} > select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not > null > {code} > reports expected 200 rows > {code} > EXPR$0 > 200 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3819) Remove redundant filter for files start with "."
[ https://issues.apache.org/jira/browse/DRILL-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3819: --- Assignee: Deneche A. Hakim (was: Mehant Baid) > Remove redundant filter for files start with "." > > > Key: DRILL-3819 > URL: https://issues.apache.org/jira/browse/DRILL-3819 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Deneche A. Hakim > Fix For: 1.2.0 > > Attachments: DRILL-3819.patch > > > Due to a minor issue in resolving merge conflict between drop table and > refresh metadata, we now have two checks for the same filter (files starting > with "."). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3817) Refresh metadata does not work when used with sub schema
[ https://issues.apache.org/jira/browse/DRILL-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3817: --- Attachment: DRILL-3817.patch Minor patch, [~vkorukanti] please review. > Refresh metadata does not work when used with sub schema > -- > > Key: DRILL-3817 > URL: https://issues.apache.org/jira/browse/DRILL-3817 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Fix For: 1.2.0 > > Attachments: DRILL-3817.patch > > > refresh table metadata dfs.tmp.`lineitem` does not work, hit the following > exception > org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: > org.apache.calcite.sql.SqlBasicCall cannot be cast to > org.apache.calcite.sql.SqlIdentifier > If the sub schema is removed it works. > refresh table metadata dfs.`/tmp/lineitem` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3819) Remove redundant filter for files start with "."
[ https://issues.apache.org/jira/browse/DRILL-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3819: --- Attachment: DRILL-3819.patch It's a minor patch, [~adeneche] please review. > Remove redundant filter for files start with "." > > > Key: DRILL-3819 > URL: https://issues.apache.org/jira/browse/DRILL-3819 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Fix For: 1.2.0 > > Attachments: DRILL-3819.patch > > > Due to a minor issue in resolving merge conflict between drop table and > refresh metadata, we now have two checks for the same filter (files starting > with "."). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3824) Cancelling the "refresh table metadata" command does not cancel it on the drillbit
[ https://issues.apache.org/jira/browse/DRILL-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903706#comment-14903706 ] Mehant Baid commented on DRILL-3824: This is a known issue, not related to refresh table. We don't support cancellation during this stage, so commands like drop, show files etc. will also have the same problem. We need to address this more broadly. > Cancelling the "refresh table metadata" command does not cancel it on the > drillbit > -- > > Key: DRILL-3824 > URL: https://issues.apache.org/jira/browse/DRILL-3824 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Reporter: Rahul Challapalli >Assignee: Aman Sinha > > git.commit.id.abbrev=3c89b30 > I cancelled the below command from sqlline. As we can see, sqlline returned > immediately but on the backend the drillbit still continues executing the > "refresh" command. This is misleading to the end user. > {code} > 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata > dfs.`/drill/testdata/tpch100_5files/lineitem`; > Error: SQL statement execution canceled; ResultSet now closed. > (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2424) Ignore hidden files in directory path
[ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902736#comment-14902736 ] Mehant Baid commented on DRILL-2424: This was added recently. Drill should now ignore files beginning with a "." or "_". > Ignore hidden files in directory path > - > > Key: DRILL-2424 > URL: https://issues.apache.org/jira/browse/DRILL-2424 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON, Storage - Text & CSV >Affects Versions: 0.7.0 >Reporter: Andries Engelbrecht >Assignee: Steven Phillips > Fix For: 1.2.0 > > > When streaming data to the DFS some records can be incomplete during the > temporary write phase for the last file(s). These files typically have a > different extension like '.tmp' or can be marked hidden with a prefix of '.'. > Querying the directory path with Drill will then cause a query error as some > records may not be complete in the temporary files. Having the ability to > have Drill ignore hidden files and/or to only read files of a designated > extension in the workspace will resolve this problem. > An example is using Flume to stream JSON files to a directory structure; the > HDFS sink creates .tmp files (can be hidden with a . prefix) that contain > incomplete JSON objects until the file is closed and the .tmp extension (or > prefix) is removed. Attempting to query the directory structure with Drill > then results in errors due to the incomplete JSON object(s) in the tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
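The filter behavior described in the comment above can be sketched in a few lines of Java. This is an illustrative stand-in for the convention, not Drill's actual file-selection code; the class and method names here are hypothetical. Names beginning with "." (hidden files such as in-flight .tmp writes) or "_" (markers like _SUCCESS) are treated as non-data files and skipped.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HiddenFileFilter {
    // Convention from the comment above: skip names beginning with "."
    // (hidden/in-flight files) or "_" (e.g. _SUCCESS markers).
    static boolean isHidden(String fileName) {
        return fileName.startsWith(".") || fileName.startsWith("_");
    }

    // Keep only the files a scan should actually read.
    static List<String> visibleFiles(Stream<String> names) {
        return names.filter(n -> !isHidden(n)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(visibleFiles(
                Stream.of("data.json", ".part-0.tmp", "_SUCCESS", "more.json")));
        // prints [data.json, more.json]
    }
}
```

With a filter like this in the scan path, the Flume scenario from the description works: the streaming sink's hidden .tmp files are never read, so incomplete JSON objects no longer break the query.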
[jira] [Commented] (DRILL-2424) Ignore hidden files in directory path
[ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902743#comment-14902743 ] Mehant Baid commented on DRILL-2424: Looking at the code, there seems to have been a merge-conflict issue between drop table and refresh metadata; we now have the filter for files beginning with "." twice. Will file a JIRA and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3819) Remove redundant filter for files start with "."
Mehant Baid created DRILL-3819: -- Summary: Remove redundant filter for files start with "." Key: DRILL-3819 URL: https://issues.apache.org/jira/browse/DRILL-3819 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Due to a minor issue in resolving merge conflict between drop table and refresh metadata, we now have two checks for the same filter (files starting with "."). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-2424) Ignore hidden files in directory path
[ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid resolved DRILL-2424. Resolution: Duplicate Assignee: Mehant Baid (was: Steven Phillips) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3817) Refresh metadata does not work when used with sub schema
Mehant Baid created DRILL-3817: -- Summary: Refresh metadata does not work when used with sub schema Key: DRILL-3817 URL: https://issues.apache.org/jira/browse/DRILL-3817 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 refresh table metadata dfs.tmp.`lineitem` does not work, hit the following exception org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: org.apache.calcite.sql.SqlBasicCall cannot be cast to org.apache.calcite.sql.SqlIdentifier If the sub schema is removed it works. refresh table metadata dfs.`/tmp/lineitem` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3761) CastIntDecimal implementation should not update the input holder.
[ https://issues.apache.org/jira/browse/DRILL-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745877#comment-14745877 ] Mehant Baid commented on DRILL-3761: +1. We should also add logic to enforce the constraint that the input holders are immutable; this can be addressed in a separate JIRA. > CastIntDecimal implementation should not update the input holder. > -- > > Key: DRILL-3761 > URL: https://issues.apache.org/jira/browse/DRILL-3761 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Reporter: Jinfeng Ni >Assignee: Mehant Baid > Attachments: > 0001-DRILL-3761-Modify-CastIntDecimal-implementation-so-t.patch > > > The CastIntDecimal implementation would update the input holder's value, which > may cause side effects. This is especially true when the run-time > generated code tries to re-use the holder for common expressions. > In general, Drill's built-in/UDF implementations should not modify the > input holder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
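The contract discussed above can be illustrated with a minimal Java sketch. The holder types and method below are hypothetical stand-ins for Drill's generated holder classes, not the actual implementation: the point is only that a cast should read its input holder and write to a fresh output holder, leaving the input untouched so generated code can re-use it for common sub-expressions.

```java
import java.math.BigDecimal;

public class CastIntDecimalSketch {
    // Hypothetical holder types standing in for Drill's generated
    // IntHolder/decimal holders; names and fields are illustrative only.
    static class IntHolder { int value; IntHolder(int v) { value = v; } }
    static class DecimalHolder { BigDecimal value; }

    // The cast reads the input holder and writes only to a fresh output
    // holder. Because the input is never modified, run-time generated code
    // can safely re-use the same holder for common sub-expressions.
    static DecimalHolder castIntToDecimal(IntHolder in, int scale) {
        DecimalHolder out = new DecimalHolder();
        out.value = BigDecimal.valueOf(in.value).setScale(scale);
        return out;
    }

    public static void main(String[] args) {
        IntHolder in = new IntHolder(42);
        DecimalHolder out = castIntToDecimal(in, 2);
        System.out.println(out.value + " (input unchanged: " + in.value + ")");
        // prints 42.00 (input unchanged: 42)
    }
}
```

A stricter version of the constraint the comment asks for could make the input holder's fields final, so mutation becomes a compile-time error rather than a code-review convention.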
[jira] [Resolved] (DRILL-3535) Drop table support
[ https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid resolved DRILL-3535. Resolution: Fixed Fixed in 2a191847154203871454b229d8ef322766aa9ee4 > Drop table support > -- > > Key: DRILL-3535 > URL: https://issues.apache.org/jira/browse/DRILL-3535 > Project: Apache Drill > Issue Type: New Feature >Reporter: Mehant Baid >Assignee: Mehant Baid > > Umbrella JIRA to track support for "Drop table" feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3535) Drop table support
[ https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724390#comment-14724390 ] Mehant Baid commented on DRILL-3535: [~amansinha100] [~vkorukanti] can you please review. > Drop table support > -- > > Key: DRILL-3535 > URL: https://issues.apache.org/jira/browse/DRILL-3535 > Project: Apache Drill > Issue Type: New Feature >Reporter: Mehant Baid >Assignee: Mehant Baid > > Umbrella JIRA to track support for "Drop table" feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase
[ https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3045: --- Attachment: (was: DRILL-3045.patch) Drill is not partition pruning due to internal off-heap memory limit for planning phase --- Key: DRILL-3045 URL: https://issues.apache.org/jira/browse/DRILL-3045 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Mehant Baid Fix For: 1.2.0 Attachments: DRILL-3045.patch The symptom is: we are running simple query of the form select x from t where dir0='xyz and dir1='2015-01-01'; partition pruning works for a while and then it stops working. Query does run (since we don't fail the query in the case when we failed to prune) and return correct results. drillbit.log {code} 015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 due to memory limit. 
Current allocation: 16776840 java.lang.Exception: null at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) [optiq-core-0.9-drill-r20.jar:na] at org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661) [optiq-core-0.9-drill-r20.jar:na] at net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) [optiq-core-0.9-drill-r20.jar:na] at net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) [optiq-core-0.9-drill-r20.jar:na] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65] 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition. java.lang.NullPointerException: null at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] at org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223) [optiq-core-0.9-drill-r20.jar:na] at
[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase
[ https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3045: --- Attachment: DRILL-3045.patch addressed review comments.
[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase
[ https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3045: --- Attachment: DRILL-3045.patch [~amansinha100] can you please review.
[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal heap memory limit
[ https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3045: --- Summary: Drill is not partition pruning due to internal heap memory limit (was: Drill is leaking memory during partition pruning if directory tree has lots of files)
[jira] [Commented] (DRILL-3313) Eliminate redundant #load methods and unit-test loading exporting of vectors
[ https://issues.apache.org/jira/browse/DRILL-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717584#comment-14717584 ] Mehant Baid commented on DRILL-3313: +1. Jason's review comments are addressed in the patch submitted by Parth. Eliminate redundant #load methods and unit-test loading exporting of vectors -- Key: DRILL-3313 URL: https://issues.apache.org/jira/browse/DRILL-3313 Project: Apache Drill Issue Type: Sub-task Components: Execution - Data Types Affects Versions: 1.0.0 Reporter: Hanifi Gunes Assignee: Hanifi Gunes Fix For: 1.2.0 Vectors have multiple #load methods that are used to populate data from raw buffers. It is relatively tough to reason about, maintain, and unit-test loading and exporting of data since there is a lot of redundant code around the load methods. This issue proposes to have a single #load method conforming to the VV#load(def, buffer) signature, eliminating all other #load overrides. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase
[ https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3045: --- Summary: Drill is not partition pruning due to internal off-heap memory limit for planning phase (was: Drill is not partition pruning due to internal heap memory limit)
[jira] [Commented] (DRILL-3702) PartitionPruning hit ClassCastException in Interpreter when the pruning filter expression is of non-nullable type.
[ https://issues.apache.org/jira/browse/DRILL-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712046#comment-14712046 ] Mehant Baid commented on DRILL-3702: +1 PartitionPruning hit ClassCastException in Interpreter when the pruning filter expression is of non-nullable type. -- Key: DRILL-3702 URL: https://issues.apache.org/jira/browse/DRILL-3702 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Jinfeng Ni Assignee: Mehant Baid Fix For: 1.2.0 Attachments: 0001-DRILL-3702-Fix-partition-pruning-rule-when-the-pruni.patch I have the following parquet table, created using the partition by clause: {code} create table mypart (id, name) partition by (id) as select cast(n_regionkey as varchar(20)), n_name from cp.`tpch/nation.parquet`; {code} The generated parquet table consists of 5 files, each representing a partition: {code} 0_0_1.parquet 0_0_2.parquet 0_0_3.parquet 0_0_4.parquet 0_0_5.parquet {code} For the following query, partition pruning works as expected: {code} select id, name from mypart where id = '0' ; 00-01 Project(id=[$1], name=[$0]) 00-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/mypart/0_0_1.parquet]], selectionRoot=file:/tmp/mypart, numFiles=1, columns=[`id`, `name`]]]) selectionRoot : file:/tmp/mypart, fileSet : [ /tmp/mypart/0_0_1.parquet ], cost : 5.0 {code} However, the following query would hit ClassCastException when PruneScanRule calls the interpreter to evaluate the filtering condition, which happens to be non-nullable. 
{code} select id, name from mypart where concat(id,'') = '0' ; 00-05 Project(id=[$1], name=[$0]) 00-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/mypart]], selectionRoot=file:/tmp/mypart, numFiles=1, columns=[`id`, `name`]]]) selectionRoot : file:/tmp/mypart, fileSet : [ /tmp/mypart/0_0_1.parquet, /tmp/mypart/0_0_4.parquet, /tmp/mypart/0_0_5.parquet, /tmp/mypart/0_0_2.parquet, /tmp/mypart/0_0_3.parquet ], cost : 25.0 {code} Here is the error for the ClassCastException, raised in the Interpreter: {code} java.lang.ClassCastException: org.apache.drill.exec.expr.holders.BitHolder cannot be cast to org.apache.drill.exec.expr.holders.NullableBitHolder {code} The cause of the problem is that PruneScanRule assumes the output type of a filter condition is NullableBit, while in this case the filter condition is of Bit type, which leads to the ClassCastException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
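The mismatch can be reproduced in miniature. The two holder classes below are simplified stand-ins for Drill's BitHolder and NullableBitHolder (not the real classes); the point is that pruning code must branch on the holder's actual type instead of unconditionally casting to the nullable variant:

```java
// Simplified stand-ins for Drill's holders (illustrative only).
class BitHolder { int value; }
class NullableBitHolder { int isSet; int value; }

final class PruneSketch {
    // Unconditionally casting to NullableBitHolder is the bug pattern: it
    // throws ClassCastException when the filter output is non-nullable.
    // Branching on the runtime type handles both cases.
    static boolean isTrue(Object holder) {
        if (holder instanceof NullableBitHolder) {
            NullableBitHolder h = (NullableBitHolder) holder;
            return h.isSet == 1 && h.value == 1;
        }
        return ((BitHolder) holder).value == 1; // non-nullable Bit path
    }
}
```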
[jira] [Updated] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter
[ https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3690: --- Assignee: Aman Sinha (was: Mehant Baid) Partitioning pruning produces wrong results when there are nested expressions in the filter --- Key: DRILL-3690 URL: https://issues.apache.org/jira/browse/DRILL-3690 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Aman Sinha Priority: Blocker Fix For: 1.2.0 Consider the following query: select 1 from foo where dir0 not in (1994) and col1 not in ('bar'); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar'))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the rewritten expression producing wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter
[ https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708186#comment-14708186 ] Mehant Baid commented on DRILL-3690: [~amansinha100] can you please review. Partitioning pruning produces wrong results when there are nested expressions in the filter --- Key: DRILL-3690 URL: https://issues.apache.org/jira/browse/DRILL-3690 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Aman Sinha Priority: Blocker Fix For: 1.2.0 Consider the following query: select 1 from foo where dir0 not in (1994) and col1 not in ('bar'); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar'))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the rewritten expression producing wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter
Mehant Baid created DRILL-3690: -- Summary: Partitioning pruning produces wrong results when there are nested expressions in the filter Key: DRILL-3690 URL: https://issues.apache.org/jira/browse/DRILL-3690 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Priority: Blocker Fix For: 1.2.0 Consider the following query: select 1 from foo where dir0 not in (1994) and dir1 not in (1995); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994), =($2, 1995)). NOT is missing from the rewritten expression producing wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
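The rewrite described above can be illustrated with a toy expression tree. The Expr classes below are hypothetical (not Calcite's RexNode API); the point is that when cherry-picking partition-column conditions, a unary NOT above a kept condition must be carried into the rewritten expression:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy expression tree (illustrative, not Calcite's RexNode API).
abstract class Expr { }

final class Eq extends Expr {
    final String col, val;
    Eq(String col, String val) { this.col = col; this.val = val; }
    @Override public String toString() { return "=(" + col + ", " + val + ")"; }
}

final class Not extends Expr {
    final Expr child;
    Not(Expr child) { this.child = child; }
    @Override public String toString() { return "NOT(" + child + ")"; }
}

final class And extends Expr {
    final List<Expr> children;
    And(List<Expr> children) { this.children = children; }
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("AND(");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(children.get(i));
        }
        return sb.append(")").toString();
    }
}

final class PartitionFilterRewriter {
    // Keep only conditions on partition columns; return null if nothing survives.
    static Expr rewrite(Expr e, Set<String> partitionCols) {
        if (e instanceof Eq) {
            return partitionCols.contains(((Eq) e).col) ? e : null;
        }
        if (e instanceof Not) {
            // The NOT wrapper must survive the rewrite; dropping it is
            // exactly the wrong-results bug this issue describes.
            Expr child = rewrite(((Not) e).child, partitionCols);
            return child == null ? null : new Not(child);
        }
        if (e instanceof And) {
            List<Expr> kept = new ArrayList<>();
            for (Expr c : ((And) e).children) {
                Expr r = rewrite(c, partitionCols);
                if (r != null) kept.add(r);
            }
            return kept.isEmpty() ? null : new And(kept);
        }
        return null; // unknown operator: not safe to use for pruning
    }
}
```

For the filter in this report, rewriting AND(NOT(=(dir0, 1994)), NOT(=(dir1, 1995))) against partition column dir0 keeps the NOT rather than silently dropping it.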
[jira] [Updated] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter
[ https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3690: --- Description: Consider the following query: select 1 from foo where dir0 not in (1994) and col1 not in ('bar'); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar'))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the rewritten expression producing wrong results. was: Consider the following query: select 1 from foo where dir0 not in (1994) and dir1 not in (1995); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994), =($2, 1995)). NOT is missing from the rewritten expression producing wrong results. 
Partitioning pruning produces wrong results when there are nested expressions in the filter --- Key: DRILL-3690 URL: https://issues.apache.org/jira/browse/DRILL-3690 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Priority: Blocker Fix For: 1.2.0 Consider the following query: select 1 from foo where dir0 not in (1994) and col1 not in ('bar'); The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar'))) In FindPartitionCondition we rewrite the filter to cherry pick the partition column conditions so the interpreter can evaluate it, however when the expression contains more than two levels of nesting (in this case AND(NOT(=))) ) the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the rewritten expression producing wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2737) Sqlline throws Runtime exception when JDBC ResultSet throws a SQLException
[ https://issues.apache.org/jira/browse/DRILL-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705150#comment-14705150 ] Mehant Baid commented on DRILL-2737: +1 Sqlline throws Runtime exception when JDBC ResultSet throws a SQLException -- Key: DRILL-2737 URL: https://issues.apache.org/jira/browse/DRILL-2737 Project: Apache Drill Issue Type: Bug Components: Client - CLI Reporter: Parth Chandra Assignee: Parth Chandra Fix For: 1.2.0 Attachments: DRILL-2737.patch This is a tracking bug to provide a patch to Sqlline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2625) org.apache.drill.common.StackTrace should follow standard stacktrace format
[ https://issues.apache.org/jira/browse/DRILL-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695860#comment-14695860 ] Mehant Baid commented on DRILL-2625: +1 org.apache.drill.common.StackTrace should follow standard stacktrace format --- Key: DRILL-2625 URL: https://issues.apache.org/jira/browse/DRILL-2625 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 0.8.0 Reporter: Daniel Barclay (Drill) Assignee: Mehant Baid Fix For: 1.2.0 org.apache.drill.common.StackTrace uses a different textual format than JDK's standard format for stack traces. It should probably use the standard format so that its stack trace output can be used by tools that already can parse the standard format to provide functionality such as displaying the corresponding source. (After correcting for DRILL-2624, StackTrace formats stack traces like this: org.apache.drill.common.StackTrace.init:1 org.apache.drill.exec.server.Drillbit.run:20 org.apache.drill.jdbc.DrillConnectionImpl.init:232 The normal form is like this: at org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:162) at org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75) at com.google.common.io.Closeables.close(Closeables.java:77) ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2625) org.apache.drill.common.StackTrace should follow standard stacktrace format
[ https://issues.apache.org/jira/browse/DRILL-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-2625: --- Assignee: Chris Westin (was: Mehant Baid) org.apache.drill.common.StackTrace should follow standard stacktrace format --- Key: DRILL-2625 URL: https://issues.apache.org/jira/browse/DRILL-2625 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 0.8.0 Reporter: Daniel Barclay (Drill) Assignee: Chris Westin Fix For: 1.2.0 org.apache.drill.common.StackTrace uses a different textual format than JDK's standard format for stack traces. It should probably use the standard format so that its stack trace output can be used by tools that already can parse the standard format to provide functionality such as displaying the corresponding source. (After correcting for DRILL-2624, StackTrace formats stack traces like this: org.apache.drill.common.StackTrace.init:1 org.apache.drill.exec.server.Drillbit.run:20 org.apache.drill.jdbc.DrillConnectionImpl.init:232 The normal form is like this: at org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:162) at org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75) at com.google.common.io.Closeables.close(Closeables.java:77) ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3579) Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
[ https://issues.apache.org/jira/browse/DRILL-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14663054#comment-14663054 ] Mehant Baid commented on DRILL-3579: +1 Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__ --- Key: DRILL-3579 URL: https://issues.apache.org/jira/browse/DRILL-3579 Project: Apache Drill Issue Type: Bug Components: Functions - Hive Affects Versions: 1.1.0 Environment: Drill 1.1 on Hive 1.0 Reporter: Hao Zhu Assignee: Venki Korukanti Priority: Critical Fix For: 1.2.0 Attachments: DRILL-3579-1.patch If Hive's partition table has __HIVE_DEFAULT_PARTITION__ in the case of null values in the partition column, Drill on Hive query will fail. Minimum reproduce: 1.Hive: {code} CREATE TABLE h1_testpart2(id INT) PARTITIONED BY(id2 int); set hive.exec.dynamic.partition.mode=nonstrict; INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , 20150101 as id2 from h1_passwords limit 1; INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , null as id2 from h1_passwords limit 1; {code} 2. Filesystem looks like: {code} h1 h1_testpart2]# ls -altr total 2 drwxrwxrwx 89 mapr mapr 87 Jul 30 00:04 .. drwxr-xr-x 2 mapr mapr 1 Jul 30 00:05 id2=20150101 drwxr-xr-x 2 mapr mapr 1 Jul 30 00:05 id2=__HIVE_DEFAULT_PARTITION__ drwxr-xr-x 4 mapr mapr 2 Jul 30 00:05 . 
{code} 3.Drill will fail: {code} select * from h1_testpart2; Error: SYSTEM ERROR: NumberFormatException: For input string: __HIVE_DEFAULT_PARTITION__ Fragment 0:0 [Error Id: 509eb392-db9a-42f3-96ea-fb597425f49f on h1.poc.com:31010] (java.lang.reflect.UndeclaredThrowableException) null org.apache.hadoop.security.UserGroupInformation.doAs():1581 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while initializing HiveRecordReader: For input string: __HIVE_DEFAULT_PARTITION__ org.apache.drill.exec.store.hive.HiveRecordReader.init():241 org.apache.drill.exec.store.hive.HiveRecordReader.init():138 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1566 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1142 java.util.concurrent.ThreadPoolExecutor$Worker.run():617 java.lang.Thread.run():745 Caused By (java.lang.NumberFormatException) For input string: __HIVE_DEFAULT_PARTITION__ java.lang.NumberFormatException.forInputString():65 java.lang.Integer.parseInt():580 java.lang.Integer.parseInt():615 org.apache.drill.exec.store.hive.HiveRecordReader.convertPartitionType():605 org.apache.drill.exec.store.hive.HiveRecordReader.init():236 org.apache.drill.exec.store.hive.HiveRecordReader.init():138 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136 java.security.AccessController.doPrivileged():-2
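The stack trace above bottoms out in Integer.parseInt on the sentinel string. A guard along these lines (an illustrative sketch, not the actual fix in HiveRecordReader#convertPartitionType) maps Hive's default-partition sentinel to null before any numeric conversion:

```java
// Illustrative guard for Hive's null-partition sentinel; a sketch of the
// idea, not Drill's actual HiveRecordReader code.
final class PartitionValueParser {
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Returns null for Hive's default partition (which stands for a NULL
    // partition-column value); otherwise parses the value as an int.
    static Integer parseIntPartition(String raw) {
        if (HIVE_DEFAULT_PARTITION.equals(raw)) {
            return null;
        }
        return Integer.parseInt(raw);
    }
}
```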
[jira] [Assigned] (DRILL-2912) Exception is not propagated correctly in case when directory contains mix of file types
[ https://issues.apache.org/jira/browse/DRILL-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid reassigned DRILL-2912: -- Assignee: Mehant Baid (was: Steven Phillips) Exception is not propagated correctly in case when directory contains mix of file types --- Key: DRILL-2912 URL: https://issues.apache.org/jira/browse/DRILL-2912 Project: Apache Drill Issue Type: Bug Components: Execution - Flow, Storage - JSON, Storage - Parquet Reporter: Victoria Markman Assignee: Mehant Baid Fix For: 1.2.0 While trying to read from a directory that has a mix of parquet and json files I ran into an exception: {code} 0: jdbc:drill:schema=dfs> select max(dir0) from bigtable; Query failed: SYSTEM ERROR: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1), rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, bigtable])] [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010] Error: exception while executing query: Failure while executing query. 
(state=,code=0) {code} The real problem is that directory contains 2 parquet and one json files: {code} [Wed Apr 29 14:50:58 root@/mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27 ] # pwd /mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27 [Wed Apr 29 14:51:06 root@/mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27 ] # ls -ltr total 2 -rwxr-xr-x 1 root root 483 Apr 16 16:05 0_0_0.parquet -rwxr-xr-x 1 root root 483 Apr 17 13:06 214c279334946e65-7e32c56eed93cbc2_1965630551_data.0.parq -rw-r--r-- 1 root root 17 Apr 23 15:24 t1.json {code} drillbit.log {code} [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1), rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, bigtable])] [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:465) ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:620) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:717) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:659) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:661) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:762) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:212) [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71] Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1), rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, bigtable])] ... 4 common frames omitted Caused by: java.lang.AssertionError: Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1), rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, bigtable])] at org.apache.calcite.util.Util.newInternal(Util.java:743) ~[calcite-core-1.1.0-drill-r2.jar:1.1.0-drill-r2] at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
[jira] [Assigned] (DRILL-3535) Drop table support
[ https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid reassigned DRILL-3535: -- Assignee: Mehant Baid Drop table support -- Key: DRILL-3535 URL: https://issues.apache.org/jira/browse/DRILL-3535 Project: Apache Drill Issue Type: New Feature Reporter: Mehant Baid Assignee: Mehant Baid Umbrella JIRA to track support for Drop table feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3593) Reorganize classes that are exposed to storage plugins
Mehant Baid created DRILL-3593: -- Summary: Reorganize classes that are exposed to storage plugins Key: DRILL-3593 URL: https://issues.apache.org/jira/browse/DRILL-3593 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Based on the discussion on DRILL-3500 we want to reorganize some of the classes/ interfaces (QueryContext, PlannerSettings, OptimizerRulesContext ...) present at planning time and decide what is to be exposed to storage plugin's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651033#comment-14651033 ] Mehant Baid commented on DRILL-3500: I've created DRILL-3593 for the reorg task. Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following 1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin. 2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid resolved DRILL-3500. Resolution: Fixed Fixed in f8197cfe1bc3671aa6878ef9d1869b2fe8e57331 Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following 1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin. 2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening
[ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3121: --- Assignee: Aman Sinha (was: Mehant Baid) Hive partition pruning is not happening --- Key: DRILL-3121 URL: https://issues.apache.org/jira/browse/DRILL-3121 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Aman Sinha Priority: Critical Fix For: 1.2.0 Attachments: DRILL-3121.patch Tested on 1.0.0 with the commit id below, on Hive 0.13. {code} select * from sys.version; +---+++--++ | commit_id | commit_message | commit_time | build_email | build_time | +---+++--++ | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ 01:18:03 EDT | Unknown | 15.05.2015 @ 03:07:10 EDT | +---+++--++ 1 row selected (0.083 seconds) {code} How to reproduce: 1. Use Hive to create the partition table below: {code} CREATE TABLE partition_table(id INT, username string) PARTITIONED BY(year STRING, month STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; insert into table partition_table PARTITION(year='2014',month='11') select 1,'u' from passwords limit 1; insert into table partition_table PARTITION(year='2014',month='12') select 2,'s' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='01') select 3,'e' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='02') select 4,'r' from passwords limit 1; insert into table partition_table PARTITION(year='2015',month='03') select 5,'n' from passwords limit 1; {code} 2. 
Hive can do partition pruning for the 2 queries below: {code} hive> explain EXTENDED select * from partition_table where year='2015' and month in ( '02','03') ; partition values: month 02 year 2015 partition values: month 03 year 2015 explain EXTENDED select * from partition_table where year='2015' and (month = '02' and month = '03') ; partition values: month 02 year 2015 partition values: month 03 year 2015 {code} Hive only scans 2 partitions -- 2015/02 and 2015/03. 3. Drill cannot do partition pruning for the 2 queries below: {code} explain plan for select * from hive.partition_table where `year`='2015' and `month` in ('02','03'); +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02 SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, '03')))]) 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), Partition(values:[2015, 02]), Partition(values:[2015, 03])]]]) explain plan for select * from hive.partition_table where `year`='2015' and (`month` = '02' and `month` = '03' ); +--+--+ | text | json | +--+--+ | 00-00 Screen 00-01 Project(id=[$0], username=[$1], year=[$2], month=[$3]) 00-02 SelectionVectorRemover 00-03 Filter(condition=[AND(=($2, '2015'), =($3, '02'), =($3, '03'))]) 00-04 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:partition_table), inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], columns=[`*`], partitions= [Partition(values:[2015, 01]), 
Partition(values:[2015, 02]),
[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening
[ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3121: --- Attachment: DRILL-3121.patch Hive partition pruning is not happening --- Key: DRILL-3121 URL: https://issues.apache.org/jira/browse/DRILL-3121 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Mehant Baid Priority: Critical Fix For: 1.2.0 Attachments: DRILL-3121.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3151) ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)
[ https://issues.apache.org/jira/browse/DRILL-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639279#comment-14639279 ] Mehant Baid commented on DRILL-3151: +1 ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.) -- Key: DRILL-3151 URL: https://issues.apache.org/jira/browse/DRILL-3151 Project: Apache Drill Issue Type: Bug Components: Client - JDBC Reporter: Daniel Barclay (Drill) Assignee: Parth Chandra Fix For: 1.2.0 Attachments: DRILL-3151.3.patch.txt In Drill's JDBC driver, some ResultSetMetaData methods don't return what JDBC specifies they should return. Some cases:
{{getTableName(int)}}: - (JDBC says: {{table name or "" if not applicable}}) - Drill returns {{null}} (instead of empty string or table name) - (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
{{getSchemaName(int)}}: - (JDBC says: {{schema name or "" if not applicable}}) - Drill returns {{--UNKNOWN--}} (instead of empty string or schema name) - (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
{{getCatalogName(int)}}: - (JDBC says: {{the name of the catalog for the table in which the given column appears or "" if not applicable}}) - Drill returns {{--UNKNOWN--}} (instead of empty string or catalog name) - (Drill indicates not applicable even when from a named table, e.g., for {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
{{isSearchable(int)}}: - (JDBC says: {{Indicates whether the designated column can be used in a where clause.}}) - Drill returns {{false}}.
{{getColumnClassName(int)}}: - (JDBC says: {{the fully-qualified name of the class in the Java programming language that would be used by the method ResultSet.getObject to retrieve the value in the specified column. This is the class name used for custom mapping.}}) - Drill returns {{none}} (instead of the correct class name).
More cases:
{{getColumnDisplaySize(int)}}: - (JDBC says (quite ambiguously): {{the normal maximum number of characters allowed as the width of the designated column}}) - Drill always returns {{10}}!
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
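As a minimal illustration of the contract at issue -- the JDBC javadoc for these getters says "the name, or "" if not applicable", never null or a dummy token -- here is a hypothetical normalization helper. This is not Drill's code; the class, constant, and method names are invented for the example:

```java
// Illustrative only: models the JDBC rule that "not applicable" metadata
// is reported as "" (empty string), never null or a placeholder string.
public class JdbcMetadataDefaults {
    // Hypothetical stand-in for the "--UNKNOWN--" dummy the bug describes.
    static final String UNKNOWN_MARKER = "--UNKNOWN--";

    // Map a driver-internal answer to what getTableName/getSchemaName/
    // getCatalogName should return per the JDBC javadoc.
    static String normalize(String raw) {
        if (raw == null || raw.equals(UNKNOWN_MARKER)) {
            return "";   // JDBC: "" when not applicable
        }
        return raw;      // otherwise the real name
    }

    public static void main(String[] args) {
        System.out.println(normalize(null).isEmpty());          // true
        System.out.println(normalize("--UNKNOWN--").isEmpty()); // true
        System.out.println(normalize("CATALOGS"));              // CATALOGS
    }
}
```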
[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636067#comment-14636067 ] Mehant Baid commented on DRILL-3500: Yep, I was planning on doing that. Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following:
1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin.
2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3535) Drop table support
Mehant Baid created DRILL-3535: -- Summary: Drop table support Key: DRILL-3535 URL: https://issues.apache.org/jira/browse/DRILL-3535 Project: Apache Drill Issue Type: New Feature Reporter: Mehant Baid Umbrella JIRA to track support for Drop table feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635763#comment-14635763 ] Mehant Baid commented on DRILL-3500: OptimizerRulesContext is essentially an interface added on top of existing information present in QueryContext, so the name might be a bit misleading and can be changed. The main motivation behind adding the new interface (OptimizerRulesContext) was to enable the Hive storage plugin to add a rule to perform interpreter-based execution for partition pruning. I think Jason also needs this for some of his work for reading Hive Parquet files natively. Some information in QueryContext is needed to be able to perform this, and the two main reasons to add the interface were:
1. Better encapsulation: since QueryContext is pretty heavyweight and we add a bunch of information to it, this interface would prevent any unnecessary information being leaked to the plugin.
2. One common interface exposing all information needed by optimizer rules that is common to both storage-plugin-specific rules and the internal rules. Currently in master all the internal optimizer rules (e.g. [PruneScanRule|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java#L77]) have access to information in QueryContext but storage plugin rules don't. This way we provide the same framework to build the rules independent of the storage plugin.
Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. 
However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following:
1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin.
2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
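The encapsulation argument above can be sketched with the usual narrow-interface pattern: the heavyweight context implements a small interface, and plugin rules are handed only the small view. This is an illustrative toy, not Drill's actual OptimizerRulesContext or QueryContext -- the members shown (a planning memory limit, a session user) are invented for the example:

```java
// Illustrative sketch: a heavy QueryContext implements a narrow
// OptimizerRulesContext, so storage-plugin rules see only what they need.
interface OptimizerRulesContext {
    long getPlanningMemoryLimit();   // hypothetical member for the sketch
}

class QueryContext implements OptimizerRulesContext {
    // Heavyweight state the plugins should never see (illustrative).
    private final String sessionUser = "drill";
    private final long planningMemoryLimit = 256L * 1024 * 1024;

    public long getPlanningMemoryLimit() { return planningMemoryLimit; }
    public String getSessionUser() { return sessionUser; }
}

public class ContextNarrowingDemo {
    // A plugin registering rules receives only the narrow interface,
    // so it cannot reach getSessionUser() or other internal state.
    static long registerRules(OptimizerRulesContext ctx) {
        return ctx.getPlanningMemoryLimit();
    }

    public static void main(String[] args) {
        System.out.println(registerRules(new QueryContext()) > 0);  // true
    }
}
```

The same object is passed in both cases; only the static type seen by the plugin narrows, which is what keeps QueryContext's internals encapsulated.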
[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636053#comment-14636053 ] Mehant Baid commented on DRILL-3500: PlannerSettings currently mostly contains planner-related options. However I think it makes sense to consolidate. PlannerSettings will need to keep an additional reference to the allocator present in the QueryContext. I will make the changes and post a patch. Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following:
1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin.
2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism
[ https://issues.apache.org/jira/browse/DRILL-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3503: --- Attachment: DRILL-3503_part2.patch DRILL-3503_part1.patch The first patch is a minor formatting patch generated automatically by the IDE; the second patch is the actual change. Make PruneScanRule have a pluggable partitioning mechanism -- Key: DRILL-3503 URL: https://issues.apache.org/jira/browse/DRILL-3503 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Attachments: DRILL-3503_part1.patch, DRILL-3503_part2.patch Currently PruneScanRule performs partition pruning for file system. Some of the code relies on certain aspects of how partitioning is done in DFS. This JIRA aims to abstract out the behavior of the underlying partition scheme and delegate to the specific storage plugin to get that information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism
[ https://issues.apache.org/jira/browse/DRILL-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3503: --- Assignee: Aman Sinha (was: Mehant Baid) Make PruneScanRule have a pluggable partitioning mechanism -- Key: DRILL-3503 URL: https://issues.apache.org/jira/browse/DRILL-3503 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Aman Sinha Fix For: 1.2.0 Attachments: DRILL-3503_part1.patch, DRILL-3503_part2.patch Currently PruneScanRule performs partition pruning for file system. Some of the code relies on certain aspects of how partitioning is done in DFS. This JIRA aims to abstract out the behavior of the underlying partition scheme and delegate to the specific storage plugin to get that information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism
Mehant Baid created DRILL-3503: -- Summary: Make PruneScanRule have a pluggable partitioning mechanism Key: DRILL-3503 URL: https://issues.apache.org/jira/browse/DRILL-3503 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently PruneScanRule performs partition pruning for file system. Some of the code relies on certain aspects of how partitioning is done in DFS. This JIRA aims to abstract out the behavior of the underlying partition scheme and delegate to the specific storage plugin to get that information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
[ https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629246#comment-14629246 ] Mehant Baid commented on DRILL-3500: [~jaltekruse] can you please review? Provide additional information while registering storage plugin optimizer rules --- Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Jason Altekruse Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following:
1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin.
2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules
Mehant Baid created DRILL-3500: -- Summary: Provide additional information while registering storage plugin optimizer rules Key: DRILL-3500 URL: https://issues.apache.org/jira/browse/DRILL-3500 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Mehant Baid Fix For: 1.2.0 Currently all the optimizer rules internal to Drill have access to QueryContext. This is used by a few rules like PruneScanRule which invoke the interpreter to perform partition pruning. However the rules that belong to specific storage plugins don't have access to this information. This JIRA aims to do the following:
1. Add a new interface OptimizerRulesContext that will be implemented by QueryContext. It will contain all the information needed by the rules. This context will be passed to the storage plugin method while getting the optimizer rules specific to that storage plugin.
2. Restrict existing internal rules to only accept OptimizerRulesContext instead of QueryContext so information in QueryContext has better encapsulation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2862) Convert_to/Convert_From throw assertion when an incorrect encoding type is specified
[ https://issues.apache.org/jira/browse/DRILL-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622873#comment-14622873 ] Mehant Baid commented on DRILL-2862: +1. Convert_to/Convert_From throw assertion when an incorrect encoding type is specified Key: DRILL-2862 URL: https://issues.apache.org/jira/browse/DRILL-2862 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Reporter: Neeraja Assignee: Parth Chandra Fix For: 1.2.0 Attachments: DRILL-2862.2.patch.txt Below is the error from SQLLine. Replacing UTF-8 with UTF8 works fine. The error message needs to accurately represent the problem.
{code}
0: jdbc:drill:> select Convert_from(t.address.state,'UTF-8') from customers t limit 10;
Query failed: AssertionError: Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening
[ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3121: --- Priority: Critical (was: Major) Hive partition pruning is not happening --- Key: DRILL-3121 URL: https://issues.apache.org/jira/browse/DRILL-3121 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Mehant Baid Priority: Critical Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening
[ https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3121: --- Issue Type: Improvement (was: Bug) Hive partition pruning is not happening --- Key: DRILL-3121 URL: https://issues.apache.org/jira/browse/DRILL-3121 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Hao Zhu Assignee: Mehant Baid Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
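For the DRILL-3121 repro, the pruning Drill should be performing can be modeled as a simple filter over the partition value lists: keep only the (year, month) pairs satisfying `year='2015' AND month IN ('02','03')`. The sketch below is illustrative only (plain Java with invented names), not Drill's PruneScanRule:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative model of partition pruning for the repro above: each partition
// is its value list [year, month]; pruning keeps only the matching ones.
public class PruningSketch {
    static List<List<String>> prune(List<List<String>> partitions,
                                    String year, List<String> months) {
        return partitions.stream()
            .filter(p -> p.get(0).equals(year) && months.contains(p.get(1)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("2014", "11"), Arrays.asList("2014", "12"),
            Arrays.asList("2015", "01"), Arrays.asList("2015", "02"),
            Arrays.asList("2015", "03"));
        // Drill's plan above scans all three 2015 partitions; a pruned scan
        // would read only 2015/02 and 2015/03:
        System.out.println(prune(parts, "2015", Arrays.asList("02", "03")));
    }
}
```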
[jira] [Commented] (DRILL-3334) java.lang.IllegalStateException: Failure while reading vector.: raised when using dynamic schema in JSON
[ https://issues.apache.org/jira/browse/DRILL-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619906#comment-14619906 ] Mehant Baid commented on DRILL-3334: [~hgunes] I don't think HashJoinBatch currently supports any changes in schema (join column or non-join column). However it seems like a limitation we can most likely overcome. java.lang.IllegalStateException: Failure while reading vector.: raised when using dynamic schema in JSON -- Key: DRILL-3334 URL: https://issues.apache.org/jira/browse/DRILL-3334 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.0.0 Environment: Single Node running on OSX and MapR Hadoop SandBox + Drill Reporter: Tugdual Grall Assignee: Hanifi Gunes Fix For: 1.2.0 Attachments: test.zip I have a simple data set based on 3 JSON documents: - 1 customer - 2 orders (I have attached the documents to the JIRA). When I run the following query, a join between orders and customers, I can hit an unexpected exception. A working query:
{code}
SELECT customers.id, orders.total FROM dfs.ecommerce.`customers/*.json` customers, dfs.ecommerce.`orders/*.json` orders WHERE customers.id = orders.cust_id AND customers.country = 'FRANCE'
{code}
It works since orders.total is present in all orders. Now when I execute the following query (tax is not present in all documents):
{code}
SELECT customers.id, orders.tax FROM dfs.ecommerce.`customers/*.json` customers, dfs.ecommerce.`orders/*.json` orders WHERE customers.id = orders.cust_id AND customers.country = 'FRANCE'
{code}
This query raises the following exception:
{code}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: java.lang.IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableBigIntVector.
Fragment 0:0 [Error Id: a7ad300a-4446-41f3-8b1c-4bb7d1dbfb52 on maprdemo:31010]
{code}
If you cannot reproduce with tax, you can try with the field orders.cool, or simply move the tax field from one document to the others (the field must be present in 1 document only). It looks like Drill is losing the list of columns present globally. Note: if I use a field that does not exist in any document it works (orders.this_is_crazy). Note: if I use * instead of a projection this raises another exception:
{code}
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Hash join does not support schema changes
Fragment 0:0 [Error Id: 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
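The vector-class mismatch above is at heart a schema-merge problem: a field absent from some JSON documents gets a guessed placeholder type (a nullable INT) that later clashes with the real BIGINT read from documents where the field exists. A hedged sketch of the kind of type-promotion rule that would avoid the clash follows; the enum and merge rule are invented for illustration and are not Drill's actual code:

```java
// Illustrative sketch: when merging schemas across JSON batches, a type
// guessed for an absent field should yield to the real type seen elsewhere,
// so all batches agree on one vector class (here, the wider BIGINT).
public class SchemaMergeSketch {
    enum MinorType { NULL_INT, INT, BIGINT }  // NULL_INT: placeholder guess

    static MinorType merge(MinorType a, MinorType b) {
        // Widest observed type wins; the placeholder never overrides it.
        if (a == MinorType.BIGINT || b == MinorType.BIGINT) return MinorType.BIGINT;
        if (a == MinorType.INT || b == MinorType.INT) return MinorType.INT;
        return MinorType.NULL_INT;
    }

    public static void main(String[] args) {
        // "tax" is BIGINT where present, guessed NULL_INT where absent:
        System.out.println(merge(MinorType.NULL_INT, MinorType.BIGINT)); // BIGINT
    }
}
```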
[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()
[ https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3464: --- Assignee: Jinfeng Ni (was: Mehant Baid) Index out of bounds exception while performing concat() --- Key: DRILL-3464 URL: https://issues.apache.org/jira/browse/DRILL-3464 Project: Apache Drill Issue Type: Bug Reporter: Mehant Baid Assignee: Jinfeng Ni Fix For: 1.2.0 Attachments: DRILL-3464.patch We hit an IOOB while performing concat() on a single input in DrillOptiq. Below is the stack trace:
{code}
at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_67]
at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_67]
at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.getDrillFunctionFromOptiqCall(DrillOptiq.java:373) ~[classes/:na]
at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:106) ~[classes/:na]
at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:77) ~[classes/:na]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[classes/:na]
at org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:74) ~[classes/:na]
at org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111) ~[classes/:na]
at org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:57) ~[classes/:na]
at org.apache.drill.exec.planner.physical.ScreenPrel.getPhysicalOperator(ScreenPrel.java:51) ~[classes/:na]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPop(DefaultSqlHandler.java:392) ~[classes/:na]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:167) ~[classes/:na]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) ~[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) [classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) [classes/:na]
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
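The IOOB above is characteristic of rewriting concat into nested binary calls without guarding the single-operand case: indexing operand i+1 overruns the list when there is only one operand. The sketch below is a hypothetical stand-in (strings instead of the real Calcite/Drill expression tree, invented names), not the actual DrillOptiq fix:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the guard the IOOB fix needs: a one-operand
// concat(...) must be returned as-is instead of pairing operands.
public class ConcatRewriteSketch {
    static String rewriteConcat(List<String> operands) {
        if (operands.isEmpty()) throw new IllegalArgumentException("no operands");
        if (operands.size() == 1) {
            return operands.get(0);   // single input: nothing to pair up
        }
        // Left-deep nesting of binary concat calls.
        String expr = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            expr = "concat(" + expr + ", " + operands.get(i) + ")";
        }
        return expr;
    }

    public static void main(String[] args) {
        System.out.println(rewriteConcat(Arrays.asList("a")));            // a
        System.out.println(rewriteConcat(Arrays.asList("a", "b", "c")));  // concat(concat(a, b), c)
    }
}
```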
[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()
[ https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3464: --- Attachment: DRILL-3464.patch

[~jni] could you please review.

> Index out of bounds exception while performing concat()
>
> Key: DRILL-3464
> URL: https://issues.apache.org/jira/browse/DRILL-3464
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Mehant Baid
> Assignee: Mehant Baid
> Fix For: 1.2.0
> Attachments: DRILL-3464.patch
>
> We hit IOOB while performing concat() on a single input in DrillOptiq. Below is the stack trace:
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_67]
> at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_67]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.getDrillFunctionFromOptiqCall(DrillOptiq.java:373) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:106) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:77) ~[classes/:na]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:74) ~[classes/:na]
> at org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111) ~[classes/:na]
> at org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:57) ~[classes/:na]
> at org.apache.drill.exec.planner.physical.ScreenPrel.getPhysicalOperator(ScreenPrel.java:51) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPop(DefaultSqlHandler.java:392) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:167) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) ~[classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) [classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) [classes/:na]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
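The trace above enters ArrayList.get from getDrillFunctionFromOptiqCall, the classic shape of code that unconditionally reads a second operand. A minimal stand-alone sketch of that failure mode and its guard, using hypothetical names rather than Drill's actual code:

```java
import java.util.List;

public class ConcatRewrite {
    // Failing pattern: assumes concat() always has at least two operands,
    // so args.get(1) throws IndexOutOfBoundsException for a single input.
    static String rewriteUnsafe(List<String> args) {
        return "concat(" + args.get(0) + ", " + args.get(1) + ")";
    }

    // Guarded version: a single-input concat needs no pairwise rewrite.
    static String rewriteSafe(List<String> args) {
        if (args.size() == 1) {
            return args.get(0);
        }
        StringBuilder sb = new StringBuilder("concat(").append(args.get(0));
        for (int i = 1; i < args.size(); i++) {
            sb.append(", ").append(args.get(i));
        }
        return sb.append(")").toString();
    }
}
```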
[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()
[ https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3464: --- Attachment: DRILL-3464.patch

> Index out of bounds exception while performing concat()
>
> Key: DRILL-3464
> URL: https://issues.apache.org/jira/browse/DRILL-3464
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Mehant Baid
> Assignee: Jinfeng Ni
> Fix For: 1.2.0
> Attachments: DRILL-3464.patch
>
> We hit IOOB while performing concat() on a single input in DrillOptiq. Below is the stack trace:
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_67]
> at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_67]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.getDrillFunctionFromOptiqCall(DrillOptiq.java:373) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:106) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:77) ~[classes/:na]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[classes/:na]
> at org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:74) ~[classes/:na]
> at org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111) ~[classes/:na]
> at org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:57) ~[classes/:na]
> at org.apache.drill.exec.planner.physical.ScreenPrel.getPhysicalOperator(ScreenPrel.java:51) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPop(DefaultSqlHandler.java:392) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:167) ~[classes/:na]
> at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178) ~[classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) [classes/:na]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) [classes/:na]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()
[ https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3464: --- Attachment: (was: DRILL-3464.patch)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3463) Unit test of project pushdown in TestUnionAll should put more precisely plan attribute in plan verification.
[ https://issues.apache.org/jira/browse/DRILL-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617563#comment-14617563 ] Mehant Baid commented on DRILL-3463: Looks good. +1

> Unit test of project pushdown in TestUnionAll should put more precisely plan attribute in plan verification.
>
> Key: DRILL-3463
> URL: https://issues.apache.org/jira/browse/DRILL-3463
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Mehant Baid
> Fix For: 1.2.0
> Attachments: 0001-DRILL-3463-Unit-test-of-project-pushdown-in-TestUnio.patch
>
> As part of the fix for DRILL-2802, it was discovered that several unit test cases for project pushdown in TestUnionAll did not put the desired plan attributes into the expected plan result. To verify that project pushdown is working properly, one simple way is to check that the column list in the Scan operator contains the desired columns; this should be part of the plan verification. However, the unit test cases in TestUnionAll did not do that. Instead, they try to match a pattern of Project -- Scan, which does not serve the intended purpose. For instance,
> {code}
> final String[] expectedPlan = {"UnionAll.*\n" +
>     ".*Project.*\n" +
>     ".*Scan.*\n"};
> {code}
> should be replaced by
> {code}
> final String[] expectedPlan = {"UnionAll.*\n" +
>     ".*Project.*\n" +
>     ".*Scan.*columns=\\[`n_comment`, `n_nationkey`, `n_name`\\].*\n"};
> {code}
> if we want to verify that the columns 'n_comment', 'n_nationkey', 'n_name' are pushed into the Scan operator. To fix this, modify the expected plan result so that it contains the plan attributes needed to verify whether project pushdown is working. This will help catch project pushdown failures and avoid causing more false alarms in plan verification.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
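To make the difference concrete, here is a stand-alone sketch (the plan strings are illustrative, not real Drill output) of why the loose Project -- Scan pattern passes even when pushdown failed, while the column-qualified pattern only matches when the columns actually reached the Scan:

```java
import java.util.regex.Pattern;

public class PlanCheck {
    // Illustrative plan fragments; real Drill plans carry more attributes.
    static final String PUSHED =
        "UnionAll\n Project\n Scan columns=[`n_comment`, `n_nationkey`, `n_name`]\n";
    static final String NOT_PUSHED =
        "UnionAll\n Project\n Scan columns=[`*`]\n";

    // Loose pattern: matches any Project over any Scan.
    static final String LOOSE = "UnionAll.*\n.*Project.*\n.*Scan.*\n";
    // Strict pattern: also requires the pushed-down column list on the Scan.
    static final String STRICT =
        "UnionAll.*\n.*Project.*\n.*Scan.*columns=\\[`n_comment`, `n_nationkey`, `n_name`\\].*\n";

    static boolean matches(String plan, String regex) {
        return Pattern.compile(regex).matcher(plan).find();
    }
}
```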
[jira] [Resolved] (DRILL-3056) Numeric literal in an IN list is casted to decimal even when decimal type is disabled
[ https://issues.apache.org/jira/browse/DRILL-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid resolved DRILL-3056. Resolution: Fixed

Even though the record type indicates the Decimal type, when the IN list is converted we still use the double data type.

> Numeric literal in an IN list is casted to decimal even when decimal type is disabled
>
> Key: DRILL-3056
> URL: https://issues.apache.org/jira/browse/DRILL-3056
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.0.0
> Reporter: Victoria Markman
> Assignee: Mehant Baid
> Fix For: 1.2.0
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from sys.options where name like '%decimal%';
> | name                              | kind     | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | DEFAULT  | null     | null        | false     | null       |
> 1 row selected (0.212 seconds)
> {code}
> With an IN list that contains more than 20 numeric literals, we cast a number with a decimal point to the decimal type even though the decimal type is disabled:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan including all attributes for select * from t1 where a1 in (1,2,3,4,5,6,7,8,9,0,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25.0);
> | text | json |
> 00-00 Screen : rowType = RecordType(ANY *): rowcount = 10.0, cumulative cost = {24.0 rows, 158.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4921
> 00-01   Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4920
> 00-02     Project(T7¦¦*=[$0]) : rowType = RecordType(ANY T7¦¦*): rowcount = 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4919
> 00-03       HashJoin(condition=[=($2, $3)], joinType=[inner]) : rowType = RecordType(ANY T7¦¦*, ANY a1, ANY a10, DECIMAL(11, 1) ROW_VALUE): rowcount = 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4918
> 00-05         Project(T7¦¦*=[$0], a1=[$1], a10=[$1]) : rowType = RecordType(ANY T7¦¦*, ANY a1, ANY a10): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4915
> 00-07           Project(T7¦¦*=[$0], a1=[$1]) : rowType = RecordType(ANY T7¦¦*, ANY a1): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4914
> 00-08             Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/subqueries/t1]], selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`*`]]]) : rowType = (DrillRecordRow[*, a1]): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4913
> 00-04         HashAgg(group=[{0}]) : rowType = RecordType(DECIMAL(11, 1) ROW_VALUE): rowcount = 1.0, cumulative cost = {2.0 rows, 9.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 4917
> 00-06           Values : rowType = RecordType(DECIMAL(11, 1) ROW_VALUE): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4916
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
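The resolution can be illustrated with a sketch (a hypothetical helper, not the actual Drill fix): when the decimal type is disabled, a literal planned as DECIMAL(11, 1) should be materialized as a double rather than executed on the decimal path.

```java
import java.math.BigDecimal;

public class InListLiteral {
    // Hypothetical helper: with decimal disabled, a plan-time DECIMAL literal
    // from an IN list is evaluated using the double data type instead.
    static double materialize(BigDecimal planned, boolean decimalEnabled) {
        if (decimalEnabled) {
            throw new UnsupportedOperationException("decimal execution not sketched here");
        }
        return planned.doubleValue();
    }
}
```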
[jira] [Updated] (DRILL-3128) LENGTH(..., CAST(... AS VARCHAR(0) ) ) yields ClassCastException
[ https://issues.apache.org/jira/browse/DRILL-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3128: --- Fix Version/s: (was: 1.2.0) 1.4.0

> LENGTH(..., CAST(... AS VARCHAR(0) ) ) yields ClassCastException
>
> Key: DRILL-3128
> URL: https://issues.apache.org/jira/browse/DRILL-3128
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Reporter: Daniel Barclay (Drill)
> Assignee: Mehant Baid
> Fix For: 1.4.0
>
> Trying to make a function call with a function name of {{LENGTH}}, with two arguments, and with the second argument being a cast expression having a target type of {{VARCHAR(0)}} yields a {{ClassCastException}} (at least for several cases of source expression):
> {noformat}
> 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST('x' AS VARCHAR(0) ) ) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: java.lang.ClassCastException: org.apache.drill.common.expression.CastExpression cannot be cast to org.apache.drill.common.expression.ValueExpressions$QuotedString
> [Error Id: 1860730b-b69b-4400-bb2c-935a56aa456e on dev-linux2:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(1 AS VARCHAR(0) ) ) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: java.lang.ClassCastException: org.apache.drill.common.expression.CastExpression cannot be cast to org.apache.drill.common.expression.ValueExpressions$QuotedString
> [Error Id: 476c4848-4b53-4c1e-9005-2bab3a2a91a4 on dev-linux2:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(NULL AS VARCHAR(0) ) ) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: java.lang.ClassCastException: org.apache.drill.common.expression.TypedNullConstant cannot be cast to org.apache.drill.common.expression.ValueExpressions$QuotedString
> [Error Id: d888a336-2b18-45d9-a5e8-f4c2406a292e on dev-linux2:31010] (state=,code=0)
> {noformat}
> This case (not with {{VARCHAR(0)}}) also yields a {{ClassCastException}}:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(1 AS VARCHAR(2) ) ) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: java.lang.ClassCastException: org.apache.drill.common.expression.CastExpression cannot be cast to org.apache.drill.common.expression.ValueExpressions$QuotedString
> [Error Id: 04bd6cb1-2dd7-4938-ab9b-4d460aaaf05f on dev-linux2:31010] (state=,code=0)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
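The error message points at an unchecked downcast: code that assumes the second argument of LENGTH is always a quoted string literal instead receives a CastExpression (or TypedNullConstant). A minimal sketch of that pattern and its guard, using hypothetical expression types standing in for Drill's:

```java
public class ExprVisit {
    static class Expr {}
    static class QuotedString extends Expr {
        final String value;
        QuotedString(String v) { value = v; }
    }
    static class CastExpr extends Expr {}

    // Failing pattern: blind downcast throws ClassCastException for a CastExpr.
    static String encodingUnsafe(Expr arg) {
        return ((QuotedString) arg).value;
    }

    // Guarded version: check the argument kind and report a usable error.
    static String encodingSafe(Expr arg) {
        if (arg instanceof QuotedString) {
            return ((QuotedString) arg).value;
        }
        throw new IllegalArgumentException("second argument must be a string literal");
    }
}
```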
[jira] [Updated] (DRILL-1951) Can't cast numeric value with decimal point read from CSV file into integer data type
[ https://issues.apache.org/jira/browse/DRILL-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-1951: --- Fix Version/s: (was: 1.2.0) 1.4.0

> Can't cast numeric value with decimal point read from CSV file into integer data type
>
> Key: DRILL-1951
> URL: https://issues.apache.org/jira/browse/DRILL-1951
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Mehant Baid
> Fix For: 1.4.0
>
> sales.csv file:
> {code}
> 997,Ford,ME350,3000.00, comment#1
> 1999,Chevy,Venture,4900.00, comment#2
> 1999,Chevy,Venture,5000.00, comment#3
> 1996,Jeep,Cherokee,1.01, comment#4
>
> 0: jdbc:drill:schema=dfs> select cast(columns[3] as decimal(18,2)) from `sales.csv`;
> | EXPR$0  |
> | 3000.00 |
> | 4900.00 |
> | 5000.00 |
> | 1.01    |
> 4 rows selected (0.093 seconds)
> {code}
> -- Can cast to decimal
> {code}
> 0: jdbc:drill:schema=dfs> select cast(columns[3] as decimal(18,2)) from `sales.csv`;
> | EXPR$0  |
> | 3000.00 |
> | 4900.00 |
> | 5000.00 |
> | 1.01    |
> 4 rows selected (0.095 seconds)
> {code}
> -- Can cast to float
> {code}
> 0: jdbc:drill:schema=dfs> select cast(columns[3] as float) from `sales.csv`;
> | EXPR$0 |
> | 3000.0 |
> | 4900.0 |
> | 5000.0 |
> | 1.01   |
> 4 rows selected (0.112 seconds)
> {code}
> -- Can't cast to INT/BIGINT
> {code}
> 0: jdbc:drill:schema=dfs> select cast(columns[3] as bigint) from `sales.csv`;
> Query failed: Query failed: Failure while running fragment., 3000.00
> [ 4818451a-c731-48a9-9992-1e81ab1d520d on atsqa4-134.qa.lab:31010 ]
> [ 4818451a-c731-48a9-9992-1e81ab1d520d on atsqa4-134.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> -- Same works with json/parquet files
> {code}
> 0: jdbc:drill:schema=dfs> select a1 from `t1.json`;
> | a1    |
> | 10.01 |
> 1 row selected (0.077 seconds)
> 0: jdbc:drill:schema=dfs> select cast(a1 as int) from `t1.json`;
> | EXPR$0 |
> | 10     |
> 0: jdbc:drill:schema=dfs> select * from test_cast;
> | a1      |
> | 10.0100 |
> 1 row selected (0.06 seconds)
> 0: jdbc:drill:schema=dfs> select cast(a1 as int) from test_cast;
> | EXPR$0 |
> | 10     |
> 1 row selected (0.094 seconds)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
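The asymmetry is easy to reproduce outside Drill: text like "3000.00" parses as a decimal or double but not directly as a long, which is why the decimal and float casts succeed while the bigint cast fails. A sketch in plain Java (not Drill's cast implementation):

```java
import java.math.BigDecimal;

public class CsvCast {
    // "3000.00" is not a valid long literal, so a direct text-to-bigint parse fails.
    static boolean parsesAsLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Workaround mirroring cast(cast(col as decimal) as bigint):
    // parse the decimal representation first, then truncate to a long.
    static long viaDecimal(String s) {
        return new BigDecimal(s).longValue();
    }
}
```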
[jira] [Created] (DRILL-3460) Implement function validation in Drill
Mehant Baid created DRILL-3460: --
Summary: Implement function validation in Drill
Key: DRILL-3460
URL: https://issues.apache.org/jira/browse/DRILL-3460
Project: Apache Drill
Issue Type: Improvement
Reporter: Mehant Baid
Assignee: Mehant Baid
Fix For: 1.3.0

Since the schema of the table is not known during Calcite's validation phase, Drill ends up skipping most of the validation checks in Calcite. This causes certain problems at execution time, for example when function resolution or function execution fails due to incorrect types provided to the function. The worst manifestation of this problem is when Drill applies implicit casting and produces incorrect results: there are cases where it is fine to apply the implicit cast in general, but it doesn't make sense for a particular function. This JIRA aims to provide a new approach for performing validation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2860) Unable to cast integer column from parquet file to interval day
[ https://issues.apache.org/jira/browse/DRILL-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-2860: --- Fix Version/s: (was: 1.2.0) 1.3.0

> Unable to cast integer column from parquet file to interval day
>
> Key: DRILL-2860
> URL: https://issues.apache.org/jira/browse/DRILL-2860
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Reporter: Victoria Markman
> Assignee: Mehant Baid
> Fix For: 1.3.0
> Attachments: t1.parquet
>
> I can cast numeric literal to interval day:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(1 as interval day) from t1;
> | EXPR$0 |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> | P1D    |
> 10 rows selected (0.122 seconds)
> {code}
> Get an error when I'm trying to do the same from parquet file:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(a1 as interval day) from t1 where a1 = 1;
> Query failed: SYSTEM ERROR: Invalid format: 1
> Fragment 0:0
> [6a4adf04-f3db-4feb-8010-ebc3bfced1e3 on atsqa4-134.qa.lab:31010]
> (java.lang.IllegalArgumentException) Invalid format: 1
> org.joda.time.format.PeriodFormatter.parseMutablePeriod():326
> org.joda.time.format.PeriodFormatter.parsePeriod():304
> org.joda.time.Period.parse():92
> org.joda.time.Period.parse():81
> org.apache.drill.exec.test.generated.ProjectorGen180.doEval():77
> org.apache.drill.exec.test.generated.ProjectorGen180.projectRecords():62
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():170
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():130
> org.apache.drill.exec.record.AbstractRecordBatch.next():144
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():118
> org.apache.drill.exec.physical.impl.BaseRootExec.next():74
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
> org.apache.drill.exec.physical.impl.BaseRootExec.next():64
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():198
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():192
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1469
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():192
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> If I try casting a1 to an integer I run into DRILL-2859.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
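The trace shows the cast funneling the raw value "1" into org.joda.time.Period.parse, which accepts only ISO-8601 period strings such as "P1D". The same behavior can be reproduced with the JDK's java.time.Period, used here as a stand-in for Joda-Time:

```java
import java.time.Period;
import java.time.format.DateTimeParseException;

public class IntervalParse {
    // ISO-8601 period parsers accept "P1D" but reject a bare "1",
    // matching the "Invalid format: 1" failure in the stack trace.
    static boolean parses(String s) {
        try {
            Period.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```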
[jira] [Updated] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets
[ https://issues.apache.org/jira/browse/DRILL-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-2456: --- Fix Version/s: (was: 1.2.0) 1.3.0

> regexp_replace using hex codes fails on larger JSON data sets
>
> Key: DRILL-2456
> URL: https://issues.apache.org/jira/browse/DRILL-2456
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.7.0
> Environment: Drill 0.7, MapR 4.0.1, CentOS
> Reporter: Andries Engelbrecht
> Assignee: Mehant Baid
> Fix For: 1.3.0
> Attachments: drillbit.log
>
> This query works with only 1 file:
> select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` order by count(id) desc limit 10;
> This one fails with multiple files:
> select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
> Query failed: Query failed: Failure while trying to start remote fragment, Encountered an illegal char on line 1, column 31: '' [ 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]
> Using literal text instead of hex codes in regexp_replace does work for the same dataset; this query works fine on the full data set:
> select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
> A snippet of drillbit.log showing the error is attached.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
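For reference, the hex character class itself is valid in Java regex, so the single-file query's behavior can be sketched stand-alone; this does not reproduce the multi-file failure, which occurs only when the expression is shipped to remote fragments:

```java
import java.util.regex.Pattern;

public class HexScrub {
    // Replace every character outside \x20-\xad with the degree sign,
    // mirroring regexp_replace(`text`, '[^\x20-\xad]', '°').
    static String scrub(String s) {
        return Pattern.compile("[^\\x20-\\xad]").matcher(s).replaceAll("°");
    }
}
```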
[jira] [Updated] (DRILL-3430) CAST to interval type doesn't accept standard-format strings
[ https://issues.apache.org/jira/browse/DRILL-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3430: --- Fix Version/s: (was: 1.2.0) 1.3.0

> CAST to interval type doesn't accept standard-format strings
>
> Key: DRILL-3430
> URL: https://issues.apache.org/jira/browse/DRILL-3430
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Reporter: Daniel Barclay (Drill)
> Assignee: Mehant Baid
> Fix For: 1.3.0
>
> Cast specification evaluation is not compliant with the SQL standard. Mainly, it yields errors for standard-format strings that are specified to successfully yield interval values.
> In ISO/IEC 9075-2:2011(E) section 6.13 (cast specification), General Rule 19 case b says that, in a cast specification casting to an interval type, a character string value that is a valid interval literal or unquoted interval string yields an interval value. (An interval literal uses the INTERVAL '1-6' YEAR TO MONTH syntax; an unquoted interval string uses the 1-6 syntax.) Drill currently rejects both of those syntaxes.
> Note the casts to type INTERVAL HOUR and the resulting error messages in the following:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT CAST( CAST( 'INTERVAL ''1'' HOUR' AS VARCHAR(100) ) AS INTERVAL HOUR) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: INTERVAL '1' HOUR
> Fragment 0:0
> [Error Id: b4bed61a-1efe-4e06-86d4-fff8f9829d50 on dev-linux2:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT CAST( CAST( '1' AS VARCHAR(100) ) AS INTERVAL HOUR) FROM INFORMATION_SCHEMA.CATALOGS;
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: 1
> Fragment 0:0
> [Error Id: 91dec1ed-5cac-4235-93d7-49a2a0f03a1a on dev-linux2:31010] (state=,code=0)
> {noformat}
> (The extra cast to VARCHAR is a workaround for a CHAR-vs.-VARCHAR bug.)
> Drill should accept the standard formats, or at least document the non-compliance for users.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
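A sketch of General Rule 19 case b for the INTERVAL HOUR target (a hypothetical helper, not Drill code): accept both the interval-literal form and the unquoted interval string described in the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntervalHourCast {
    // Hypothetical parser for a cast to INTERVAL HOUR per SQL:2011 6.13 GR 19b:
    // accept INTERVAL '<n>' HOUR as well as the unquoted string <n>.
    private static final Pattern LITERAL =
        Pattern.compile("INTERVAL\\s+'(\\d+)'\\s+HOUR", Pattern.CASE_INSENSITIVE);

    static int hours(String s) {
        Matcher m = LITERAL.matcher(s.trim());
        if (m.matches()) {
            return Integer.parseInt(m.group(1)); // interval literal form
        }
        return Integer.parseInt(s.trim());       // unquoted interval string, e.g. "1"
    }
}
```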