[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-11 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144172#comment-15144172
 ] 

Mehant Baid commented on DRILL-2282:


[~vitalii] You don't need HBase to reproduce this; you just need a plan that 
does not execute in a single fragment. You can probably use a test case 
similar to the one in the patch for DRILL-1496 to verify whether this is 
still a problem.

> Eliminate spaces, special characters from names in function templates
> ---------------------------------------------------------------------
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2282) Eliminate spaces, special characters from names in function templates

2016-02-09 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140298#comment-15140298
 ] 

Mehant Baid commented on DRILL-2282:


[~parthc] There was a specific issue with the 'similar' function, as noted in 
[DRILL-1496|https://issues.apache.org/jira/browse/DRILL-1496], that was fixed, 
but this is a more generic JIRA to make sure we don't run into a similar 
issue. If I recall correctly, there was a problem deserializing the plan 
fragment if we had a space while serializing the expression.
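To make that deserialization hazard concrete, here is a minimal toy sketch (this is not Drill's actual serializer; the class and the whitespace-delimited format are invented for illustration): if serialized expressions are split on whitespace when the plan fragment is read back, a function name containing a space is torn into two tokens.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a text round-trip for a function call: serialize the call as
// whitespace-delimited tokens, then split it back apart on deserialization.
class FunctionNameRoundTrip {
    static List<String> roundTrip(String functionName, String... args) {
        StringBuilder sb = new StringBuilder(functionName);
        for (String arg : args) {
            sb.append(' ').append(arg);
        }
        // Deserialization: one token per name/argument -- unless the name
        // itself contains a space.
        return Arrays.asList(sb.toString().split(" "));
    }

    public static void main(String[] args) {
        // Sanitized name: one token for the function plus one per argument.
        System.out.println(roundTrip("similar_to", "$0", "'abc'"));
        // Name with a space: the deserializer sees a spurious extra token.
        System.out.println(roundTrip("similar to", "$0", "'abc'"));
    }
}
```

With a sanitized name the round trip yields exactly name-plus-arguments; with a space in the name the reader sees one token too many, which matches the failure mode described above.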

> Eliminate spaces, special characters from names in function templates
> ---------------------------------------------------------------------
>
> Key: DRILL-2282
> URL: https://issues.apache.org/jira/browse/DRILL-2282
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Mehant Baid
>Assignee: Vitalii Diravka
> Fix For: 1.6.0
>
> Attachments: DRILL-2282.patch
>
>
> Having spaces in the names of functions causes issues while deserializing 
> such expressions when we try to read the plan fragment. As part of this JIRA 
> we would like to clean up all the templates so that their names do not 
> include special characters.





[jira] [Resolved] (DRILL-3739) NPE on select from Hive for HBase table

2015-12-30 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3739.

   Resolution: Fixed
Fix Version/s: (was: 1.4.0)
   1.5.0

Fixed in 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a

> NPE on select from Hive for HBase table
> ---------------------------------------
>
> Key: DRILL-3739
> URL: https://issues.apache.org/jira/browse/DRILL-3739
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: ckran
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.5.0
>
>
> For a table in HBase or MapR-DB with metadata created in Hive (so that it 
> can be accessed through Beeline or Hue), queries from Drill fail with
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb]





[jira] [Updated] (DRILL-4192) Dir0 and Dir1 from drill-1.4 are messed up

2015-12-21 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4192:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Dir0 and Dir1 from drill-1.4 are messed up
> ------------------------------------------
>
> Key: DRILL-4192
> URL: https://issues.apache.org/jira/browse/DRILL-4192
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
>Reporter: Krystal
>Assignee: Aman Sinha
>Priority: Blocker
>
> I have the following directories:
> /drill/testdata/temp1/abc/dt=2014-12-30/lineitem.parquet
> /drill/testdata/temp1/abc/dt=2014-12-31/lineitem.parquet
> The following queries returned incorrect data.
> select dir0,dir1 from dfs.`/drill/testdata/temp1` limit 2;
> ++---+
> |  dir0  | dir1  |
> ++---+
> | dt=2014-12-30  | null  |
> | dt=2014-12-30  | null  |
> ++---+
> select dir0 from dfs.`/drill/testdata/temp1` limit 2;
> ++
> |  dir0  |
> ++
> | dt=2014-12-31  |
> | dt=2014-12-31  |
> ++





[jira] [Resolved] (DRILL-2419) UDF that returns string representation of expression type

2015-12-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2419.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.3.0

Fixed in eb6325dc9b59291582cd7d3c3e5d02efd5d15906. 



> UDF that returns string representation of expression type
> ---------------------------------------------------------
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)





[jira] [Assigned] (DRILL-2419) UDF that returns string representation of expression type

2015-12-01 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid reassigned DRILL-2419:
--

Assignee: Mehant Baid

> UDF that returns string representation of expression type
> ---------------------------------------------------------
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Mehant Baid
> Fix For: Future
>
>
> Suggested name: typeof (credit goes to Aman)





[jira] [Commented] (DRILL-3739) NPE on select from Hive for HBase table

2015-12-01 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034297#comment-15034297
 ] 

Mehant Baid commented on DRILL-3739:


+1. 

> NPE on select from Hive for HBase table
> ---------------------------------------
>
> Key: DRILL-3739
> URL: https://issues.apache.org/jira/browse/DRILL-3739
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: ckran
>Priority: Critical
>
> For a table in HBase or MapR-DB with metadata created in Hive (so that it 
> can be accessed through Beeline or Hue), queries from Drill fail with
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> NullPointerException [Error Id: 1cfd2a36-bc73-4a36-83ee-ac317b8e6cdb]





[jira] [Commented] (DRILL-3893) Issue with Drill after Hive Alters the Table

2015-12-01 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034327#comment-15034327
 ] 

Mehant Baid commented on DRILL-3893:


lgtm +1

>  Issue with Drill after Hive Alters the Table
> ---------------------------------------------
>
> Key: DRILL-3893
> URL: https://issues.apache.org/jira/browse/DRILL-3893
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Storage - Hive
>Affects Versions: 1.0.0, 1.1.0
> Environment: DEV
>Reporter: arnab chatterjee
>
> I reproduced this again on another partitioned table with existing data, and 
> am providing some more details. I have enabled verbose mode for errors. 
> Drill is unable to fetch the new column name that was introduced. This most 
> likely means Drill is still picking up Hive's stale metadata.
> if (!tableColumns.contains(columnName)) {
>   if (partitionNames.contains(columnName)) {
>     selectedPartitionNames.add(columnName);
>   } else {
>     throw new ExecutionSetupException(String.format("Column %s does not exist", columnName));
>   }
> }
> select testdata from testtable;
> Error: SYSTEM ERROR: ExecutionSetupException: Column testdata does not exist
> Fragment 0:0
> [Error Id: be5cccba-97f6-4cc4-94e8-c11a4c53c8f4 on x.x.com:]
>   (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while 
> initializing HiveRecordReader: Column testdata does not exist
> org.apache.drill.exec.store.hive.HiveRecordReader.init():241
> org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
> org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
> org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) 
> Column testdata does not exist
> org.apache.drill.exec.store.hive.HiveRecordReader.init():206
> org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
> org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
> org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():150
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> #
> Please note that this is a partitioned table with existing data.
> Does Drill cache the metadata somewhere, and hence it's not getting 
> reflected immediately?
> DRILL CLI
> > select x from xx;
> Error: SYSTEM ERROR: ExecutionSetupException: Column x does not exist
> Fragment 0:0
> [Error Id: 62086e22-1341-459e-87ce-430a24cc5119 on x.x.com:999] 
> (state=,code=0)
> HIVE CLI
> hive> describe formatted x;
> OK
> # col_name  data_type   comment
> 





[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-30 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032229#comment-15032229
 ] 

Mehant Baid commented on DRILL-4119:


I think it makes sense to address that as a separate issue. The patch looks 
good otherwise. +1.

> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02  HashAgg(group=[{0}])
> 01-03    Project(SomeId=[$0])
> 01-04      HashToRandomExchange(dist0=[[$0]])
> 02-01        UnorderedMuxExchange
> 03-01          Project(SomeId=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02            HashAgg(group=[{0}])
> 03-03              Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}
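One way to make the reported skew concrete is to bucket a batch of 32-character hex ids and compare the counts. The sketch below is purely illustrative: the bucket count, the id generator, and both stand-in hash functions are assumptions, not Drill's hash64AsDouble implementation.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToIntFunction;

// Sketch: count how many synthetic 32-char hex ids land in each of `buckets`
// partitions under a given string hash. A skewed hash shows lopsided counts.
class SkewCheck {
    static int[] bucketCounts(int buckets, int numIds, ToIntFunction<String> hash) {
        Random rnd = new Random(42);  // fixed seed for reproducibility
        int[] counts = new int[buckets];
        for (int i = 0; i < numIds; i++) {
            StringBuilder id = new StringBuilder(32);
            for (int j = 0; j < 32; j++) {
                id.append("0123456789abcdef".charAt(rnd.nextInt(16)));
            }
            counts[Math.floorMod(hash.applyAsInt(id.toString()), buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A hash that keeps too little of the input (here: only the first
        // character) can reach at most 16 of the 64 buckets for hex ids,
        // while a full-string hash spreads the ids across far more buckets.
        int[] skewed = bucketCounts(64, 10_000, s -> s.charAt(0));
        int[] spread = bucketCounts(64, 10_000, String::hashCode);
        System.out.println("non-empty buckets (truncating hash): "
                + Arrays.stream(skewed).filter(c -> c > 0).count());
        System.out.println("non-empty buckets (full-string hash): "
                + Arrays.stream(spread).filter(c -> c > 0).count());
    }
}
```

The same bucket-count comparison could be run against the real hash functions to quantify the skew seen on the 32-character Id column.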





[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

2015-11-24 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024951#comment-15024951
 ] 

Mehant Baid commented on DRILL-4119:


If we are returning different values from the original implementation, then I 
feel we should fix that issue. I can help identify the differences.

> Skew in hash distribution for varchar (and possibly other) types of data
> 
>
> Key: DRILL-4119
> URL: https://issues.apache.org/jira/browse/DRILL-4119
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02  HashAgg(group=[{0}])
> 01-03    Project(SomeId=[$0])
> 01-04      HashToRandomExchange(dist0=[[$0]])
> 02-01        UnorderedMuxExchange
> 03-01          Project(SomeId=[$0], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02            HashAgg(group=[{0}])
> 03-03              Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}





[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter

2015-11-11 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4071:
---
Attachment: (was: DRILL-4071.patch)

> Partition pruning fails when a Coalesce() function appears with partition 
> filter
> 
>
> Key: DRILL-4071
> URL: https://issues.apache.org/jira/browse/DRILL-4071
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> Pruning fails for this query: 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from 
> dfs.`/Users/asinha/data/multilevel/parquet` where dir0 = 1994 and 
> coalesce(o_clerk, 'Clerk') = '';
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03          Project($f0=[0])
> 00-04            SelectionVectorRemover
> 00-05              Filter(condition=[AND(=($0, 1994), =(CASE(IS NOT NULL($1), 
> $1, 'Clerk'), ''))])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q2/orders_94_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q3/orders_94_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q4/orders_94_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q1/orders_96_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q2/orders_96_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q3/orders_96_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q4/orders_96_q4.parquet]],
>  selectionRoot=file:/Users/asinha/data/multilevel/parquet, numFiles=12, 
> usedMetadataFile=false, columns=[`dir0`, `o_clerk`]]])
> {code}
> The log indicates no partition filters were found: 
> {code}
> ...
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> {code}
> A preliminary analysis indicates that since the Coalesce gets converted to a 
> CASE(IS NOT NULL) expression, the filter analysis does not correctly 
> process the full expression tree.  At one point in 
> {{FindPartitionConditions.analyzeCall()}} I saw the operandStack had 3 
> elements in it: [NO_PUSH, NO_PUSH, PUSH] which seemed strange since I would 
> expect an even number of elements. 
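The odd stack contents noted in the description can be reproduced in a toy model. This is hypothetical code, not the real FindPartitionConditions: it only models the invariant that each operand pushes a PUSH/NO_PUSH marker and the enclosing call pops one marker per operand, so a rewrite such as coalesce becoming CASE(IS NOT NULL($1), $1, 'Clerk') that changes the operand count without the analyzer's knowledge leaves stray markers behind.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a pushability analysis over an expression tree. Each visited
// operand pushes a marker; analyzeCall() pops one marker per operand it
// believes the call has and pushes the combined verdict (pushable only if
// every operand was pushable).
class PushdownAnalyzer {
    enum Mark { PUSH, NO_PUSH }

    final Deque<Mark> operandStack = new ArrayDeque<>();

    void visitOperand(boolean isPartitionColumnOnly) {
        operandStack.push(isPartitionColumnOnly ? Mark.PUSH : Mark.NO_PUSH);
    }

    void analyzeCall(int operandCount) {
        Mark combined = Mark.PUSH;
        for (int i = 0; i < operandCount; i++) {
            if (operandStack.pop() == Mark.NO_PUSH) {
                combined = Mark.NO_PUSH;
            }
        }
        operandStack.push(combined);
    }

    public static void main(String[] args) {
        // Three operands were visited (as in the rewritten CASE), but the
        // call is analyzed with the pre-rewrite count of two: one marker is
        // stranded, leaving an odd-looking stack like the one observed.
        PushdownAnalyzer a = new PushdownAnalyzer();
        a.visitOperand(false);
        a.visitOperand(false);
        a.visitOperand(true);
        a.analyzeCall(2);
        System.out.println("stack after mismatched call: " + a.operandStack);
    }
}
```

When the operand count matches the number of visited operands, the stack always collapses back to a single combined marker; a mismatch is enough to produce the three-element state described above.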





[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter

2015-11-11 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4071:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Partition pruning fails when a Coalesce() function appears with partition 
> filter
> 
>
> Key: DRILL-4071
> URL: https://issues.apache.org/jira/browse/DRILL-4071
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Attachments: DRILL-4071.patch
>
>
> Pruning fails for this query: 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from 
> dfs.`/Users/asinha/data/multilevel/parquet` where dir0 = 1994 and 
> coalesce(o_clerk, 'Clerk') = '';
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03          Project($f0=[0])
> 00-04            SelectionVectorRemover
> 00-05              Filter(condition=[AND(=($0, 1994), =(CASE(IS NOT NULL($1), 
> $1, 'Clerk'), ''))])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q2/orders_94_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q3/orders_94_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q4/orders_94_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q1/orders_96_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q2/orders_96_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q3/orders_96_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q4/orders_96_q4.parquet]],
>  selectionRoot=file:/Users/asinha/data/multilevel/parquet, numFiles=12, 
> usedMetadataFile=false, columns=[`dir0`, `o_clerk`]]])
> {code}
> The log indicates no partition filters were found: 
> {code}
> ...
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> {code}
> A preliminary analysis indicates that since the Coalesce gets converted to a 
> CASE(IS NOT NULL) expression, the filter analysis does not correctly 
> process the full expression tree.  At one point in 
> {{FindPartitionConditions.analyzeCall()}} I saw the operandStack had 3 
> elements in it: [NO_PUSH, NO_PUSH, PUSH] which seemed strange since I would 
> expect an even number of elements. 





[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter

2015-11-11 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4071:
---
Attachment: DRILL-4071.patch

[~amansinha100] Can you please review?

> Partition pruning fails when a Coalesce() function appears with partition 
> filter
> 
>
> Key: DRILL-4071
> URL: https://issues.apache.org/jira/browse/DRILL-4071
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Mehant Baid
> Attachments: DRILL-4071.patch
>
>
> Pruning fails for this query: 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from 
> dfs.`/Users/asinha/data/multilevel/parquet` where dir0 = 1994 and 
> coalesce(o_clerk, 'Clerk') = '';
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03          Project($f0=[0])
> 00-04            SelectionVectorRemover
> 00-05              Filter(condition=[AND(=($0, 1994), =(CASE(IS NOT NULL($1), 
> $1, 'Clerk'), ''))])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q2/orders_94_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q3/orders_94_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q4/orders_94_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q1/orders_96_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q2/orders_96_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q3/orders_96_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q4/orders_96_q4.parquet]],
>  selectionRoot=file:/Users/asinha/data/multilevel/parquet, numFiles=12, 
> usedMetadataFile=false, columns=[`dir0`, `o_clerk`]]])
> {code}
> The log indicates no partition filters were found: 
> {code}
> ...
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> {code}
> A preliminary analysis indicates that since the Coalesce gets converted to a 
> CASE(IS NOT NULL) expression, the filter analysis does not correctly 
> process the full expression tree.  At one point in 
> {{FindPartitionConditions.analyzeCall()}} I saw the operandStack had 3 
> elements in it: [NO_PUSH, NO_PUSH, PUSH] which seemed strange since I would 
> expect an even number of elements. 





[jira] [Updated] (DRILL-4071) Partition pruning fails when a Coalesce() function appears with partition filter

2015-11-11 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4071:
---
Attachment: DRILL-4071.patch

Thanks for catching that; I forgot to clean it up. 

> Partition pruning fails when a Coalesce() function appears with partition 
> filter
> 
>
> Key: DRILL-4071
> URL: https://issues.apache.org/jira/browse/DRILL-4071
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Attachments: DRILL-4071.patch
>
>
> Pruning fails for this query: 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from 
> dfs.`/Users/asinha/data/multilevel/parquet` where dir0 = 1994 and 
> coalesce(o_clerk, 'Clerk') = '';
> +--+--+
> | text | json |
> +--+--+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-03          Project($f0=[0])
> 00-04            SelectionVectorRemover
> 00-05              Filter(condition=[AND(=($0, 1994), =(CASE(IS NOT NULL($1), 
> $1, 'Clerk'), ''))])
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q1/orders_94_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q2/orders_94_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q3/orders_94_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1994/Q4/orders_94_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q1/orders_96_q1.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q2/orders_96_q2.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q3/orders_96_q3.parquet],
>  ReadEntryWithPath 
> [path=file:/Users/asinha/data/multilevel/parquet/1996/Q4/orders_96_q4.parquet]],
>  selectionRoot=file:/Users/asinha/data/multilevel/parquet, numFiles=12, 
> usedMetadataFile=false, columns=[`dir0`, `o_clerk`]]])
> {code}
> The log indicates no partition filters were found: 
> {code}
> ...
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> {code}
> A preliminary analysis indicates that since the Coalesce gets converted to a 
> CASE(IS NOT NULL) expression, the filter analysis does not correctly 
> process the full expression tree.  At one point in 
> {{FindPartitionConditions.analyzeCall()}} I saw the operandStack had 3 
> elements in it: [NO_PUSH, NO_PUSH, PUSH] which seemed strange since I would 
> expect an even number of elements. 





[jira] [Updated] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-05 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4025:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Don't invoke getFileStatus() when metadata cache is available
> -------------------------------------------------------------
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Aman Sinha
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Updated] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-05 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4025:
---
Attachment: DRILL-4025.patch

> Don't invoke getFileStatus() when metadata cache is available
> -------------------------------------------------------------
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Commented] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-05 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992829#comment-14992829
 ] 

Mehant Baid commented on DRILL-4025:


[~jnadeau] We aren't changing the behavior of checking whether the cache file 
is in sync with the actual data. That check is done a couple of lines earlier 
in the code, in 
[ParquetFormatPlugin.readBlockMeta()|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java#L229].
 What my patch avoids is the additional ls in 
[FileSelection.init()|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java#L141]
 to populate the FileStatus when it is null. However, I will run a small test 
to confirm this and report the result. 
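As a hedged sketch of the optimization being discussed (the class, method, and cache file name below are illustrative stand-ins, not Drill's actual code): when a metadata cache file is present, the file listing can be taken from the cache instead of issuing another directory listing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: prefer the cached listing when a metadata cache file exists; fall
// back to a real directory listing ("ls") otherwise.
class CachedFileListing {
    static final String CACHE_FILE = ".drill.parquet_metadata";  // assumed name

    static List<String> listFiles(Path dir) throws IOException {
        Path cache = dir.resolve(CACHE_FILE);
        if (Files.exists(cache)) {
            // The cache already names every file; skip the extra listing.
            return Files.readAllLines(cache);
        }
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drill4025-demo");
        Files.write(dir.resolve(CACHE_FILE), Arrays.asList("a.parquet", "b.parquet"));
        System.out.println(listFiles(dir));  // served from the cache, no ls
    }
}
```

The staleness check the comment refers to would still run before this point; only the redundant listing used to populate FileStatus is skipped.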

> Don't invoke getFileStatus() when metadata cache is available
> -------------------------------------------------------------
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Updated] (DRILL-4025) Reduce getFileStatus() invocation for Parquet by 1

2015-11-05 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-4025:
---
Summary: Reduce getFileStatus() invocation for Parquet by 1  (was: Don't 
invoke getFileStatus() when metadata cache is available)

> Reduce getFileStatus() invocation for Parquet by 1
> --------------------------------------------------
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Commented] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-05 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992855#comment-14992855
 ] 

Mehant Baid commented on DRILL-4025:


Agreed, the title of the JIRA is a bit misleading; I'll change it. I ran a 
quick test and confirmed that the metadata cache and data sync logic work as 
expected with my patch.

> Don't invoke getFileStatus() when metadata cache is available
> -------------------------------------------------------------
>
> Key: DRILL-4025
> URL: https://issues.apache.org/jira/browse/DRILL-4025
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-4025.patch
>
>
> Currently we invoke getFileStatus() to list all the files under a directory 
> even when we have the metadata cache file. The information is already present 
> in the cache so we don't need to perform this operation.





[jira] [Created] (DRILL-4025) Don't invoke getFileStatus() when metadata cache is available

2015-11-03 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-4025:
--

 Summary: Don't invoke getFileStatus() when metadata cache is 
available
 Key: DRILL-4025
 URL: https://issues.apache.org/jira/browse/DRILL-4025
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Mehant Baid
Assignee: Mehant Baid


Currently we invoke getFileStatus() to list all the files under a directory 
even when we have the metadata cache file. The information is already present 
in the cache so we don't need to perform this operation.
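The idea above can be sketched as follows. This is an illustrative assumption, not Drill's actual API: the names {{MetadataCache}}, {{FileEntry}}, and {{FileSystemClient}} are invented for the example, and the point is only that a directory listing already parsed from the cache file should short-circuit the per-directory filesystem call.

```java
import java.util.List;

// Hypothetical sketch: when a metadata cache file has already been read,
// reuse the file listing it contains instead of issuing another
// getFileStatus()/listStatus() RPC against the filesystem.
public class CachedListing {

  static class FileEntry {
    final String path;
    final long length;
    FileEntry(String path, long length) { this.path = path; this.length = length; }
  }

  interface MetadataCache {
    List<FileEntry> files();                 // already parsed from the cache file
  }

  interface FileSystemClient {
    List<FileEntry> listStatus(String dir);  // expensive RPC per directory
  }

  // Prefer the cache; fall back to the filesystem only when no cache exists.
  static List<FileEntry> filesUnder(String dir, MetadataCache cache, FileSystemClient fs) {
    if (cache != null) {
      return cache.files();                  // no extra RPC
    }
    return fs.listStatus(dir);
  }
}
```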





[jira] [Updated] (DRILL-3941) Add timing instrumentation around Partition Pruning

2015-11-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3941:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Add timing instrumentation around Partition Pruning
> ---
>
> Key: DRILL-3941
> URL: https://issues.apache.org/jira/browse/DRILL-3941
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
>
> We seem to be spending a chunk of time doing partition pruning; it would be
> good to log timing information to indicate the amount of time we spend doing
> pruning. A little more granularity to indicate the time taken to build the
> filter tree and in the interpreter would also be good.





[jira] [Updated] (DRILL-3634) Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned to the text plan

2015-11-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3634:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned 
> to the text plan
> 
>
> Key: DRILL-3634
> URL: https://issues.apache.org/jira/browse/DRILL-3634
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Aman Sinha
> Fix For: Future
>
>
> The hive scan portion of the text plan only lists the files scanned. It would 
> be helpful if the text plan also had fileCount value or the number of 
> partitions scanned.
> Reason: Currently, as part of our tests, we verify plans using a regex-based
> verification, and the expected regex matches more than it should. Fixing this
> might be hard. So if we have the fileCount/partitionCount as part of the
> plan, the plan comparison will be more accurate.





[jira] [Assigned] (DRILL-3634) Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned to the text plan

2015-11-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid reassigned DRILL-3634:
--

Assignee: Mehant Baid

> Hive Scan : Add fileCount (no of files scanned) or no of partitions scanned 
> to the text plan
> 
>
> Key: DRILL-3634
> URL: https://issues.apache.org/jira/browse/DRILL-3634
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Mehant Baid
> Fix For: Future
>
>
> The hive scan portion of the text plan only lists the files scanned. It would 
> be helpful if the text plan also had fileCount value or the number of 
> partitions scanned.
> Reason: Currently, as part of our tests, we verify plans using a regex-based
> verification, and the expected regex matches more than it should. Fixing this
> might be hard. So if we have the fileCount/partitionCount as part of the
> plan, the plan comparison will be more accurate.





[jira] [Commented] (DRILL-3975) Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS

2015-10-24 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972998#comment-14972998
 ] 

Mehant Baid commented on DRILL-3975:


This particular bug was happening when ParquetPruneScanRule hit an IOOB in the
logic you pointed out, in the case where there was no need to perform any
splitting (since for the auto partitioning scheme we get the partitioning
column value from the file and not from the location).

However, while debugging this I found that "selectionRoot" contained the
scheme while "file" did not, potentially causing the IOOB you might be seeing.
Stripping out the scheme makes sense; we cannot check for -1
[Here|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSPartitionLocation.java#L30]
 as it would cause the partitioning columns to be incorrectly empty.
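The scheme mismatch can be reproduced in isolation. This is an illustration only, not Drill's DFSPartitionLocation code: if the selection root keeps its "hdfs://host:port" scheme/authority prefix while the individual file path does not, substring() is handed an index past the end of the string and throws, which is consistent with the negative-offset StringIndexOutOfBoundsException messages reported here. Normalizing both strings to scheme-less paths first avoids it.

```java
import java.net.URI;

// Minimal reproduction of the selectionRoot-vs-file scheme mismatch.
public class SchemeMismatch {

  // Naive: assumes the file path starts with selectionRoot verbatim.
  static String relativePath(String selectionRoot, String file) {
    return file.substring(selectionRoot.length());
  }

  // Safer: strip scheme/authority from both sides before comparing.
  static String relativePathNormalized(String selectionRoot, String file) {
    String root = URI.create(selectionRoot).getPath();
    String f = URI.create(file).getPath();
    return f.substring(root.length());
  }
}
```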

> Partition Planning rule causes query failure due to IndexOutOfBoundsException 
> on HDFS
> -
>
> Key: DRILL-3975
> URL: https://issues.apache.org/jira/browse/DRILL-3975
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jacques Nadeau
>
> In attempting to run the extended test suite provided by MapR, there are a 
> large number of queries that fail due to issues in the PruneScanRule and 
> specifically the DFSPartitionLocation constructor line 31. It is likely due 
> to issues with the code that are related to running on HDFS where this code 
> path has apparently not been tested.
> An example test query this type of failure occurred: 
> /src/drill-test-framework/resources/Functional/ctas/ctas_auto_partition/tpch0.01_multiple_partitions/data/q11.q
> Example stack trace below:
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> StringIndexOutOfBoundsException: String index out of range: -12
> [Error Id: f2941267-49b1-4f67-a17f-610ffb13fcb7 on 
> ip-172-31-30-32.us-west-2.compute.internal:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
>  [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
> [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_85]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_85]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: Error while 
> applying rule PruneScanRule:Filter_On_Scan_Parquet, args 
> [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0,
>  1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, 
> ctasAutoPartition, 
> tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan
>  [entries=[ReadEntryWithPath 
> [path=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2]],
>  
> selectionRoot=hdfs://ip-172-31-30-32:54310/drill/testdata/ctas_auto_partition/tpch_multiple_partitions/lineitem_twopart_ordered2,
>  numFiles=1, usedMetadataFile=false, columns=[`l_modline`, `l_moddate`]])]
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: Error while applying 
> rule PruneScanRule:Filter_On_Scan_Parquet, args 
> [rel#43148:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#43147:Subset#4.LOGICAL.ANY([]).[],condition==($0,
>  1)), rel#43241:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, 
> ctasAutoPartition, 
> tpch_multiple_partitions/lineitem_twopart_ordered2],groupscan=ParquetGroupScan
>  

[jira] [Created] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3965:
--

 Summary: Index out of bounds exception in partition pruning
 Key: DRILL-3965
 URL: https://issues.apache.org/jira/browse/DRILL-3965
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


Hit IOOB while trying to perform partition pruning on a table that was created 
using CTAS auto partitioning with the below stack trace.

Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -8
at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
at 
org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
 ~[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
 ~[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
 ~[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
 ~[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
 ~[drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
 ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Commented] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969986#comment-14969986
 ] 

Mehant Baid commented on DRILL-3965:


I don't think so; looking at the stack trace in DRILL-3376, it seems like a
separate issue.

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
> Attachments: DRILL-3965.patch
>
>
> Hit IOOB while trying to perform partition pruning on a table that was 
> created using CTAS auto partitioning with the below stack trace.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3965:
---
Attachment: DRILL-3965.patch

[~amansinha100] can you please review.

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-3965.patch
>
>
> Hit IOOB while trying to perform partition pruning on a table that was 
> created using CTAS auto partitioning with the below stack trace.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3965:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
> Attachments: DRILL-3965.patch
>
>
> Hit IOOB while trying to perform partition pruning on a table that was 
> created using CTAS auto partitioning with the below stack trace.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3965:
---
Attachment: (was: DRILL-3965.patch)

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-3965.patch
>
>
> Hit IOOB while trying to perform partition pruning on a table that was 
> created using CTAS auto partitioning with the below stack trace.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Updated] (DRILL-3965) Index out of bounds exception in partition pruning

2015-10-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3965:
---
Attachment: DRILL-3965.patch

Updated patch with minor changes

> Index out of bounds exception in partition pruning
> --
>
> Key: DRILL-3965
> URL: https://issues.apache.org/jira/browse/DRILL-3965
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Attachments: DRILL-3965.patch
>
>
> Hit IOOB while trying to perform partition pruning on a table that was 
> created using CTAS auto partitioning with the below stack trace.
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -8
>   at java.lang.String.substring(String.java:1875) ~[na:1.7.0_79]
>   at 
> org.apache.drill.exec.planner.DFSPartitionLocation.(DFSPartitionLocation.java:31)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.createPartitionSublists(ParquetPartitionDescriptor.java:126)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.AbstractPartitionDescriptor.iterator(AbstractPartitionDescriptor.java:53)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:190)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  ~[drill-java-exec-1.2.0.jar:1.2.0]
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  ~[calcite-core-1.4.0-drill-r5.jar:1.4.0-drill-r5]





[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance

2015-10-19 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3429:
---
Attachment: (was: DRILL-3429.patch)

> DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, 
> variance
> -
>
> Key: DRILL-3429
> URL: https://issues.apache.org/jira/browse/DRILL-3429
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.3.0
>
>
> DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, 
> stddev, variance to simple computations. 
> Eg: 
> Stddev( x ) => power(
>  (sum(x * x) - sum( x ) * sum( x ) / count( x ))
>  / count( x ),
>  .5)
> Consider the case when the input is an integer. The rewrite contains
> multiplication and division, which will bind to functions that operate on
> integers; however, the expected result should be a double, and since double
> has more precision than integer we should be operating on doubles during the
> multiplication and division.
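The precision loss is easy to see with a tiny worked example. For the two input values 1 and 2 we have sum(x) = 3, sum(x*x) = 5, count = 2; the true population variance is 0.25. With integer arithmetic both intermediate divisions truncate and the variance collapses to 0 (so stddev is 0 too), while promoting to double first gives the correct answer. This sketch only illustrates the arithmetic described in the issue, not Drill's generated code:

```java
// Variance via the sum/sum-of-squares rewrite, with and without the
// integer-truncation bug described in this issue.
public class StddevRewrite {

  static long varianceInt(long sumX, long sumXX, long count) {
    // 3*3/2 truncates to 4, then (5-4)/2 truncates to 0.
    return (sumXX - sumX * sumX / count) / count;
  }

  static double varianceDouble(long sumX, long sumXX, long count) {
    // Promoting to double keeps 4.5 and yields (5-4.5)/2 = 0.25.
    return (sumXX - (double) sumX * sumX / count) / count;
  }
}
```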





[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance

2015-10-19 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3429:
---
Attachment: DRILL-3429.patch

Addressed review comment.

> DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, 
> variance
> -
>
> Key: DRILL-3429
> URL: https://issues.apache.org/jira/browse/DRILL-3429
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.3.0
>
> Attachments: DRILL-3429.patch
>
>
> DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, 
> stddev, variance to simple computations. 
> Eg: 
> Stddev( x ) => power(
>  (sum(x * x) - sum( x ) * sum( x ) / count( x ))
>  / count( x ),
>  .5)
> Consider the case when the input is an integer. The rewrite contains
> multiplication and division, which will bind to functions that operate on
> integers; however, the expected result should be a double, and since double
> has more precision than integer we should be operating on doubles during the
> multiplication and division.





[jira] [Created] (DRILL-3941) Add timing instrumentation around Partition Pruning

2015-10-15 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3941:
--

 Summary: Add timing instrumentation around Partition Pruning
 Key: DRILL-3941
 URL: https://issues.apache.org/jira/browse/DRILL-3941
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid


We seem to be spending a chunk of time doing partition pruning; it would be
good to log timing information to indicate the amount of time we spend doing
pruning. A little more granularity to indicate the time taken to build the
filter tree and in the interpreter would also be good.





[jira] [Updated] (DRILL-3429) DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance

2015-10-15 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3429:
---
Assignee: Aman Sinha  (was: Mehant Baid)

> DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, 
> variance
> -
>
> Key: DRILL-3429
> URL: https://issues.apache.org/jira/browse/DRILL-3429
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.3.0
>
> Attachments: DRILL-3429.patch
>
>
> DrillAvgVarianceConvertlet currently rewrites aggregate functions like avg, 
> stddev, variance to simple computations. 
> Eg: 
> Stddev( x ) => power(
>  (sum(x * x) - sum( x ) * sum( x ) / count( x ))
>  / count( x ),
>  .5)
> Consider the case when the input is an integer. The rewrite contains
> multiplication and division, which will bind to functions that operate on
> integers; however, the expected result should be a double, and since double
> has more precision than integer we should be operating on doubles during the
> multiplication and division.





[jira] [Assigned] (DRILL-3936) We don't handle out of memory condition during build phase of hash join

2015-10-14 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid reassigned DRILL-3936:
--

Assignee: Mehant Baid

> We don't handle out of memory condition during build phase of hash join
> ---
>
> Key: DRILL-3936
> URL: https://issues.apache.org/jira/browse/DRILL-3936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Victoria Markman
>Assignee: Mehant Baid
>
> It looks like we just fall through ( see excerpt from HashJoinBatch.java 
> below )
> {code:java}
>   public void executeBuildPhase() throws SchemaChangeException, 
> ClassTransformationException, IOException {
> //Setup the underlying hash table
> // skip first batch if count is zero, as it may be an empty schema batch
> if (right.getRecordCount() == 0) {
>   for (final VectorWrapper w : right) {
> w.clear();
>   }
>   rightUpstream = next(right);
> }
> boolean moreData = true;
> while (moreData) {
>   switch (rightUpstream) {
>   case OUT_OF_MEMORY:
>   case NONE:
>   case NOT_YET:
>   case STOP:
> moreData = false;
> continue;
> ...
> {code}
> We don't handle it later either:
> {code:java}
>   public IterOutcome innerNext() {
> try {
>   /* If we are here for the first time, execute the build phase of the
>* hash join and setup the run time generated class for the probe side
>*/
>   if (state == BatchState.FIRST) {
> // Build the hash table, using the build side record batches.
> executeBuildPhase();
> //IterOutcome next = next(HashJoinHelper.LEFT_INPUT, 
> left);
> hashJoinProbe.setupHashJoinProbe(context, hyperContainer, left, 
> left.getRecordCount(), this, hashTable,
> hjHelper, joinType);
> // Update the hash table related stats for the operator
> updateStats(this.hashTable);
>   }
> 
> {code}
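The gap in the excerpts above is that OUT_OF_MEMORY shares a switch arm with NONE/NOT_YET/STOP, so the build loop simply ends and the caller never learns that memory ran out. One way to close it is to give OUT_OF_MEMORY its own arm that surfaces the failure (or triggers spilling). The enum and method below are illustrative only, not the actual HashJoinBatch code:

```java
// Sketch: handle OUT_OF_MEMORY explicitly instead of falling through
// the "no more data" arm during the hash join build phase.
public class BuildPhaseSketch {

  enum IterOutcome { OK, NONE, NOT_YET, STOP, OUT_OF_MEMORY }

  // Returns true while more build-side batches should be consumed.
  static boolean processOutcome(IterOutcome outcome) {
    switch (outcome) {
      case OUT_OF_MEMORY:
        // Don't silently end the build phase: fail fast (or spill).
        throw new IllegalStateException("out of memory during hash join build");
      case NONE:
      case NOT_YET:
      case STOP:
        return false;   // no more input
      default:
        return true;    // keep consuming batches
    }
  }
}
```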





[jira] [Commented] (DRILL-3764) Support the ability to identify and/or skip records when a function evaluation fails

2015-10-09 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950005#comment-14950005
 ] 

Mehant Baid commented on DRILL-3764:


I had worked with [~jnadeau] on providing similar functionality: a framework
(annotations for errors in the function template and the necessary additions
to the runtime code gen to handle errors) to deal with errors in function
evaluation. Here is the branch:
https://github.com/mehant/drill/commit/3e81a776d1c1bb0ce7f64d8c5a905c87d71e42e0
(this is old and most likely won't rebase cleanly; I can work on rebasing if
deemed useful). The basic idea was to provide a way to specify different types
of errors within the UDF and, in case of an error, use null for that row.
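The null-on-error idea can be sketched as a per-row try/catch that substitutes null for the failing value and counts the skips for later reporting. This is an illustration of the approach only; Drill's actual mechanism would live in the runtime code generation, not in a wrapper like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: evaluate a function per row; on failure emit null for that row
// instead of aborting the whole query, and count the skipped rows.
public class NullOnError {

  static <I, O> List<O> evalSkippingErrors(List<I> rows, Function<I, O> udf, int[] skipped) {
    List<O> out = new ArrayList<>();
    for (I row : rows) {
      try {
        out.add(udf.apply(row));
      } catch (RuntimeException e) {
        out.add(null);    // keep the row, mark the value as null
        skipped[0]++;
      }
    }
    return out;
  }
}
```

Applied to the cast example in this issue, the row "http://www.cnn.com" would yield null instead of failing the query with a NumberFormatException.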

> Support the ability to identify and/or skip records when a function 
> evaluation fails
> 
>
> Key: DRILL-3764
> URL: https://issues.apache.org/jira/browse/DRILL-3764
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Aman Sinha
> Fix For: Future
>
>
> Drill can point out the filename and location of corrupted records in a file 
> but it does not have a good mechanism to deal with the following scenario: 
> Consider a text file with 2 records:
> {code}
> $ cat t4.csv
> 10,2001
> 11,http://www.cnn.com
> {code}
> {code}
> 0: jdbc:drill:zk=local> alter session set `exec.errors.verbose` = true;
> 0: jdbc:drill:zk=local> select cast(columns[0] as init), cast(columns[1] as 
> bigint) from dfs.`t4.csv`;
> Error: SYSTEM ERROR: NumberFormatException: http://www.cnn.com
> Fragment 0:0
> [Error Id: 72aad22c-a345-4100-9a57-dcd8436105f7 on 10.250.56.140:31010]
>   (java.lang.NumberFormatException) http://www.cnn.com
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeL():91
> 
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varCharToLong():62
> org.apache.drill.exec.test.generated.ProjectorGen1.doEval():62
> org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords():62
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
> {code}
> The problem is that the user does not have the context of where the error
> occurred - either the file name or the record number. This becomes a pain
> point especially when CTAS is being used to do data conversion from (say)
> text format to Parquet format. The CTAS may be accessing thousands of files,
> and one such cast (or other function) failure aborts the query.
> It would substantially improve the user experience if we provided: 
> 1) the filename and record number where  this failure occurred
> 2) the ability to skip such records depending on a session option
> 3) the ability to write such records to a staging table for future ingestion
> Please see discussion on dev list: 
> http://mail-archives.apache.org/mod_mbox/drill-dev/201509.mbox/%3cCAFyDVvLuPLgTNZ56S6=J=9Vb=aBs=pdw7nrhkkdupbdxgfa...@mail.gmail.com%3e





[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files

2015-10-07 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947124#comment-14947124
 ] 

Mehant Baid commented on DRILL-3901:


+1. 
The change looks good to me. 

> Performance regression with doing Explain of COUNT(*) over 100K files
> -
>
> Key: DRILL-3901
> URL: https://issues.apache.org/jira/browse/DRILL-3901
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Aman Sinha
>Assignee: Mehant Baid
> Attachments: 
> 0001-DRILL-3901-Don-t-do-early-expansion-of-directory-in-.patch
>
>
> We are seeing a performance regression when doing an Explain of SELECT 
> COUNT(*) over 100K files in a flat directory (no subdirectories) on latest 
> master branch compared to a run that was done on Sept 26.   Some initial 
> details (I will have more later): 
> {code}
> master branch on Sept 26
>No metadata cache: 71.452 secs
>With metadata cache: 15.804 secs
> Latest master branch 
>No metadata cache: 110 secs
>With metadata cache: 32 secs
> {code}
> So, both cases show regression.  
> [~mehant] and I took an initial look at this and it appears we might be doing 
> the directory expansion twice.  
>





[jira] [Commented] (DRILL-3901) Performance regression with doing Explain of COUNT(*) over 100K files

2015-10-06 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945960#comment-14945960
 ] 

Mehant Baid commented on DRILL-3901:


Wanted to quickly update the status on this: I have a patch for avoiding 
listing files in a directory twice (will post the patch soon). I am waiting for 
some performance feedback and will post the findings once I have them.
[~sphillips], can you please file a separate JIRA for the issue you mentioned?
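The idea behind the fix — expand the directory once and reuse that listing instead of walking the filesystem in each planning step — can be sketched as follows. This is illustrative Python, not the actual patch; the function name and paths are made up:

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def list_files(directory: str):
    """Expand a directory once; repeated calls return the cached listing."""
    return tuple(sorted(os.listdir(directory)))

# Both planning passes below would share one real filesystem listing:
# files_for_plan = list_files("/data/lineitem")
# files_for_count = list_files("/data/lineitem")  # cache hit, no second scan
```

With 100K files, avoiding the second expansion roughly halves the metadata work done during Explain.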

> Performance regression with doing Explain of COUNT(*) over 100K files
> -
>
> Key: DRILL-3901
> URL: https://issues.apache.org/jira/browse/DRILL-3901
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Aman Sinha
>Assignee: Mehant Baid
>
> We are seeing a performance regression when doing an Explain of SELECT 
> COUNT(*) over 100K files in a flat directory (no subdirectories) on latest 
> master branch compared to a run that was done on Sept 26.   Some initial 
> details (I will have more later): 
> {code}
> master branch on Sept 26
>No metadata cache: 71.452 secs
>With metadata cache: 15.804 secs
> Latest master branch 
>No metadata cache: 110 secs
>With metadata cache: 32 secs
> {code}
> So, both cases show regression.  
> [~mehant] and I took an initial look at this and it appears we might be doing 
> the directory expansion twice.  
>





[jira] [Updated] (DRILL-3788) Directory based partition pruning not taking effect with metadata caching

2015-09-24 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3788:
---
Attachment: DRILL-3788.patch

[~sphillips], can you please review?

> Directory based partition pruning not taking effect with metadata caching
> -
>
> Key: DRILL-3788
> URL: https://issues.apache.org/jira/browse/DRILL-3788
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Rahul Challapalli
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.2.0
>
> Attachments: DRILL-3788.patch, lineitem.tgz, plan.txt
>
>
> git.commit.id.abbrev=240a455
> Partition Pruning did not take place for the below query after I executed the 
> "refresh table metadata command"
> {code}
>  explain plan for 
> select
>   l_returnflag,
>   l_linestatus
> from
>   `lineitem/2006/1`
> where
>   dir0=1 or dir0=2
> {code}
> The logs did not indicate that "pruning did not take place"
> Before executing the refresh table metadata command, partition pruning did 
> take effect
> I am not attaching the data set as it is larger than 10MB. Reach out to me if 
> you need more information





[jira] [Updated] (DRILL-3577) Counting nested fields on CTAS-created-parquet file/s reports inaccurate results

2015-09-23 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3577:
---
Fix Version/s: (was: 1.2.0)
   1.3.0

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate 
> results
> 
>
> Key: DRILL-3577
> URL: https://issues.apache.org/jira/browse/DRILL-3577
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.1.0
>Reporter: Hanifi Gunes
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.3.0
>
>
> I have not tried this at a smaller scale nor on JSON file directly but the 
> following seems to re-prod the issue
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should read
> {code}
> Fragment Number of records written
> 0_0   20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> OR
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports no rows as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not 
> null
> {code}
> reports expected 200 rows
> {code}
> EXPR$0
> 200
> {code}





[jira] [Updated] (DRILL-3819) Remove redundant filter for files start with "."

2015-09-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3819:
---
Assignee: Deneche A. Hakim  (was: Mehant Baid)

> Remove redundant filter for files start with "."
> 
>
> Key: DRILL-3819
> URL: https://issues.apache.org/jira/browse/DRILL-3819
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Deneche A. Hakim
> Fix For: 1.2.0
>
> Attachments: DRILL-3819.patch
>
>
> Due to a minor issue in resolving merge conflict between drop table and 
> refresh metadata, we now have two checks for the same filter (files starting 
> with "."). 





[jira] [Updated] (DRILL-3817) Refresh metadata does not work when used with sub schema

2015-09-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3817:
---
Attachment: DRILL-3817.patch

Minor patch; [~vkorukanti], please review.

> Refresh metadata does not work when used with sub schema  
> --
>
> Key: DRILL-3817
> URL: https://issues.apache.org/jira/browse/DRILL-3817
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.2.0
>
> Attachments: DRILL-3817.patch
>
>
> refresh table metadata dfs.tmp.`lineitem` does not work, hit the following 
> exception
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> org.apache.calcite.sql.SqlBasicCall cannot be cast to 
> org.apache.calcite.sql.SqlIdentifier
> If the sub schema is removed it works.
> refresh table metadata dfs.`/tmp/lineitem`





[jira] [Updated] (DRILL-3819) Remove redundant filter for files start with "."

2015-09-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3819:
---
Attachment: DRILL-3819.patch

It's a minor patch; [~adeneche], please review.

> Remove redundant filter for files start with "."
> 
>
> Key: DRILL-3819
> URL: https://issues.apache.org/jira/browse/DRILL-3819
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
> Fix For: 1.2.0
>
> Attachments: DRILL-3819.patch
>
>
> Due to a minor issue in resolving merge conflict between drop table and 
> refresh metadata, we now have two checks for the same filter (files starting 
> with "."). 





[jira] [Commented] (DRILL-3824) Cancelling the "refresh table metadata" command does not cancel it on the drillbit

2015-09-22 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903706#comment-14903706
 ] 

Mehant Baid commented on DRILL-3824:


This is a known issue and is not specific to refresh table. We don't support 
cancellation during this stage, so commands like drop, show files, etc. will 
also have the same problem. We need to address this in a broader sense.

> Cancelling the "refresh table metadata" command does not cancel it on the 
> drillbit
> --
>
> Key: DRILL-3824
> URL: https://issues.apache.org/jira/browse/DRILL-3824
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Reporter: Rahul Challapalli
>Assignee: Aman Sinha
>
> git.commit.id.abbrev=3c89b30
> I cancelled the below command from sqlline. As we can see, sqlline returned 
> immediately but on the backend the drillbit still continues executing the 
> "refresh" command. This is mis-leading to the end user.
> {code}
> 0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata 
> dfs.`/drill/testdata/tpch100_5files/lineitem`;
> Error: SQL statement execution canceled; ResultSet now closed. 
> (state=,code=0) 
> {code}





[jira] [Commented] (DRILL-2424) Ignore hidden files in directory path

2015-09-22 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902736#comment-14902736
 ] 

Mehant Baid commented on DRILL-2424:


This was added recently. Drill should now ignore files beginning with a "." or 
"_".

> Ignore hidden files in directory path
> -
>
> Key: DRILL-2424
> URL: https://issues.apache.org/jira/browse/DRILL-2424
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON, Storage - Text & CSV
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> When streaming data to the DFS some records can be incomplete during the 
> temporary write phase for the last file(s). These files typically have a 
> different extension like '.tmp' or can be marked hidden with a prefix of '.'.
> Querying the directory path with Drill will then cause a query error as some 
> records may not be complete in the temporary files. Having the ability to 
> have Drill ignore hidden files and/or to only read files of designated 
> extension in the workspace will resolve this problem.
> Example is using Flume to stream JSON files to a directory structure, the 
> HDFS sink creates .tmp files (can be hidden with . prefix) that contains 
> incomplete JSON objects till the file is closed and the .tmp extension (or 
> prefix) is removed. Attempting to query the directory structure with Drill 
> then results in errors due to the incomplete JSON object(s) in the tmp files.





[jira] [Commented] (DRILL-2424) Ignore hidden files in directory path

2015-09-22 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902743#comment-14902743
 ] 

Mehant Baid commented on DRILL-2424:


Looking at the code, there seems to have been a merge conflict issue between 
drop table and refresh metadata; we now have the filter for files beginning 
with "." twice. Will file a JIRA and fix it.

> Ignore hidden files in directory path
> -
>
> Key: DRILL-2424
> URL: https://issues.apache.org/jira/browse/DRILL-2424
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON, Storage - Text & CSV
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Mehant Baid
> Fix For: 1.2.0
>
>
> When streaming data to the DFS some records can be incomplete during the 
> temporary write phase for the last file(s). These files typically have a 
> different extension like '.tmp' or can be marked hidden with a prefix of '.'.
> Querying the directory path with Drill will then cause a query error as some 
> records may not be complete in the temporary files. Having the ability to 
> have Drill ignore hidden files and/or to only read files of designated 
> extension in the workspace will resolve this problem.
> Example is using Flume to stream JSON files to a directory structure, the 
> HDFS sink creates .tmp files (can be hidden with . prefix) that contains 
> incomplete JSON objects till the file is closed and the .tmp extension (or 
> prefix) is removed. Attempting to query the directory structure with Drill 
> then results in errors due to the incomplete JSON object(s) in the tmp files.





[jira] [Created] (DRILL-3819) Remove redundant filter for files start with "."

2015-09-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3819:
--

 Summary: Remove redundant filter for files start with "."
 Key: DRILL-3819
 URL: https://issues.apache.org/jira/browse/DRILL-3819
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Due to a minor issue in resolving a merge conflict between drop table and 
refresh metadata, we now have two checks for the same filter (files starting 
with ".").





[jira] [Resolved] (DRILL-2424) Ignore hidden files in directory path

2015-09-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2424.

Resolution: Duplicate
  Assignee: Mehant Baid  (was: Steven Phillips)

> Ignore hidden files in directory path
> -
>
> Key: DRILL-2424
> URL: https://issues.apache.org/jira/browse/DRILL-2424
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON, Storage - Text & CSV
>Affects Versions: 0.7.0
>Reporter: Andries Engelbrecht
>Assignee: Mehant Baid
> Fix For: 1.2.0
>
>
> When streaming data to the DFS some records can be incomplete during the 
> temporary write phase for the last file(s). These files typically have a 
> different extension like '.tmp' or can be marked hidden with a prefix of '.'.
> Querying the directory path with Drill will then cause a query error as some 
> records may not be complete in the temporary files. Having the ability to 
> have Drill ignore hidden files and/or to only read files of designated 
> extension in the workspace will resolve this problem.
> Example is using Flume to stream JSON files to a directory structure, the 
> HDFS sink creates .tmp files (can be hidden with . prefix) that contains 
> incomplete JSON objects till the file is closed and the .tmp extension (or 
> prefix) is removed. Attempting to query the directory structure with Drill 
> then results in errors due to the incomplete JSON object(s) in the tmp files.





[jira] [Created] (DRILL-3817) Refresh metadata does not work when used with sub schema

2015-09-21 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3817:
--

 Summary: Refresh metadata does not work when used with sub schema  
 Key: DRILL-3817
 URL: https://issues.apache.org/jira/browse/DRILL-3817
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


refresh table metadata dfs.tmp.`lineitem` does not work; it hits the following 
exception:

org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
org.apache.calcite.sql.SqlBasicCall cannot be cast to 
org.apache.calcite.sql.SqlIdentifier

If the sub schema is removed it works.
refresh table metadata dfs.`/tmp/lineitem`





[jira] [Commented] (DRILL-3761) CastIntDecimal implementation should not update the input holder.

2015-09-15 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745877#comment-14745877
 ] 

Mehant Baid commented on DRILL-3761:


+1. 
We should also add logic to enforce the constraint that the input holders are 
immutable; this can be addressed in a separate JIRA.
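Why a mutable input holder is dangerous when the generated code reuses one holder for a common subexpression can be shown with a small sketch. This is a Python stand-in for Drill's Java holders; all names here are hypothetical:

```python
class IntHolder:
    """Toy stand-in for a Drill value holder."""
    def __init__(self, value):
        self.value = value

def cast_int_to_decimal_bad(holder, scale):
    # Mutates the caller's holder: a later expression reusing it sees 700, not 7.
    holder.value = holder.value * 10 ** scale
    return holder.value

def cast_int_to_decimal_good(holder, scale):
    # Computes the result without touching the input holder.
    return holder.value * 10 ** scale

shared = IntHolder(7)               # one holder reused by two expressions
cast_int_to_decimal_bad(shared, 2)
print(shared.value)                 # -> 700: the shared input was corrupted

shared = IntHolder(7)
cast_int_to_decimal_good(shared, 2)
print(shared.value)                 # -> 7: safe to reuse
```

The "good" version is the contract the comment asks to enforce: built-in and UDF implementations treat input holders as read-only.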

> CastIntDecimal implementation should not update the input holder. 
> --
>
> Key: DRILL-3761
> URL: https://issues.apache.org/jira/browse/DRILL-3761
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Jinfeng Ni
>Assignee: Mehant Baid
> Attachments: 
> 0001-DRILL-3761-Modify-CastIntDecimal-implementation-so-t.patch
>
>
> CastIntDecimal implementation would update the input holder's value, which 
> may cause some side effect. This is especially true, when the run-time 
> generated code tries to re-use the holder for common expressions. 
> In general, Drill's build-in/UDF implementation had better not modify the 
> input holder.





[jira] [Resolved] (DRILL-3535) Drop table support

2015-09-14 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3535.

Resolution: Fixed

Fixed in 2a191847154203871454b229d8ef322766aa9ee4

> Drop table support
> --
>
> Key: DRILL-3535
> URL: https://issues.apache.org/jira/browse/DRILL-3535
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Mehant Baid
>Assignee: Mehant Baid
>
> Umbrella JIRA to track support for "Drop table" feature.





[jira] [Commented] (DRILL-3535) Drop table support

2015-08-31 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724390#comment-14724390
 ] 

Mehant Baid commented on DRILL-3535:


[~amansinha100], [~vkorukanti], can you please review?

> Drop table support
> --
>
> Key: DRILL-3535
> URL: https://issues.apache.org/jira/browse/DRILL-3535
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Mehant Baid
>Assignee: Mehant Baid
>
> Umbrella JIRA to track support for "Drop table" feature.





[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase

2015-08-28 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3045:
---
Attachment: (was: DRILL-3045.patch)

 Drill is not partition pruning due to internal off-heap memory limit for 
 planning phase
 ---

 Key: DRILL-3045
 URL: https://issues.apache.org/jira/browse/DRILL-3045
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: DRILL-3045.patch


 The symptom is: we are running a simple query of the form select x from t 
 where dir0='xyz' and dir1='2015-01-01'; partition pruning works for a while 
 and then it stops working.
 The query does run (since we don't fail the query when pruning fails) and 
 returns correct results. 
 drillbit.log
 {code}
 015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
 due to memory limit. Current allocation: 16776840
 java.lang.Exception: null
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
 partition.
 java.lang.NullPointerException: null
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 

[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase

2015-08-28 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3045:
---
Attachment: DRILL-3045.patch

addressed review comments.

 Drill is not partition pruning due to internal off-heap memory limit for 
 planning phase
 ---

 Key: DRILL-3045
 URL: https://issues.apache.org/jira/browse/DRILL-3045
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: DRILL-3045.patch


 The symptom is: we are running a simple query of the form select x from t 
 where dir0='xyz' and dir1='2015-01-01'; partition pruning works for a while 
 and then it stops working.
 The query does run (since we don't fail the query when pruning fails) and 
 returns correct results. 
 drillbit.log
 {code}
 015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
 due to memory limit. Current allocation: 16776840
 java.lang.Exception: null
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
 partition.
 java.lang.NullPointerException: null
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 

[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase

2015-08-28 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3045:
---
Attachment: DRILL-3045.patch

[~amansinha100] can you please review.

 Drill is not partition pruning due to internal off-heap memory limit for 
 planning phase
 ---

 Key: DRILL-3045
 URL: https://issues.apache.org/jira/browse/DRILL-3045
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: DRILL-3045.patch


 The symptom is: we are running a simple query of the form select x from t 
 where dir0='xyz' and dir1='2015-01-01'; partition pruning works for a while 
 and then it stops working.
 The query does run (since we don't fail the query when pruning fails) and 
 returns correct results. 
 drillbit.log
 {code}
 015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
 due to memory limit. Current allocation: 16776840
 java.lang.Exception: null
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
 partition.
 java.lang.NullPointerException: null
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 

[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal heap memory limit

2015-08-27 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3045:
---
Summary: Drill is not partition pruning due to internal heap memory limit  
(was: Drill is leaking memory during partition pruning if directory tree has 
lots of files)

 Drill is not partition pruning due to internal heap memory limit
 

 Key: DRILL-3045
 URL: https://issues.apache.org/jira/browse/DRILL-3045
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Jacques Nadeau
 Fix For: 1.2.0


 The symptom is: we are running a simple query of the form select x from t 
 where dir0='xyz' and dir1='2015-01-01'; partition pruning works for a while 
 and then it stops working.
 The query does run (since we don't fail the query when pruning fails) and 
 returns correct results. 
 drillbit.log
 {code}
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
 due to memory limit. Current allocation: 16776840
 java.lang.Exception: null
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
 partition.
 java.lang.NullPointerException: null
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 

[jira] [Commented] (DRILL-3313) Eliminate redundant #load methods and unit-test loading & exporting of vectors

2015-08-27 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717584#comment-14717584
 ] 

Mehant Baid commented on DRILL-3313:


+1. 
Jason's review comments addressed in the patch submitted by Parth.

 Eliminate redundant #load methods and unit-test loading & exporting of vectors
 --

 Key: DRILL-3313
 URL: https://issues.apache.org/jira/browse/DRILL-3313
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Data Types
Affects Versions: 1.0.0
Reporter: Hanifi Gunes
Assignee: Hanifi Gunes
 Fix For: 1.2.0


 Vectors have multiple #load methods that are used to populate data from raw 
 buffers. It is relatively tough to reason about, maintain, and unit-test the 
 loading and exporting of data since there is a lot of redundant code around 
 the load methods. This issue proposes a single #load method conforming to the 
 VV#load(def, buffer) signature, eliminating all other #load overrides.
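
The proposed consolidation can be illustrated with a minimal sketch. These are hypothetical, heavily simplified types (FieldDef stands in for Drill's SerializedField metadata, and this is not Drill's actual ValueVector API): one load(def, buffer) entry point per vector rehydrates it from a raw buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a single-entry-point load API: one
// VV#load(def, buffer) method per vector instead of several overloads.
public class LoadApiDemo {
    // Stand-in for Drill's SerializedField metadata (hypothetical).
    static class FieldDef {
        final String name;
        final int valueCount;
        FieldDef(String name, int valueCount) {
            this.name = name;
            this.valueCount = valueCount;
        }
    }

    interface ValueVector {
        void load(FieldDef def, ByteBuffer buffer); // the one #load signature
    }

    static class IntVector implements ValueVector {
        int[] values;
        public void load(FieldDef def, ByteBuffer buffer) {
            values = new int[def.valueCount];
            for (int i = 0; i < values.length; i++) {
                values[i] = buffer.getInt();        // rehydrate from the raw buffer
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(7).putInt(9);
        buf.flip();                                 // switch buffer to read mode
        IntVector v = new IntVector();
        v.load(new FieldDef("x", 2), buf);
        System.out.println(v.values[0] + "," + v.values[1]); // prints 7,9
    }
}
```

With a single signature, one round-trip test (write a buffer, load it, compare values) covers every vector type's load path.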



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3045) Drill is not partition pruning due to internal off-heap memory limit for planning phase

2015-08-27 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3045:
---
Summary: Drill is not partition pruning due to internal off-heap memory 
limit for planning phase  (was: Drill is not partition pruning due to internal 
heap memory limit)

 Drill is not partition pruning due to internal off-heap memory limit for 
 planning phase
 ---

 Key: DRILL-3045
 URL: https://issues.apache.org/jira/browse/DRILL-3045
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0


 The symptom is: we are running a simple query of the form select x from t 
 where dir0='xyz' and dir1='2015-01-01'; partition pruning works for a while 
 and then it stops working.
 The query does run (since we don't fail the query when pruning fails) and 
 returns correct results. 
 drillbit.log
 {code}
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.exec.memory.BufferAllocator - Unable to allocate buffer of size 5000 
 due to memory limit. Current allocation: 16776840
 java.lang.Exception: null
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:220)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.buffer(TopLevelAllocator.java:231)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:333)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:661)
  [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.tools.Programs$RuleSetProgram.run(Programs.java:165) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:275) 
 [optiq-core-0.9-drill-r20.jar:na]
   at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:206)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:773) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:204) 
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 2015-04-19 15:54:22,027 [2acc305b-8f77-09af-1376-f6475c6a23c3:foreman] WARN  
 o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
 partition.
 java.lang.NullPointerException: null
   at 
 org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:334)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:185)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:187)
  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.apache.drill.exec.planner.logical.partition.PruneScanRule$2.onMatch(PruneScanRule.java:110)
  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
   at 
 org.eigenbase.relopt.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:223)
  

[jira] [Commented] (DRILL-3702) PartitionPruning hit ClassCastException in Interpreter when the pruning filter expression is of non-nullable type.

2015-08-25 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712046#comment-14712046
 ] 

Mehant Baid commented on DRILL-3702:


+1

 PartitionPruning hit ClassCastException in Interpreter when the pruning 
 filter expression is of non-nullable type.
 --

 Key: DRILL-3702
 URL: https://issues.apache.org/jira/browse/DRILL-3702
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Jinfeng Ni
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: 
 0001-DRILL-3702-Fix-partition-pruning-rule-when-the-pruni.patch


 I have the following parquet table, created using partition by clause:
 {code}
 create table mypart (id, name) partition by (id) as select cast(n_regionkey 
 as varchar(20)), n_name from cp.`tpch/nation.parquet`;
 {code}
 The generated parquet table consists of 5 files, each representing a 
 partition:
 {code}
 0_0_1.parquet 0_0_2.parquet 0_0_3.parquet 0_0_4.parquet 0_0_5.parquet
 {code}
 For the following query, partition pruning works as expected:
 {code}
 select id, name from mypart where id  = '0' ;
 00-01  Project(id=[$1], name=[$0])
 00-02Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
 [path=/tmp/mypart/0_0_1.parquet]], selectionRoot=file:/tmp/mypart, 
 numFiles=1, columns=[`id`, `name`]]])
 selectionRoot : file:/tmp/mypart,
 fileSet : [ /tmp/mypart/0_0_1.parquet ],
 cost : 5.0
 {code}
 However, the following query would hit ClassCastException when PruneScanRule 
 calls interpreter to evaluate the filtering condition, which happens to be 
 non-nullable.
 {code}
 select id, name from mypart where concat(id,'')  = '0' ;
 00-05  Project(id=[$1], name=[$0])
 00-06Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=file:/tmp/mypart]], 
 selectionRoot=file:/tmp/mypart, numFiles=1, columns=[`id`, `name`]]])
 selectionRoot : file:/tmp/mypart,
 fileSet : [ /tmp/mypart/0_0_1.parquet, /tmp/mypart/0_0_4.parquet, 
 /tmp/mypart/0_0_5.parquet, /tmp/mypart/0_0_2.parquet, 
 /tmp/mypart/0_0_3.parquet ],
 cost : 25.0
   },
 {code}
 Here is the error for the ClassCastException, raised in Interpreter:
 {code}
 java.lang.ClassCastException: org.apache.drill.exec.expr.holders.BitHolder 
 cannot be cast to org.apache.drill.exec.expr.holders.NullableBitHolder
 {code}
 The cause of the problem is that PruneScanRule assumes the output type of a 
 filter condition is NullableBit, while in this case the filter condition is 
 Bit type, which leads to ClassCastException. 
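
 The mismatch can be sketched with simplified stand-in holder classes 
 (hypothetical; the real ones live in org.apache.drill.exec.expr.holders). 
 Branching on the holder's actual class, instead of blindly casting to 
 NullableBitHolder, avoids the ClassCastException:

```java
// Stand-ins for Drill's value holders (hypothetical simplifications; the real
// classes live in org.apache.drill.exec.expr.holders).
class BitHolder { int value; }                      // non-nullable: value always set
class NullableBitHolder { int isSet; int value; }   // nullable: isSet == 0 means NULL

public class HolderDemo {
    // Defensive read of a filter result: handle both holder kinds rather than
    // assuming the nullable one, which is what raised the exception above.
    static boolean isTrue(Object holder) {
        if (holder instanceof NullableBitHolder) {
            NullableBitHolder h = (NullableBitHolder) holder;
            return h.isSet == 1 && h.value == 1;    // a NULL filter result is false
        }
        if (holder instanceof BitHolder) {
            return ((BitHolder) holder).value == 1; // non-nullable path
        }
        throw new IllegalArgumentException("unexpected holder: " + holder.getClass());
    }

    public static void main(String[] args) {
        BitHolder bit = new BitHolder();
        bit.value = 1;
        System.out.println(isTrue(bit));            // non-nullable Bit no longer throws
    }
}
```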





[jira] [Updated] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3690:
---
Assignee: Aman Sinha  (was: Mehant Baid)

 Partitioning pruning produces wrong results when there are nested expressions 
 in the filter
 ---

 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Aman Sinha
Priority: Blocker
 Fix For: 1.2.0


 Consider the following query:
 select 1 from foo where dir0 not in (1994) and col1 not in ('bar');
 The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar')))
 In FindPartitionCondition we rewrite the filter to cherry pick the partition 
 column conditions so the interpreter can evaluate it, however when the 
 expression contains more than two levels of nesting (in this case 
 AND(NOT(=))) ) the expression does not get rewritten correctly. In this case 
 the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the 
 rewritten expression producing wrong results.





[jira] [Commented] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708186#comment-14708186
 ] 

Mehant Baid commented on DRILL-3690:


[~amansinha100] can you please review.

 Partitioning pruning produces wrong results when there are nested expressions 
 in the filter
 ---

 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Aman Sinha
Priority: Blocker
 Fix For: 1.2.0


 Consider the following query:
 select 1 from foo where dir0 not in (1994) and col1 not in ('bar');
 The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar')))
 In FindPartitionCondition we rewrite the filter to cherry pick the partition 
 column conditions so the interpreter can evaluate it, however when the 
 expression contains more than two levels of nesting (in this case 
 AND(NOT(=))) ) the expression does not get rewritten correctly. In this case 
 the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the 
 rewritten expression producing wrong results.





[jira] [Created] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3690:
--

 Summary: Partitioning pruning produces wrong results when there 
are nested expressions in the filter
 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Blocker
 Fix For: 1.2.0


Consider the following query:
select 1 from foo where dir0 not in (1994) and dir1 not in (1995);

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995)))
In FindPartitionCondition we rewrite the filter to cherry pick the partition 
column conditions so the interpreter can evaluate it, however when the 
expression contains more than two levels of nesting (in this case AND(NOT(=))) 
) the expression does not get rewritten correctly. In this case the expression 
gets rewritten as: AND(=($1, 1994), =($2, 1995)). NOT is missing from the 
rewritten expression producing wrong results.
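
A minimal sketch of the intended cherry-picking rewrite, using hypothetical expression classes rather than Drill's actual RexNode handling: when the recursion descends through a NOT, the surviving child must be re-wrapped in a NOT, which is exactly what the buggy rewrite loses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature of a partition-condition rewrite: keep only
// predicates on partition columns, preserving any enclosing NOT.
public class PartitionFilterDemo {
    interface Expr {}
    static class Eq implements Expr {
        final String col; final Object val;
        Eq(String col, Object val) { this.col = col; this.val = val; }
        public String toString() { return "=(" + col + ", " + val + ")"; }
    }
    static class Not implements Expr {
        final Expr child;
        Not(Expr child) { this.child = child; }
        public String toString() { return "NOT(" + child + ")"; }
    }
    static class And implements Expr {
        final List<Expr> children;
        And(List<Expr> children) { this.children = children; }
        public String toString() { return "AND(" + children + ")"; }
    }

    // Returns the partition-only subtree, or null if nothing survives.
    static Expr prune(Expr e, Set<String> partCols) {
        if (e instanceof Eq) {
            return partCols.contains(((Eq) e).col) ? e : null;
        }
        if (e instanceof Not) {
            Expr c = prune(((Not) e).child, partCols);
            return c == null ? null : new Not(c);   // re-wrap: preserves the NOT
        }
        if (e instanceof And) {
            List<Expr> kept = new ArrayList<Expr>();
            for (Expr c : ((And) e).children) {
                Expr p = prune(c, partCols);
                if (p != null) kept.add(p);
            }
            if (kept.isEmpty()) return null;
            return kept.size() == 1 ? kept.get(0) : new And(kept);
        }
        return null;                                // unknown node: not prunable
    }

    public static void main(String[] args) {
        // AND(NOT(=(dir0, 1994)), NOT(=(col1, 'bar'))) with dir0 as the only
        // partition column must prune to NOT(=(dir0, 1994)), not =(dir0, 1994).
        Expr filter = new And(Arrays.<Expr>asList(
            new Not(new Eq("dir0", 1994)),
            new Not(new Eq("col1", "bar"))));
        System.out.println(prune(filter, new HashSet<String>(Arrays.asList("dir0"))));
        // prints NOT(=(dir0, 1994))
    }
}
```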







[jira] [Updated] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3690:
---
Description: 
Consider the following query:
select 1 from foo where dir0 not in (1994) and col1 not in ('bar');

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar')))
In FindPartitionCondition we rewrite the filter to cherry pick the partition 
column conditions so the interpreter can evaluate it, however when the 
expression contains more than two levels of nesting (in this case AND(NOT(=))) 
) the expression does not get rewritten correctly. In this case the expression 
gets rewritten as: AND(=($1, 1994)). NOT is missing from the rewritten 
expression producing wrong results.



  was:
Consider the following query:
select 1 from foo where dir0 not in (1994) and dir1 not in (1995);

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995)))
In FindPartitionCondition we rewrite the filter to cherry pick the partition 
column conditions so the interpreter can evaluate it, however when the 
expression contains more than two levels of nesting (in this case AND(NOT(=))) 
) the expression does not get rewritten correctly. In this case the expression 
gets rewritten as: AND(=($1, 1994), =($2, 1995)). NOT is missing from the 
rewritten expression producing wrong results.




 Partitioning pruning produces wrong results when there are nested expressions 
 in the filter
 ---

 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Blocker
 Fix For: 1.2.0


 Consider the following query:
 select 1 from foo where dir0 not in (1994) and col1 not in ('bar');
 The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 'bar')))
 In FindPartitionCondition we rewrite the filter to cherry pick the partition 
 column conditions so the interpreter can evaluate it, however when the 
 expression contains more than two levels of nesting (in this case 
 AND(NOT(=))) ) the expression does not get rewritten correctly. In this case 
 the expression gets rewritten as: AND(=($1, 1994)). NOT is missing from the 
 rewritten expression producing wrong results.





[jira] [Commented] (DRILL-2737) Sqlline throws Runtime exception when JDBC ResultSet throws a SQLException

2015-08-20 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705150#comment-14705150
 ] 

Mehant Baid commented on DRILL-2737:


+1

 Sqlline throws Runtime exception when JDBC ResultSet throws a SQLException
 --

 Key: DRILL-2737
 URL: https://issues.apache.org/jira/browse/DRILL-2737
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Reporter: Parth Chandra
Assignee: Parth Chandra
 Fix For: 1.2.0

 Attachments: DRILL-2737.patch


 This is a tracking bug to provide a patch to Sqlline.





[jira] [Commented] (DRILL-2625) org.apache.drill.common.StackTrace should follow standard stacktrace format

2015-08-13 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695860#comment-14695860
 ] 

Mehant Baid commented on DRILL-2625:


+1

 org.apache.drill.common.StackTrace should follow standard stacktrace format
 ---

 Key: DRILL-2625
 URL: https://issues.apache.org/jira/browse/DRILL-2625
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 0.8.0
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid
 Fix For: 1.2.0


 org.apache.drill.common.StackTrace uses a different textual format than JDK's 
 standard format for stack traces.
 It should probably use the standard format so that its stack trace output can 
 be used by tools that already can parse the standard format to provide 
 functionality such as displaying the corresponding source.
 (After correcting for DRILL-2624, StackTrace formats stack traces like this:
 org.apache.drill.common.StackTrace.init:1
 org.apache.drill.exec.server.Drillbit.run:20
 org.apache.drill.jdbc.DrillConnectionImpl.init:232
 The normal form is like this:
   at 
 org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:162)
   at 
 org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75)
   at com.google.common.io.Closeables.close(Closeables.java:77)
 )
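
 A minimal sketch (not Drill's actual StackTrace class) of how the standard 
 form can be produced by deferring to StackTraceElement.toString(), which 
 already emits "cls.method(File.java:line)":

```java
// Minimal sketch: format frames in the standard JDK "\tat ..." form by
// reusing StackTraceElement.toString() instead of a custom "cls.method:line".
public class StackTraceFormatDemo {
    static String format(StackTraceElement[] frames) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : frames) {
            sb.append("\tat ").append(frame).append('\n'); // standard "at ..." line
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the current thread's stack in the standard, tool-parseable form.
        System.out.print(format(Thread.currentThread().getStackTrace()));
    }
}
```

 Tools that parse the standard format (IDEs, log analyzers) can then resolve 
 each frame to its source location without special-casing Drill's output.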





[jira] [Updated] (DRILL-2625) org.apache.drill.common.StackTrace should follow standard stacktrace format

2015-08-13 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2625:
---
Assignee: Chris Westin  (was: Mehant Baid)

 org.apache.drill.common.StackTrace should follow standard stacktrace format
 ---

 Key: DRILL-2625
 URL: https://issues.apache.org/jira/browse/DRILL-2625
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 0.8.0
Reporter: Daniel Barclay (Drill)
Assignee: Chris Westin
 Fix For: 1.2.0


 org.apache.drill.common.StackTrace uses a different textual format than JDK's 
 standard format for stack traces.
 It should probably use the standard format so that its stack trace output can 
 be used by tools that already can parse the standard format to provide 
 functionality such as displaying the corresponding source.
 (After correcting for DRILL-2624, StackTrace formats stack traces like this:
 org.apache.drill.common.StackTrace.init:1
 org.apache.drill.exec.server.Drillbit.run:20
 org.apache.drill.jdbc.DrillConnectionImpl.init:232
 The normal form is like this:
   at 
 org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:162)
   at 
 org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75)
   at com.google.common.io.Closeables.close(Closeables.java:77)
 )





[jira] [Commented] (DRILL-3579) Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__

2015-08-08 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14663054#comment-14663054
 ] 

Mehant Baid commented on DRILL-3579:


+1

 Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
 ---

 Key: DRILL-3579
 URL: https://issues.apache.org/jira/browse/DRILL-3579
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Hive
Affects Versions: 1.1.0
 Environment: Drill 1.1 on Hive 1.0
Reporter: Hao Zhu
Assignee: Venki Korukanti
Priority: Critical
 Fix For: 1.2.0

 Attachments: DRILL-3579-1.patch


 If Hive's partition table has __HIVE_DEFAULT_PARTITION__ in the case of null 
 values in the partition column, Drill on Hive query will fail.
 Minimum reproduce:
 1.Hive:
 {code}
 CREATE TABLE h1_testpart2(id INT) PARTITIONED BY(id2 int);
 set hive.exec.dynamic.partition.mode=nonstrict;
 INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , 20150101 
 as id2 from h1_passwords limit 1;
 INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2) SELECT 1 as id1 , null as 
 id2 from h1_passwords limit 1;
 {code}
 2. Filesystem looks like:
 {code}
 h1 h1_testpart2]# ls -altr
 total 2
 drwxrwxrwx 89 mapr mapr 87 Jul 30 00:04 ..
 drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=20150101
 drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=__HIVE_DEFAULT_PARTITION__
 drwxr-xr-x  4 mapr mapr  2 Jul 30 00:05 .
 {code}
 3.Drill will fail:
 {code}
 select * from h1_testpart2;
 Error: SYSTEM ERROR: NumberFormatException: For input string: 
 __HIVE_DEFAULT_PARTITION__
 Fragment 0:0
 [Error Id: 509eb392-db9a-42f3-96ea-fb597425f49f on h1.poc.com:31010]
   (java.lang.reflect.UndeclaredThrowableException) null
 org.apache.hadoop.security.UserGroupInformation.doAs():1581
 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
 org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1142
 java.util.concurrent.ThreadPoolExecutor$Worker.run():617
 java.lang.Thread.run():745
   Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) 
 Failure while initializing HiveRecordReader: For input string: 
 __HIVE_DEFAULT_PARTITION__
 org.apache.drill.exec.store.hive.HiveRecordReader.init():241
 org.apache.drill.exec.store.hive.HiveRecordReader.init():138
 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():422
 org.apache.hadoop.security.UserGroupInformation.doAs():1566
 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
 org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
 org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
 org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
 org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1142
 java.util.concurrent.ThreadPoolExecutor$Worker.run():617
 java.lang.Thread.run():745
   Caused By (java.lang.NumberFormatException) For input string: 
 __HIVE_DEFAULT_PARTITION__
 java.lang.NumberFormatException.forInputString():65
 java.lang.Integer.parseInt():580
 java.lang.Integer.parseInt():615
 
 org.apache.drill.exec.store.hive.HiveRecordReader.convertPartitionType():605
 org.apache.drill.exec.store.hive.HiveRecordReader.init():236
 org.apache.drill.exec.store.hive.HiveRecordReader.init():138
 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
 org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
 org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
 org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
 java.security.AccessController.doPrivileged():-2
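
 A sketch of the direction a fix could take (hypothetical helper, not the 
 actual HiveRecordReader.convertPartitionType code): check for Hive's 
 null-partition sentinel before handing the string to Integer.parseInt, which 
 is what throws the NumberFormatException above.

```java
// Hypothetical sketch: treat Hive's sentinel for a null partition value as
// NULL instead of parsing it as an integer.
public class HivePartitionDemo {
    static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Returns the partition key as an Integer, or null for the sentinel.
    static Integer convertIntPartition(String value) {
        if (HIVE_DEFAULT_PARTITION.equals(value)) {
            return null;                            // null partition value
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        System.out.println(convertIntPartition("20150101"));                 // prints 20150101
        System.out.println(convertIntPartition("__HIVE_DEFAULT_PARTITION__")); // prints null
    }
}
```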
 

[jira] [Assigned] (DRILL-2912) Exception is not propagated correctly in case when directory contains mix of file types

2015-08-05 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid reassigned DRILL-2912:
--

Assignee: Mehant Baid  (was: Steven Phillips)

 Exception is not propagated correctly in case when directory contains mix of 
 file types
 ---

 Key: DRILL-2912
 URL: https://issues.apache.org/jira/browse/DRILL-2912
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Storage - JSON, Storage - Parquet
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0


 While trying to read from directory that has a mix of parquet and json files 
 I ran into an exception:
 {code}
 0: jdbc:drill:schema=dfs> select max(dir0) from bigtable;
 Query failed: SYSTEM ERROR: Unexpected exception during fragment 
 initialization: Internal error: Error while applying rule 
 DrillPushProjIntoScan, args 
 [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1),
  rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, 
 bigtable])]
 [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 The real problem is that directory contains 2 parquet and one json files:
 {code}
 [Wed Apr 29 14:50:58 
 root@/mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27 ] # pwd
 /mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27
 [Wed Apr 29 14:51:06 
 root@/mapr/vmarkman.cluster.com/test/bigtable/F114/2014-03-27 ] # ls -ltr
 total 2
 -rwxr-xr-x 1 root root 483 Apr 16 16:05 0_0_0.parquet
 -rwxr-xr-x 1 root root 483 Apr 17 13:06 
 214c279334946e65-7e32c56eed93cbc2_1965630551_data.0.parq
 -rw-r--r-- 1 root root  17 Apr 23 15:24 t1.json
 {code}
 drillbit.log
 {code}
 [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010]
 org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: Unexpected 
 exception during fragment initialization: Internal error: Error while 
 applying rule DrillPushProjIntoScan, args 
 [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1),
  rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, 
 bigtable])]
 [72d7f7ee-3045-44d9-b13c-1d03bea4e22c on atsqa4-133.qa.lab:31010]
 at 
 org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:465)
  ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:620)
  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:717)
  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:659)
  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
 [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:661)
  [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:762) 
 [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:212) 
 [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
 exception during fragment initialization: Internal error: Error while 
 applying rule DrillPushProjIntoScan, args 
 [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1),
  rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, 
 bigtable])]
 ... 4 common frames omitted
 Caused by: java.lang.AssertionError: Internal error: Error while applying 
 rule DrillPushProjIntoScan, args 
 [rel#4207:LogicalProject.NONE.ANY([]).[](input=rel#4206:Subset#0.ENUMERABLE.ANY([]).[],dir0=$1),
  rel#4198:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[dfs, test, 
 bigtable])]
 at org.apache.calcite.util.Util.newInternal(Util.java:743) 
 ~[calcite-core-1.1.0-drill-r2.jar:1.1.0-drill-r2]
 at 
 org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
  

[jira] [Assigned] (DRILL-3535) Drop table support

2015-08-05 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid reassigned DRILL-3535:
--

Assignee: Mehant Baid

 Drop table support
 --

 Key: DRILL-3535
 URL: https://issues.apache.org/jira/browse/DRILL-3535
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Mehant Baid
Assignee: Mehant Baid

 Umbrella JIRA to track support for Drop table feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3593) Reorganize classes that are exposed to storage plugins

2015-08-02 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3593:
--

 Summary: Reorganize classes that are exposed to storage plugins
 Key: DRILL-3593
 URL: https://issues.apache.org/jira/browse/DRILL-3593
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Based on the discussion on DRILL-3500 we want to reorganize some of the 
classes/interfaces (QueryContext, PlannerSettings, OptimizerRulesContext, ...) 
present at planning time and decide what is to be exposed to storage plugins. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-08-02 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651033#comment-14651033
 ] 

Mehant Baid commented on DRILL-3500:


I've created DRILL-3593 for the reorg task. 

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


 Currently all the optimizer rules internal to Drill have access to 
 QueryContext. This is used by a few rules like PruneScanRule which invoke the 
 interpreter to perform partition pruning. However the rules that belong to 
 specific storage plugins don't have access to this information. This JIRA 
 aims to do the following
 1. Add a new interface OptimizerRulesContext that will be implemented by 
 QueryContext. It will contain all the information needed by the rules. This 
 context will be passed to the storage plugin method while getting the 
 optimizer rules specific to that storage plugin.
 2. Restrict existing internal rules to only accept OptimizerRulesContext 
 instead of QueryContext so information in QueryContext has better 
 encapsulation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-08-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3500.

Resolution: Fixed

Fixed in f8197cfe1bc3671aa6878ef9d1869b2fe8e57331

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening

2015-07-25 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3121:
---
Assignee: Aman Sinha  (was: Mehant Baid)

 Hive partition pruning is not happening
 ---

 Key: DRILL-3121
 URL: https://issues.apache.org/jira/browse/DRILL-3121
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Aman Sinha
Priority: Critical
 Fix For: 1.2.0

 Attachments: DRILL-3121.patch


 Tested on 1.0.0 with below commit id, and hive 0.13.
 {code}
   select * from sys.version;
 | commit_id                                | commit_message                                                    | commit_time                | build_email  | build_time                 |
 | d8b19759657698581cc0d01d7038797952888123 | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows | 15.05.2015 @ 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT  |
 1 row selected (0.083 seconds)
 {code}
 How to reproduce:
 1. Use hive to create below partition table:
 {code}
 CREATE TABLE partition_table(id INT, username string)
  PARTITIONED BY(year STRING, month STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 insert into table partition_table PARTITION(year='2014',month='11') select 
 1,'u' from passwords limit 1;
 insert into table partition_table PARTITION(year='2014',month='12') select 
 2,'s' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='01') select 
 3,'e' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='02') select 
 4,'r' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='03') select 
 5,'n' from passwords limit 1;
 {code}
 2. Hive query can do partition pruning for below 2 queries:
 {code}
 hive> explain EXTENDED select * from partition_table where year='2015' and 
 month in ( '02','03') ;
 partition values:
   month 02
   year 2015
 partition values:
   month 03
   year 2015  
 explain EXTENDED select * from partition_table where year='2015' and (month 
 = '02' and month = '03') ;
 partition values:
   month 02
   year 2015
 partition values:
   month 03
   year 2015
 {code}
 Hive only scans 2 partitions -- 2015/02 and 2015/03.
 3. Drill can not do partition pruning for below 2 queries:
 {code}
  explain plan for select * from hive.partition_table where `year`='2015' and 
  `month` in ('02','03');
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
 00-02SelectionVectorRemover
 00-03  Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, 
 '03')))])
 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
 tableName:partition_table), 
 inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
 columns=[`*`], partitions= [Partition(values:[2015, 01]), 
 Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
  explain plan for select * from hive.partition_table where `year`='2015' and 
  (`month` = '02' and `month` = '03' );
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
 00-02SelectionVectorRemover
 00-03  Filter(condition=[AND(=($2, '2015'), =($3, '02'), =($3, 
 '03'))])
 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
 tableName:partition_table), 
 inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
 columns=[`*`], partitions= [Partition(values:[2015, 01]), 
 Partition(values:[2015, 02]), 
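For readers following along, the pruning Hive performs and Drill skips here amounts to evaluating the partition-column filter against each partition's values before the scan, so non-matching partitions are never read. A minimal sketch of that idea (class and method names below are illustrative only, not Drill's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Conceptual sketch of partition pruning: evaluate the WHERE predicate
// against each partition's values and drop non-matching partitions
// before the scan. Names are illustrative, not Drill's classes.
public class PruneSketch {
    static final class Partition {
        final String year, month;
        Partition(String year, String month) { this.year = year; this.month = month; }
    }

    // Keep only the partitions whose values satisfy the filter.
    public static List<Partition> prune(List<Partition> all, Predicate<Partition> filter) {
        List<Partition> kept = new ArrayList<>();
        for (Partition p : all) {
            if (filter.test(p)) kept.add(p);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Partition> all = List.of(
            new Partition("2014", "11"), new Partition("2014", "12"),
            new Partition("2015", "01"), new Partition("2015", "02"),
            new Partition("2015", "03"));
        // year = '2015' AND month IN ('02', '03'), as in the report above
        List<Partition> kept = prune(all, p ->
            p.year.equals("2015") && (p.month.equals("02") || p.month.equals("03")));
        System.out.println(kept.size() + " of " + all.size() + " partitions scanned");
        // prints: 2 of 5 partitions scanned -- matching Hive's behaviour
    }
}
```

In the failing plans above the filter stays in a Filter operator over a scan of all three 2015 partitions; conceptually the fix is to apply this kind of evaluation at planning time and shrink the scan's input split list.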

[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening

2015-07-25 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3121:
---
Attachment: DRILL-3121.patch

 Hive partition pruning is not happening
 ---

 Key: DRILL-3121
 URL: https://issues.apache.org/jira/browse/DRILL-3121
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.2.0

 Attachments: DRILL-3121.patch



[jira] [Commented] (DRILL-3151) ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)

2015-07-23 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639279#comment-14639279
 ] 

Mehant Baid commented on DRILL-3151:


+1

 ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)
 --

 Key: DRILL-3151
 URL: https://issues.apache.org/jira/browse/DRILL-3151
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Daniel Barclay (Drill)
Assignee: Parth Chandra
 Fix For: 1.2.0

 Attachments: DRILL-3151.3.patch.txt


 In Drill's JDBC driver, some ResultSetMetaData methods don't return what JDBC 
 specifies they should return.
 Some cases:
 {{getTableName(int)}}:
 - (JDBC says: {{table name or "" if not applicable}})
 - Drill returns {{null}} (instead of empty string or table name)
 - (Drill indicates not applicable even when from named table, e.g., for  
 {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
 {{getSchemaName(int)}}:
 - (JDBC says: {{schema name or "" if not applicable}})
 - Drill returns {{\-\-UNKNOWN--}} (instead of empty string or schema name)
 - (Drill indicates not applicable even when from named table, e.g., for  
 {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
 {{getCatalogName(int)}}:
 - (JDBC says: {{the name of the catalog for the table in which the given 
 column appears or "" if not applicable}})
 - Drill returns {{\-\-UNKNOWN--}} (instead of empty string or catalog name)
 - (Drill indicates not applicable even when from named table, e.g., for  
 {{SELECT * FROM INFORMATION_SCHEMA.CATALOGS}}.)
 {{isSearchable(int)}}:
 - (JDBC says:  {{Indicates whether the designated column can be used in a 
 where clause.}})
 - Drill returns {{false}}.
 {{getColumnClassName(int}}:
 - (JDBC says: {{the fully-qualified name of the class in the Java programming 
 language that would be used by the method ResultSet.getObject to retrieve the 
 value in the specified column. This is the class name used for custom 
 mapping.}})
 - Drill returns {{none}} (instead of the correct class name).
 More cases:
 {{getColumnDisplaySize}}
 - (JDBC says (quite ambiguously): {{the normal maximum number of characters 
 allowed as the width of the designated column}})
 - Drill always returns {{10}}!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-21 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636067#comment-14636067
 ] 

Mehant Baid commented on DRILL-3500:


Yep, I was planning on doing that. 

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3535) Drop table support

2015-07-21 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3535:
--

 Summary: Drop table support
 Key: DRILL-3535
 URL: https://issues.apache.org/jira/browse/DRILL-3535
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Mehant Baid


Umbrella JIRA to track support for Drop table feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-21 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635763#comment-14635763
 ] 

Mehant Baid commented on DRILL-3500:


OptimizerRulesContext is essentially an interface added on top of existing 
information present in QueryContext, so the name might be a bit misleading and 
can be changed. 

The main motivation behind adding the new interface (OptimizerRulesContext) was 
to enable Hive storage plugin to add a rule to perform interpreter based 
execution for partition pruning. I think Jason also needs this for some of his 
work for reading Hive Parquet files natively. Some information in QueryContext 
is needed to be able to perform this and the two main reasons to add the 
interface were:

1. Better encapsulation: since QueryContext is pretty heavyweight and we add a 
bunch of information to it, this interface would prevent any unnecessary 
information from being leaked to the plugin.
2. One common interface exposing all information needed by optimizer rules that 
is common to both storage plugin specific rules and the internal rules. 
Currently in master all the internal optimizer rules (e.g. 
[PruneScanRule|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java#L77]
 ) have access to information in QueryContext but storage plugin rules don't. 
This way we provide the same framework to build the rules independent of 
storage plugin.
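The encapsulation idea described above can be sketched roughly as follows: the heavy context implements a narrow interface, and every rule, internal or plugin-supplied, is coded against the narrow view only. The member names below are made up for illustration; the JIRA does not specify them:

```java
// Illustrative sketch only: a narrow OptimizerRulesContext implemented by the
// heavyweight QueryContext. Rules see just what the interface exposes.
interface OptimizerRulesContext {
    // e.g. a planner option a pruning rule may need to read
    long plannerOption(String name);
}

class QueryContext implements OptimizerRulesContext {
    private final java.util.Map<String, Long> options = new java.util.HashMap<>();
    // ...plus allocator, session state, etc. that rules should NOT see

    QueryContext() { options.put("planner.width.max_per_node", 4L); }

    @Override
    public long plannerOption(String name) {
        return options.getOrDefault(name, 0L);
    }
}

public class RulesContextSketch {
    // A storage plugin's rule receives only the narrow interface.
    static long pluginRuleReads(OptimizerRulesContext ctx) {
        return ctx.plannerOption("planner.width.max_per_node");
    }

    public static void main(String[] args) {
        // Hand the rule a narrow view of the heavy context.
        System.out.println(pluginRuleReads(new QueryContext()));
    }
}
```

The point of the design is that widening QueryContext later does not leak anything new to plugins; only what OptimizerRulesContext declares is reachable from rule code.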

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-21 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636053#comment-14636053
 ] 

Mehant Baid commented on DRILL-3500:


PlannerSettings currently mostly contains planner related options. However I 
think it makes sense to consolidate. PlannerSettings will need to keep an 
additional reference to the allocator present in the QueryContext. I will make 
the changes and post a patch.

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism

2015-07-20 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3503:
---
Attachment: DRILL-3503_part2.patch
DRILL-3503_part1.patch

The first patch is a minor formatting change generated automatically by the IDE. 
The second patch is the actual change.

 Make PruneScanRule have a pluggable partitioning mechanism
 --

 Key: DRILL-3503
 URL: https://issues.apache.org/jira/browse/DRILL-3503
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: DRILL-3503_part1.patch, DRILL-3503_part2.patch


 Currently PruneScanRule performs partition pruning for file system. Some of 
 the code relies on certain aspects of how partitioning is done in DFS. This 
 JIRA aims to abstract out the behavior of the underlying partition scheme and 
 delegate to the specific storage plugin to get that information. 
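 One plausible shape for that abstraction (purely illustrative; these are not 
 Drill's actual interfaces) is a descriptor the pruning rule queries instead 
 of hard-coding DFS directory conventions:

```java
// Hypothetical sketch: PruneScanRule asks a PartitionDescriptor about the
// partitioning scheme instead of assuming a DFS directory layout, and each
// storage plugin supplies its own implementation. All names are illustrative.
interface PartitionDescriptor {
    int maxHierarchyLevel();          // how deep partitioning can nest
    String partitionName(int level);  // partition column name at a level
}

// File-system flavour: synthetic dir0, dir1, ... directory columns.
class FileSystemPartitionDescriptor implements PartitionDescriptor {
    public int maxHierarchyLevel() { return 10; }
    public String partitionName(int level) { return "dir" + level; }
}

// Hive flavour: named partition columns taken from table metadata.
class HivePartitionDescriptor implements PartitionDescriptor {
    private final String[] cols;
    HivePartitionDescriptor(String... cols) { this.cols = cols; }
    public int maxHierarchyLevel() { return cols.length; }
    public String partitionName(int level) { return cols[level]; }
}

public class PartitionDescriptorSketch {
    public static void main(String[] args) {
        PartitionDescriptor dfs = new FileSystemPartitionDescriptor();
        PartitionDescriptor hive = new HivePartitionDescriptor("year", "month");
        // The same pruning rule can now ask either descriptor the same questions.
        System.out.println(dfs.partitionName(0) + " / " + hive.partitionName(1));
    }
}
```

 With this split, the rule's pruning logic stays generic and only the 
 descriptor differs per storage plugin.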



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism

2015-07-20 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3503:
---
Assignee: Aman Sinha  (was: Mehant Baid)

 Make PruneScanRule have a pluggable partitioning mechanism
 --

 Key: DRILL-3503
 URL: https://issues.apache.org/jira/browse/DRILL-3503
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Aman Sinha
 Fix For: 1.2.0

 Attachments: DRILL-3503_part1.patch, DRILL-3503_part2.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3503) Make PruneScanRule have a pluggable partitioning mechanism

2015-07-16 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3503:
--

 Summary: Make PruneScanRule have a pluggable partitioning mechanism
 Key: DRILL-3503
 URL: https://issues.apache.org/jira/browse/DRILL-3503
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Currently PruneScanRule performs partition pruning for file system. Some of the 
code relies on certain aspects of how partitioning is done in DFS. This JIRA 
aims to abstract out the behavior of the underlying partition scheme and 
delegate to the specific storage plugin to get that information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-16 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629246#comment-14629246
 ] 

Mehant Baid commented on DRILL-3500:


[~jaltekruse] can you please review

 Provide additional information while registering storage plugin optimizer 
 rules
 ---

 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Jason Altekruse
 Fix For: 1.2.0





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3500) Provide additional information while registering storage plugin optimizer rules

2015-07-15 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3500:
--

 Summary: Provide additional information while registering storage 
plugin optimizer rules
 Key: DRILL-3500
 URL: https://issues.apache.org/jira/browse/DRILL-3500
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0


Currently all the optimizer rules internal to Drill have access to 
QueryContext. This is used by a few rules like PruneScanRule which invoke the 
interpreter to perform partition pruning. However the rules that belong to 
specific storage plugins don't have access to this information. This JIRA aims 
to do the following

1. Add a new interface OptimizerRulesContext that will be implemented by 
QueryContext. It will contain all the information needed by the rules. This 
context will be passed to the storage plugin method while getting the optimizer 
rules specific to that storage plugin.

2. Restrict existing internal rules to only accept OptimizerRulesContext 
instead of QueryContext so information in QueryContext has better encapsulation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2862) Convert_to/Convert_From throw assertion when an incorrect encoding type is specified

2015-07-10 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622873#comment-14622873
 ] 

Mehant Baid commented on DRILL-2862:


+1.

 Convert_to/Convert_From throw assertion when an incorrect encoding type is 
 specified
 

 Key: DRILL-2862
 URL: https://issues.apache.org/jira/browse/DRILL-2862
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Reporter: Neeraja
Assignee: Parth Chandra
 Fix For: 1.2.0

 Attachments: DRILL-2862.2.patch.txt


 Below is the error from SQLLine. Replacing 'UTF-8' with 'UTF8' works fine.
 The error message needs to accurately represent the problem.
 0: jdbc:drill:> select Convert_from(t.address.state,'UTF-8') from customers t 
 limit 10;
 Query failed: AssertionError: 
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening

2015-07-09 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3121:
---
Priority: Critical  (was: Major)

 Hive partition pruning is not happening
 ---

 Key: DRILL-3121
 URL: https://issues.apache.org/jira/browse/DRILL-3121
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Mehant Baid
Priority: Critical
 Fix For: 1.2.0



[jira] [Updated] (DRILL-3121) Hive partition pruning is not happening

2015-07-09 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3121:
---
Issue Type: Improvement  (was: Bug)

 Hive partition pruning is not happening
 ---

 Key: DRILL-3121
 URL: https://issues.apache.org/jira/browse/DRILL-3121
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Hao Zhu
Assignee: Mehant Baid
 Fix For: 1.2.0


 Tested on 1.0.0 with below commit id, and hive 0.13.
 {code}
   select * from sys.version;
 +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
 |                 commit_id                 |                           commit_message                           |        commit_time         | build_email  |         build_time         |
 +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
 | d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT  |
 +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
 1 row selected (0.083 seconds)
 {code}
 How to reproduce:
 1. Use Hive to create the partitioned table below:
 {code}
 CREATE TABLE partition_table(id INT, username string)
  PARTITIONED BY(year STRING, month STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 insert into table partition_table PARTITION(year='2014',month='11') select 
 1,'u' from passwords limit 1;
 insert into table partition_table PARTITION(year='2014',month='12') select 
 2,'s' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='01') select 
 3,'e' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='02') select 
 4,'r' from passwords limit 1;
 insert into table partition_table PARTITION(year='2015',month='03') select 
 5,'n' from passwords limit 1;
 {code}
 2. Hive can do partition pruning for the two queries below:
 {code}
 hive> explain EXTENDED select * from partition_table where year='2015' and 
 month in ( '02','03') ;
 partition values:
   month 02
   year 2015
 partition values:
   month 03
   year 2015  
 explain EXTENDED select * from partition_table where year='2015' and (month 
 = '02' and month = '03') ;
 partition values:
   month 02
   year 2015
 partition values:
   month 03
   year 2015
 {code}
 Hive only scans 2 partitions -- 2015/02 and 2015/03.
 3. Drill cannot do partition pruning for the two queries below:
 {code}
  explain plan for select * from hive.partition_table where `year`='2015' and 
  `month` in ('02','03');
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
 00-02SelectionVectorRemover
 00-03  Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, 
 '03')))])
 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
 tableName:partition_table), 
 inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
 columns=[`*`], partitions= [Partition(values:[2015, 01]), 
 Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
  explain plan for select * from hive.partition_table where `year`='2015' and 
  (`month` = '02' and `month` = '03' );
 +--+--+
 | text | json |
 +--+--+
 | 00-00Screen
 00-01  Project(id=[$0], username=[$1], year=[$2], month=[$3])
 00-02SelectionVectorRemover
 00-03  Filter(condition=[AND(=($2, '2015'), =($3, '02'), =($3, 
 '03'))])
 00-04Scan(groupscan=[HiveScan [table=Table(dbName:default, 
 tableName:partition_table), 
 inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/00_0:0+4,
  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/00_0:0+4, 
 maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/00_0:0+4], 
 columns=[`*`], partitions= [Partition(values:[2015, 01]), 
 Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
 {code}
 Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03.
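The pruning the reporter expects can be sketched outside Drill. In this hypothetical Java sketch (Partition and prune are illustrative names, not Drill classes), the planner evaluates the filter against each partition's (year, month) values and keeps only the survivors:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: pruning evaluates the partition filter against each
// partition's (year, month) values and plans the scan over the survivors only.
public class PartitionPruneSketch {
    static final class Partition {
        final String year;
        final String month;
        Partition(String year, String month) { this.year = year; this.month = month; }
    }

    static List<Partition> prune(List<Partition> partitions, Predicate<Partition> filter) {
        List<Partition> kept = new ArrayList<>();
        for (Partition p : partitions) {
            if (filter.test(p)) {
                kept.add(p);  // partition survives, so its files stay in the scan
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
            new Partition("2015", "01"),
            new Partition("2015", "02"),
            new Partition("2015", "03"));

        // `year` = '2015' AND `month` IN ('02', '03')
        List<Partition> kept = prune(partitions,
            p -> p.year.equals("2015")
                && (p.month.equals("02") || p.month.equals("03")));

        System.out.println(kept.size());  // 2 -- month 01 is pruned away
    }
}
```

Applying the IN filter to the three 2015 partitions keeps two of them, which matches the scan Hive plans and is what the reporter expects from Drill.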

[jira] [Commented] (DRILL-3334) java.lang.IllegalStateException: Failure while reading vector.: raised when using dynamic schema in JSON

2015-07-08 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619906#comment-14619906
 ] 

Mehant Baid commented on DRILL-3334:


[~hgunes] I don't think HashJoinBatch currently supports any changes in schema 
(join column or non-join column). However it seems like a limitation we can 
most likely overcome.

  java.lang.IllegalStateException: Failure while reading vector.: raised when 
 using dynamic schema in JSON
 --

 Key: DRILL-3334
 URL: https://issues.apache.org/jira/browse/DRILL-3334
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.0.0
 Environment: Single Node running on OSX
 and
 MapR Hadoop SandBox + Drill
Reporter: Tugdual Grall
Assignee: Hanifi Gunes
 Fix For: 1.2.0

 Attachments: test.zip


 I have a simple data set based on 3 JSON documents:
  - 1 customer
  - 2 orders
 (I have attached the document to the JIRA)
 When I execute the following query, which is a join between orders and 
 customers, I hit an unexpected exception.
 A working query:
 {code}
 SELECT customers.id, orders.total
 FROM  dfs.ecommerce.`customers/*.json` customers,
  dfs.ecommerce.`orders/*.json` orders
 WHERE customers.id = orders.cust_id
 AND customers.country = 'FRANCE'
 {code}
 It works since orders.total is present in all orders.
 Now when I execute the following query (tax is not present in all documents):
 {code}
 SELECT customers.id, orders.tax
 FROM  dfs.ecommerce.`customers/*.json` customers,
  dfs.ecommerce.`orders/*.json` orders
 WHERE customers.id = orders.cust_id
 AND customers.country = 'FRANCE'
 {code}
 This query raises the following exception:
 {code}
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
 java.lang.IllegalStateException: Failure while reading vector. Expected 
 vector class of org.apache.drill.exec.vector.NullableIntVector but was 
 holding vector class org.apache.drill.exec.vector.NullableBigIntVector. 
 Fragment 0:0 [Error Id: a7ad300a-4446-41f3-8b1c-4bb7d1dbfb52 on 
 maprdemo:31010]
 {code}
 If you cannot reproduce this with tax, you can try with the field orders.cool,
 or simply move the tax field from one document to the others
 (the field must be present in only one document).
 It looks like Drill is losing the list of columns present globally.
 Note: if I use a field that does not exist in any document, it works
 (e.g. orders.this_is_crazy).
 Note: if I use * instead of a projection, this raises another exception:
 {code}
 org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
 org.apache.drill.exec.exception.SchemaChangeException: Hash join does not 
 support schema changes Fragment 0:0 [Error Id: 
 0b20d580-37a3-491a-9987-4d04fb6f2d43 on maprdemo:31010]
 {code}
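A minimal sketch of the invariant behind the first error message, assuming (hypothetically) that each column keeps the vector class it was first materialized with; the classes here stand in for Drill's NullableIntVector/NullableBigIntVector:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the invariant behind the error message: each column
// keeps the value-vector class it was created with, and a later batch that
// materializes the same column with a different vector class fails the check.
public class VectorTypeCheckSketch {
    private final Map<String, Class<?>> vectorClassByField = new HashMap<>();

    // Register (or verify) the vector class used for a field in this batch.
    void load(String field, Class<?> vectorClass) {
        Class<?> expected = vectorClassByField.putIfAbsent(field, vectorClass);
        if (expected != null && !expected.equals(vectorClass)) {
            throw new IllegalStateException("Failure while reading vector. Expected vector class of "
                + expected.getName() + " but was holding vector class " + vectorClass.getName());
        }
    }

    public static void main(String[] args) {
        VectorTypeCheckSketch batch = new VectorTypeCheckSketch();
        batch.load("tax", Integer.class);   // stand-in for NullableIntVector
        try {
            batch.load("tax", Long.class);  // stand-in for NullableBigIntVector
        } catch (IllegalStateException e) {
            System.out.println("schema change detected");
        }
    }
}
```

A field like tax that is present in only one file can be typed differently across batches, which is exactly the mismatch this check rejects.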



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()

2015-07-07 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3464:
---
Assignee: Jinfeng Ni  (was: Mehant Baid)

 Index out of bounds exception while performing concat()
 ---

 Key: DRILL-3464
 URL: https://issues.apache.org/jira/browse/DRILL-3464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Jinfeng Ni
 Fix For: 1.2.0

 Attachments: DRILL-3464.patch


 We hit an IndexOutOfBoundsException (IOOB) while performing concat() on a 
 single input in DrillOptiq. Below is the stack trace:
 at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[na:1.7.0_67]
 at java.util.ArrayList.get(ArrayList.java:411) ~[na:1.7.0_67]
 at 
 org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.getDrillFunctionFromOptiqCall(DrillOptiq.java:373)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:106)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.logical.DrillOptiq$RexToDrill.visitCall(DrillOptiq.java:77)
  ~[classes/:na]
 at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[classes/:na]
 at 
 org.apache.drill.exec.planner.logical.DrillOptiq.toDrill(DrillOptiq.java:74) 
 ~[classes/:na]
 at 
 org.apache.drill.exec.planner.common.DrillProjectRelBase.getProjectExpressions(DrillProjectRelBase.java:111)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.physical.ProjectPrel.getPhysicalOperator(ProjectPrel.java:57)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.physical.ScreenPrel.getPhysicalOperator(ScreenPrel.java:51)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPop(DefaultSqlHandler.java:392)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:167)
  ~[classes/:na]
 at 
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:178)
  ~[classes/:na]
 at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:903) 
 [classes/:na]
 at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:242) 
 [classes/:na]
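The shape of the bug is easy to reproduce outside Drill: code that pairs operands left-to-right fails as soon as a call has a single operand. This is a hypothetical sketch, not DrillOptiq's actual rewrite:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: rewriting concat() by pairing operands left-to-right
// would index past the end of a single-element operand list, so the
// single-operand case needs its own branch.
public class ConcatRewriteSketch {
    static String rewrite(List<String> operands) {
        if (operands.size() == 1) {
            return operands.get(0);              // guard: nothing to pair up
        }
        String expr = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            expr = "concat(" + expr + ", " + operands.get(i) + ")";
        }
        return expr;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(Arrays.asList("a", "b")));        // concat(a, b)
        System.out.println(rewrite(Collections.singletonList("a"))); // a
    }
}
```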





[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()

2015-07-07 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3464:
---
Attachment: DRILL-3464.patch

[~jni] could you please review.

 Index out of bounds exception while performing concat()
 ---

 Key: DRILL-3464
 URL: https://issues.apache.org/jira/browse/DRILL-3464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: DRILL-3464.patch







[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()

2015-07-07 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3464:
---
Attachment: DRILL-3464.patch

 Index out of bounds exception while performing concat()
 ---

 Key: DRILL-3464
 URL: https://issues.apache.org/jira/browse/DRILL-3464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Jinfeng Ni
 Fix For: 1.2.0

 Attachments: DRILL-3464.patch







[jira] [Updated] (DRILL-3464) Index out of bounds exception while performing concat()

2015-07-07 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3464:
---
Attachment: (was: DRILL-3464.patch)

 Index out of bounds exception while performing concat()
 ---

 Key: DRILL-3464
 URL: https://issues.apache.org/jira/browse/DRILL-3464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Jinfeng Ni
 Fix For: 1.2.0

 Attachments: DRILL-3464.patch







[jira] [Commented] (DRILL-3463) Unit test of project pushdown in TestUnionAll should put more precisely plan attribute in plan verification.

2015-07-07 Thread Mehant Baid (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617563#comment-14617563
 ] 

Mehant Baid commented on DRILL-3463:


Looks good. +1

 Unit test of project pushdown in TestUnionAll should put more precisely plan 
 attribute  in plan verification. 
 --

 Key: DRILL-3463
 URL: https://issues.apache.org/jira/browse/DRILL-3463
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Jinfeng Ni
Assignee: Mehant Baid
 Fix For: 1.2.0

 Attachments: 
 0001-DRILL-3463-Unit-test-of-project-pushdown-in-TestUnio.patch


 As part of fix for DRILL-2802, it was discovered that several unit test cases 
 for project pushdown in TestUnionAll did not put the desired plan attributes 
 in to the expected plan result.
 To verify that project pushdown is working properly, one simple way is to check 
 that the column list in the Scan operator contains the desired columns. 
 This should be part of the plan verification. However, the unit test cases in 
 TestUnionAll did not do that. Instead, they try to match a pattern of 
 Project -- Scan, which does not serve the intended purpose.
 For instance,
 {code}
 final String[] expectedPlan = {"UnionAll.*\n" +
     ".*Project.*\n" +
     ".*Scan.*\n"};
 {code}
 should be replaced by 
 {code}
 final String[] expectedPlan = {"UnionAll.*\n" +
     ".*Project.*\n" +
     ".*Scan.*columns=\\[`n_comment`, `n_nationkey`, `n_name`\\].*\n"};
 {code}
 if we want to verify that the columns 'n_comment', 'n_nationkey', 'n_name' are 
 pushed into the Scan operator.
 To fix this, modify the expected plan result so that it contains the plan 
 attributes needed to verify whether the project pushdown is working. 
 This will help catch project pushdown failures and avoid causing more false 
 alarms in plan verification.
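The two patterns from the description can be exercised directly with java.util.regex; the plan strings below are simplified stand-ins for real Drill plans:

```java
import java.util.regex.Pattern;

// The loose "Project then Scan" pattern accepts both plans, while matching the
// Scan's column list catches the plan where pushdown did not happen.
public class PlanPatternSketch {
    public static void main(String[] args) {
        String goodPlan = "UnionAll\n"
            + "  Project\n"
            + "    Scan ... columns=[`n_comment`, `n_nationkey`, `n_name`]\n";
        String badPlan = "UnionAll\n"
            + "  Project\n"
            + "    Scan ... columns=[`*`]\n";

        Pattern loose  = Pattern.compile("UnionAll.*\\n.*Project.*\\n.*Scan.*\\n");
        Pattern strict = Pattern.compile(
            "UnionAll.*\\n.*Project.*\\n.*Scan.*columns=\\[`n_comment`, `n_nationkey`, `n_name`\\].*\\n");

        System.out.println(loose.matcher(goodPlan).find());  // true
        System.out.println(loose.matcher(badPlan).find());   // true  (misses the regression)
        System.out.println(strict.matcher(goodPlan).find()); // true
        System.out.println(strict.matcher(badPlan).find());  // false (catches the regression)
    }
}
```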





[jira] [Resolved] (DRILL-3056) Numeric literal in an IN list is casted to decimal even when decimal type is disabled

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-3056.

Resolution: Fixed

Even though the record type indicates a Decimal type, when the IN list is 
converted we still use the double data type.
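A hypothetical sketch of the behavior described in the resolution (materialize is an illustrative name, not a Drill method): a literal with a decimal point becomes a double when the decimal type is disabled, and a DECIMAL only when it is enabled:

```java
import java.math.BigDecimal;

// Hypothetical sketch: when the decimal data type is disabled, a numeric
// literal with a decimal point is materialized as a double; only with
// decimals enabled is the exact decimal value kept.
public class InListLiteralSketch {
    static Object materialize(String literal, boolean decimalEnabled) {
        BigDecimal value = new BigDecimal(literal);
        if (value.scale() <= 0) {
            return value.longValueExact();          // integer literal
        }
        return decimalEnabled ? value : value.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(materialize("25.0", false)); // 25.0 (a Double)
        System.out.println(materialize("25.0", true));  // 25.0 (a BigDecimal)
        System.out.println(materialize("25", false));   // 25   (a Long)
    }
}
```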

 Numeric literal in an IN list is casted to decimal even when decimal type is 
 disabled
 -

 Key: DRILL-3056
 URL: https://issues.apache.org/jira/browse/DRILL-3056
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.2.0


 {code}
 0: jdbc:drill:schema=dfs> select * from sys.options where name like 
 '%decimal%';
 +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
 |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
 +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
 | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | DEFAULT  | null     | null        | false     | null       |
 +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
 1 row selected (0.212 seconds)
 {code}
 An IN list that contains more than 20 numeric literals.
 We are casting numbers with a decimal point to the decimal type even though 
 the decimal type is disabled:
 {code}
 0: jdbc:drill:schema=dfs> explain plan including all attributes for select * 
 from t1 where a1 in 
 (1,2,3,4,5,6,7,8,9,0,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25.0);
 +------+------+
 | text | json |
 +------+------+
 | 00-00Screen : rowType = RecordType(ANY *): rowcount = 10.0, cumulative 
 cost = {24.0 rows, 158.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, id = 4921
 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 10.0, 
 cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 memory}, 
 id = 4920
 00-02Project(T7¦¦*=[$0]) : rowType = RecordType(ANY T7¦¦*): rowcount 
 = 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 
 memory}, id = 4919
 00-03  HashJoin(condition=[=($2, $3)], joinType=[inner]) : rowType = 
 RecordType(ANY T7¦¦*, ANY a1, ANY a10, DECIMAL(11, 1) ROW_VALUE): rowcount = 
 10.0, cumulative cost = {23.0 rows, 157.0 cpu, 0.0 io, 0.0 network, 35.2 
 memory}, id = 4918
 00-05Project(T7¦¦*=[$0], a1=[$1], a10=[$1]) : rowType = 
 RecordType(ANY T7¦¦*, ANY a1, ANY a10): rowcount = 10.0, cumulative cost = 
 {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4915
 00-07  Project(T7¦¦*=[$0], a1=[$1]) : rowType = RecordType(ANY 
 T7¦¦*, ANY a1): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 
 io, 0.0 network, 0.0 memory}, id = 4914
 00-08Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/subqueries/t1]], 
 selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`*`]]]) : 
 rowType = (DrillRecordRow[*, a1]): rowcount = 10.0, cumulative cost = {10.0 
 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4913
 00-04HashAgg(group=[{0}]) : rowType = RecordType(DECIMAL(11, 1) 
 ROW_VALUE): rowcount = 1.0, cumulative cost = {2.0 rows, 9.0 cpu, 0.0 io, 0.0 
 network, 17.6 memory}, id = 4917
 00-06  Values : rowType = RecordType(DECIMAL(11, 1) ROW_VALUE): 
 rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 
 0.0 memory}, id = 4916
 {code}





[jira] [Updated] (DRILL-3128) LENGTH(..., CAST(... AS VARCHAR(0) ) ) yields ClassCastException

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3128:
---
Fix Version/s: (was: 1.2.0)
   1.4.0

 LENGTH(..., CAST(... AS VARCHAR(0) ) ) yields ClassCastException
 

 Key: DRILL-3128
 URL: https://issues.apache.org/jira/browse/DRILL-3128
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid
 Fix For: 1.4.0


 Trying to make a function call with a function name of {{LENGTH}}, with two 
 arguments, and with the second argument being a cast expression having a 
 target type of {{VARCHAR(0)}} yields a {{ClassCastException}} (at least for 
 several cases of source expression):
 {noformat}
 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST('x' AS VARCHAR(0) ) ) FROM 
 INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: java.lang.ClassCastException: 
 org.apache.drill.common.expression.CastExpression cannot be cast to 
 org.apache.drill.common.expression.ValueExpressions$QuotedString
 [Error Id: 1860730b-b69b-4400-bb2c-935a56aa456e on dev-linux2:31010] 
 (state=,code=0)
 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(1 AS VARCHAR(0) ) ) FROM 
 INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: java.lang.ClassCastException: 
 org.apache.drill.common.expression.CastExpression cannot be cast to 
 org.apache.drill.common.expression.ValueExpressions$QuotedString
 [Error Id: 476c4848-4b53-4c1e-9005-2bab3a2a91a4 on dev-linux2:31010] 
 (state=,code=0)
 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(NULL AS VARCHAR(0) ) ) FROM 
 INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: java.lang.ClassCastException: 
 org.apache.drill.common.expression.TypedNullConstant cannot be cast to 
 org.apache.drill.common.expression.ValueExpressions$QuotedString
 [Error Id: d888a336-2b18-45d9-a5e8-f4c2406a292e on dev-linux2:31010] 
 (state=,code=0)
 0: jdbc:drill:zk=local> 
 {noformat}
 This case (not with {{VARCHAR(0)}}) also yields a {{ClassCastException}}:
 {noformat}
 0: jdbc:drill:zk=local> SELECT LENGTH(1, CAST(1 AS VARCHAR(2) ) ) FROM 
 INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: java.lang.ClassCastException: 
 org.apache.drill.common.expression.CastExpression cannot be cast to 
 org.apache.drill.common.expression.ValueExpressions$QuotedString
 [Error Id: 04bd6cb1-2dd7-4938-ab9b-4d460aaaf05f on dev-linux2:31010] 
 (state=,code=0)
 0: jdbc:drill:zk=local> 
 {noformat}
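A minimal sketch of the failure shape, with stand-in expression classes (Expr, QuotedString, CastExpr are illustrative, not Drill's actual types): a blind downcast of the second argument throws ClassCastException for anything but a string literal, while an instanceof check turns it into a usable error:

```java
// Hypothetical sketch: the handler blindly downcasts the second argument to a
// string literal, so any other expression kind (a cast, a typed null) throws
// ClassCastException; an instanceof check reports a usable error instead.
public class LengthArgSketch {
    interface Expr {}
    static final class QuotedString implements Expr {
        final String value;
        QuotedString(String value) { this.value = value; }
    }
    static final class CastExpr implements Expr {}

    // Unsafe: mirrors the reported bug.
    static String literalValueUnsafe(Expr e) {
        return ((QuotedString) e).value;   // ClassCastException for CastExpr
    }

    // Safe: validate the expression kind first.
    static String literalValue(Expr e) {
        if (e instanceof QuotedString) {
            return ((QuotedString) e).value;
        }
        throw new IllegalArgumentException("second argument must be a string literal");
    }

    public static void main(String[] args) {
        System.out.println(literalValue(new QuotedString("x"))); // x
        try {
            literalValueUnsafe(new CastExpr());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }
    }
}
```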





[jira] [Updated] (DRILL-1951) Can't cast numeric value with decimal point read from CSV file into integer data type

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-1951:
---
Fix Version/s: (was: 1.2.0)
   1.4.0

 Can't cast numeric value with decimal point read from CSV file into integer 
 data type
 -

 Key: DRILL-1951
 URL: https://issues.apache.org/jira/browse/DRILL-1951
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.4.0


 sales.csv file:
 {code}
 997,Ford,ME350,3000.00, comment#1
 1999,Chevy,Venture,4900.00, comment#2
 1999,Chevy,Venture,5000.00, comment#3
 1996,Jeep,Cherokee,1.01, comment#4
 0: jdbc:drill:schema=dfs> select cast(columns[3] as decimal(18,2)) from 
 `sales.csv`;
 +------------+
 |   EXPR$0   |
 +------------+
 | 3000.00    |
 | 4900.00    |
 | 5000.00    |
 | 1.01       |
 +------------+
 4 rows selected (0.093 seconds)
 {code}
 -- Can cast to decimal
 {code}
 0: jdbc:drill:schema=dfs> select cast(columns[3] as decimal(18,2)) from 
 `sales.csv`;
 +------------+
 |   EXPR$0   |
 +------------+
 | 3000.00    |
 | 4900.00    |
 | 5000.00    |
 | 1.01       |
 +------------+
 4 rows selected (0.095 seconds)
 {code}
 -- Can cast to float
 {code}
 0: jdbc:drill:schema=dfs> select cast(columns[3] as float) from `sales.csv`;
 +------------+
 |   EXPR$0   |
 +------------+
 | 3000.0     |
 | 4900.0     |
 | 5000.0     |
 | 1.01       |
 +------------+
 4 rows selected (0.112 seconds)
 {code}
 -- Can't cast to INT/BIGINT
 {code}
 0: jdbc:drill:schema=dfs> select cast(columns[3] as bigint) from `sales.csv`;
 Query failed: Query failed: Failure while running fragment., 3000.00 [ 
 4818451a-c731-48a9-9992-1e81ab1d520d on atsqa4-134.qa.lab:31010 ]
 [ 4818451a-c731-48a9-9992-1e81ab1d520d on atsqa4-134.qa.lab:31010 ]
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 -- Same works with json/parquet files
 {code}
 0: jdbc:drill:schema=dfs> select a1 from `t1.json`;
 +------------+
 |     a1     |
 +------------+
 | 10.01      |
 +------------+
 1 row selected (0.077 seconds)
 0: jdbc:drill:schema=dfs> select cast(a1 as int) from `t1.json`;
 +------------+
 |   EXPR$0   |
 +------------+
 | 10         |
 +------------+
 0: jdbc:drill:schema=dfs> select * from test_cast;
 +------------+
 |     a1     |
 +------------+
 | 10.0100    |
 +------------+
 1 row selected (0.06 seconds)
 0: jdbc:drill:schema=dfs> select cast(a1 as int) from test_cast;
 +------------+
 |   EXPR$0   |
 +------------+
 | 10         |
 +------------+
 1 row selected (0.094 seconds)
 {code}
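The failure mode reproduces with plain Java parsing: "3000.00" is not a valid long literal, while routing through a decimal first succeeds -- matching the workaround of casting to DECIMAL (or FLOAT) before BIGINT:

```java
import java.math.BigDecimal;

// "3000.00" is not a valid long literal, so a direct text-to-bigint parse
// fails, while parsing as a decimal first and then narrowing succeeds.
public class CsvCastSketch {
    static long castToBigint(String field) {
        // Parse as a decimal first, then narrow to a whole number.
        return new BigDecimal(field).longValue();
    }

    public static void main(String[] args) {
        try {
            Long.parseLong("3000.00");               // direct parse: fails
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException");
        }
        System.out.println(castToBigint("3000.00")); // 3000
        System.out.println(castToBigint("1.01"));    // 1 (fraction truncated)
    }
}
```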





[jira] [Created] (DRILL-3460) Implement function validation in Drill

2015-07-06 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3460:
--

 Summary: Implement function validation in Drill
 Key: DRILL-3460
 URL: https://issues.apache.org/jira/browse/DRILL-3460
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Mehant Baid
Assignee: Mehant Baid
 Fix For: 1.3.0


Since the schema of the table is not known during the validation phase of 
Calcite, Drill ends up skipping most of the validation checks in Calcite. 

This causes certain problems at execution time, for example when function 
resolution or function execution fails due to incorrect types provided to the 
function. The worst manifestation of this problem is when Drill tries to apply 
implicit casting and produces incorrect results. There are cases where it is 
fine to apply the implicit cast in general, but it doesn't make sense for a 
particular function. 

This JIRA is aimed at providing a new approach to perform validation.
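A hypothetical sketch of the kind of check this JIRA proposes (SqlType and validate are illustrative names, not Drill APIs): compare argument types against the declared signature up front, instead of failing, or implicitly casting, at execution time:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: validate argument types against a function's declared
// signature during planning rather than discovering the mismatch at runtime.
public class FunctionValidationSketch {
    enum SqlType { INT, VARCHAR, DATE }

    static void validate(String name, List<SqlType> declared, List<SqlType> actual) {
        if (!declared.equals(actual)) {
            throw new IllegalArgumentException("Function " + name + " expects "
                + declared + " but was called with " + actual);
        }
    }

    public static void main(String[] args) {
        validate("substring", Arrays.asList(SqlType.VARCHAR, SqlType.INT),
                              Arrays.asList(SqlType.VARCHAR, SqlType.INT)); // ok
        try {
            validate("substring", Arrays.asList(SqlType.VARCHAR, SqlType.INT),
                                  Arrays.asList(SqlType.DATE, SqlType.INT));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at validation time");
        }
    }
}
```

A real implementation would also have to decide where implicit casts are acceptable; the point of the sketch is only that the check runs before execution.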





[jira] [Updated] (DRILL-2860) Unable to cast integer column from parquet file to interval day

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2860:
---
Fix Version/s: (was: 1.2.0)
   1.3.0

 Unable to cast integer column from parquet file to interval day
 ---

 Key: DRILL-2860
 URL: https://issues.apache.org/jira/browse/DRILL-2860
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Reporter: Victoria Markman
Assignee: Mehant Baid
 Fix For: 1.3.0

 Attachments: t1.parquet


 I can cast a numeric literal to interval day:
 {code}
 0: jdbc:drill:schema=dfs> select cast(1 as interval day) from t1;
 +------------+
 |   EXPR$0   |
 +------------+
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 | P1D        |
 +------------+
 10 rows selected (0.122 seconds)
 {code}
 I get an error when trying to do the same from a parquet file:
 {code}
 0: jdbc:drill:schema=dfs> select cast(a1 as interval day) from t1 where a1 = 
 1;
 Query failed: SYSTEM ERROR: Invalid format: 1
 Fragment 0:0
 [6a4adf04-f3db-4feb-8010-ebc3bfced1e3 on atsqa4-134.qa.lab:31010]
   (java.lang.IllegalArgumentException) Invalid format: 1
 org.joda.time.format.PeriodFormatter.parseMutablePeriod():326
 org.joda.time.format.PeriodFormatter.parsePeriod():304
 org.joda.time.Period.parse():92
 org.joda.time.Period.parse():81
 org.apache.drill.exec.test.generated.ProjectorGen180.doEval():77
 org.apache.drill.exec.test.generated.ProjectorGen180.projectRecords():62
 
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():170
 org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
 
 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():130
 org.apache.drill.exec.record.AbstractRecordBatch.next():144
 
 org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():118
 org.apache.drill.exec.physical.impl.BaseRootExec.next():74
 
 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
 org.apache.drill.exec.physical.impl.BaseRootExec.next():64
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():198
 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():192
 java.security.AccessController.doPrivileged():-2
 javax.security.auth.Subject.doAs():415
 org.apache.hadoop.security.UserGroupInformation.doAs():1469
 org.apache.drill.exec.work.fragment.FragmentExecutor.run():192
 org.apache.drill.common.SelfCleaningRunnable.run():38
 java.util.concurrent.ThreadPoolExecutor.runWorker():1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run():615
 java.lang.Thread.run():745
 Error: exception while executing query: Failure while executing query. 
 (state=,code=0)
 {code}
 If I try casting a1 to an integer I run into DRILL-2859.
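The stack trace shows the generated projector handing the raw value "1" to Joda-Time's period parser, which expects ISO-8601 text; java.time.Period (used here because it is in the JDK) rejects the same input the same way:

```java
import java.time.Period;
import java.time.format.DateTimeParseException;

// The period parser accepts ISO-8601 text like "P1D" (what the literal cast
// produces) but rejects a bare integer like "1" (what the column cast feeds
// it), matching the "Invalid format: 1" failure in the trace.
public class IntervalParseSketch {
    public static void main(String[] args) {
        System.out.println(Period.parse("P1D").getDays()); // 1
        try {
            Period.parse("1");                             // what the column cast attempts
        } catch (DateTimeParseException e) {
            System.out.println("rejected: not ISO-8601");
        }
    }
}
```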





[jira] [Updated] (DRILL-2456) regexp_replace using hex codes fails on larger JSON data sets

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2456:
---
Fix Version/s: (was: 1.2.0)
   1.3.0

 regexp_replace using hex codes fails on larger JSON data sets
 -

 Key: DRILL-2456
 URL: https://issues.apache.org/jira/browse/DRILL-2456
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 0.7.0
 Environment: Drill 0.7
 MapR 4.0.1
 CentOS
Reporter: Andries Engelbrecht
Assignee: Mehant Baid
 Fix For: 1.3.0

 Attachments: drillbit.log


 This query works with only 1 file:
 select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` order by count(id) desc limit 10;
 This one fails with multiple files:
 select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
 Query failed: Query failed: Failure while trying to start remote fragment, Encountered an illegal char on line 1, column 31: '' [ 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]
 Using literal text in regexp_replace does work for the same dataset.
 This query works fine on the full data set:
 select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
 Attached snippet of drillbit.log for the error.
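For reference, the hex escapes in the failing pattern are standard regex character-class syntax, so the pattern itself is well-formed. A minimal sketch in plain Java (outside Drill, with a made-up input string) showing the same class and replacement:

```java
import java.util.regex.Pattern;

public class HexClassDemo {
    public static void main(String[] args) {
        // Character class spanning 0x20 (space) through 0xAD via hex escapes,
        // negated: matches anything outside that byte range
        Pattern p = Pattern.compile("[^\\x20-\\xad]");

        // 'é' is U+00E9 and the snowman is U+2603 -- both outside the range
        String input = "caf\u00e9 \u2603";
        String cleaned = p.matcher(input).replaceAll("\u00b0"); // replace with °
        System.out.println(cleaned); // caf° °
    }
}
```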





[jira] [Updated] (DRILL-3430) CAST to interval type doesn't accept standard-format strings

2015-07-06 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3430:
---
Fix Version/s: (was: 1.2.0)
   1.3.0

 CAST to interval type doesn't accept standard-format strings
 

 Key: DRILL-3430
 URL: https://issues.apache.org/jira/browse/DRILL-3430
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Daniel Barclay (Drill)
Assignee: Mehant Baid
 Fix For: 1.3.0


 Cast specification evaluation is not compliant with the SQL standard. Mainly, it yields errors for standard-format strings that are specified to successfully yield interval values.
 In ISO/IEC 9075-2:2011(E) section 6.13 (cast specification), General Rule 19 case b says that, in a cast specification casting to an interval type, a character string value that is a valid interval literal (<interval literal>) or unquoted interval string yields an interval value.
 (<interval literal> is the INTERVAL '1-6' YEAR TO MONTH syntax; the unquoted interval string is the 1-6 syntax.)
 Drill currently rejects both of those syntaxes. Note the casts to type INTERVAL HOUR and the resulting error messages in the following:
 {noformat}
 0: jdbc:drill:zk=local> SELECT CAST( CAST( 'INTERVAL ''1'' HOUR' AS VARCHAR(100) ) AS INTERVAL HOUR) FROM INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: INTERVAL '1' HOUR
 Fragment 0:0
 [Error Id: b4bed61a-1efe-4e06-86d4-fff8f9829d50 on dev-linux2:31010] (state=,code=0)
 0: jdbc:drill:zk=local> SELECT CAST( CAST( '1' AS VARCHAR(100) ) AS INTERVAL HOUR) FROM INFORMATION_SCHEMA.CATALOGS;
 Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: 1
 Fragment 0:0
 [Error Id: 91dec1ed-5cac-4235-93d7-49a2a0f03a1a on dev-linux2:31010] (state=,code=0)
 0: jdbc:drill:zk=local>
 {noformat}
 (The extra cast to VARCHAR is a workaround for a CHAR-vs.-VARCHAR bug.)
 Drill should accept the standard formats or at least document the non-compliance for users.




