[jira] [Updated] (DRILL-3366) Short circuit of OR expression causes incorrect partitioning

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3366:
---
Assignee: Jinfeng Ni  (was: Steven Phillips)

> Short circuit of OR expression causes incorrect partitioning
> 
>
> Key: DRILL-3366
> URL: https://issues.apache.org/jira/browse/DRILL-3366
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Steven Phillips
>Assignee: Jinfeng Ni
> Attachments: DRILL-3366.patch
>
>
> CTAS partitioning relies on evaluating the expression 
> newPartitionValue(column A) || newPartitionValue(column B) || ..
> to determine whether a new partition should start. The 
> "newPartitionValue" function returns true if the current value of the 
> expression is different from the previous value. The function holds some 
> state in the workspace (the previous value), and thus needs to be evaluated 
> every time. Short-circuit expression evaluation causes this to not be the 
> case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3366) Short circuit of OR expression causes incorrect partitioning

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3366:
---
Attachment: DRILL-3366.patch

By using a function name different from "OR", even though it maps to the exact 
same function, short-circuit evaluation is not applied, so this problem is avoided.

> Short circuit of OR expression causes incorrect partitioning
> 
>
> Key: DRILL-3366
> URL: https://issues.apache.org/jira/browse/DRILL-3366
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Attachments: DRILL-3366.patch
>
>
> CTAS partitioning relies on evaluating the expression 
> newPartitionValue(column A) || newPartitionValue(column B) || ..
> to determine whether a new partition should start. The 
> "newPartitionValue" function returns true if the current value of the 
> expression is different from the previous value. The function holds some 
> state in the workspace (the previous value), and thus needs to be evaluated 
> every time. Short-circuit expression evaluation causes this to not be the 
> case.





[jira] [Created] (DRILL-3366) Short circuit of OR expression causes incorrect partitioning

2015-06-24 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3366:
--

 Summary: Short circuit of OR expression causes incorrect 
partitioning
 Key: DRILL-3366
 URL: https://issues.apache.org/jira/browse/DRILL-3366
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Codegen
Reporter: Steven Phillips
Assignee: Steven Phillips


CTAS partitioning relies on evaluating the expression newPartitionValue(column 
A) || newPartitionValue(column B) || ..

to determine whether a new partition should start. The "newPartitionValue" 
function returns true if the current value of the expression is different from 
the previous value. The function holds some state in the workspace (the 
previous value), and thus needs to be evaluated every time. Short-circuit 
expression evaluation causes this to not be the case.
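The failure mode described above can be sketched outside of Drill's codegen. The class and values below are purely illustrative (not Drill's actual generated code): each column keeps a stateful change-detector, and the short-circuiting `||` operator skips the right-hand call whenever the left side already reports a change, leaving that column's saved state stale. The non-short-circuiting `|` operator evaluates both sides on every row.

```java
public class PartitionCheck {
    static final class ColumnState {
        Object previous;

        // Returns true if the value differs from the previously seen one,
        // and records the current value as the new "previous".
        boolean newPartitionValue(Object current) {
            boolean changed = !current.equals(previous);
            previous = current;
            return changed;
        }
    }

    // Feeds two identical rows through a two-column check and reports
    // whether row 2 is flagged as starting a new partition (it should not be).
    static boolean rowTwoStartsPartition(boolean useShortCircuit) {
        ColumnState a = new ColumnState();
        ColumnState b = new ColumnState();
        a.previous = "x";
        b.previous = "1";

        // Row 1: both columns change value, so a new partition starts here.
        // With ||, b.newPartitionValue("2") is skipped and b's state goes stale.
        boolean row1 = useShortCircuit
                ? a.newPartitionValue("y") || b.newPartitionValue("2")
                : a.newPartitionValue("y") | b.newPartitionValue("2");

        // Row 2: same values as row 1, so no new partition should start.
        // With stale state in b, the || variant wrongly reports a change.
        return useShortCircuit
                ? a.newPartitionValue("y") || b.newPartitionValue("2")
                : a.newPartitionValue("y") | b.newPartitionValue("2");
    }

    public static void main(String[] args) {
        System.out.println("|| flags row 2: " + rowTwoStartsPartition(true));  // true (incorrect)
        System.out.println("|  flags row 2: " + rowTwoStartsPartition(false)); // false (correct)
    }
}
```

This also illustrates why the attached workaround helps: routing the expression through a function name the code generator does not short-circuit is behaviorally equivalent to using `|` here.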





[jira] [Updated] (DRILL-3059) Random : Error in parquet record reader

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3059:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> Random : Error in parquet record reader
> ---
>
> Key: DRILL-3059
> URL: https://issues.apache.org/jira/browse/DRILL-3059
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
> Attachments: j1.tar.gz, j6.tar.gz
>
>
> Commit # ffbb9c7adc6360744bee186e1f69d47dc743f73e
> Query :
> {code}
> select count(*) from j1 where c_time not in ( select c_time from j6)
> {code}
> Error from the logs
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: 
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet 
> record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional int32 c_date (DATE);
>   optional int32 c_time (TIME_MILLIS);
>   optional int64 c_timestamp (TIMESTAMP_MILLIS);
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{100, 10252 [ColumnMetaData{SNAPPY 
> [c_varchar] BINARY  [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{SNAPPY 
> [c_integer] INT32  [BIT_PACKED, RLE, PLAIN], 446}, ColumnMetaData{SNAPPY 
> [c_bigint] INT64  [BIT_PACKED, RLE, PLAIN], 598}, ColumnMetaData{SNAPPY 
> [c_float] FLOAT  [BIT_PACKED, RLE, PLAIN], 811}, ColumnMetaData{SNAPPY 
> [c_double] DOUBLE  [BIT_PACKED, RLE, PLAIN], 962}, ColumnMetaData{SNAPPY 
> [c_date] INT32  [BIT_PACKED, RLE, PLAIN], 1203}, ColumnMetaData{SNAPPY 
> [c_time] INT32  [BIT_PACKED, RLE, PLAIN], 1344}, ColumnMetaData{SNAPPY 
> [c_timestamp] INT64  [BIT_PACKED, RLE, PLAIN], 1495}, ColumnMetaData{SNAPPY 
> [c_boolean] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 1710}, ColumnMetaData{SNAPPY 
> [d9] DOUBLE  [BIT_PACKED, RLE, PLAIN], 1760}, ColumnMetaData{SNAPPY [d18] 
> DOUBLE  [BIT_PACKED, RLE, PLAIN], 1997}, ColumnMetaData{SNAPPY [d28] DOUBLE  
> [BIT_PACKED, RLE, PLAIN], 2240}, ColumnMetaData{SNAPPY [d38] DOUBLE  
> [BIT_PACKED, RLE, PLAIN], 2482}]}]}
> Fragment 0:0
> [Error Id: 67261cd3-edac-4158-b331-fd37b7f40223 on atsqa6c83.qa.lab:31010]
>   at org.apache.drill.jdbc.DrillCursor.next(DrillCursor.java:161)
>   at 
> net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:137)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:154)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:80)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: org.apache.drill.common.exceptions.DrillRuntimeException: Error in 
> parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional int32 c_date (DATE);
>   optional int32 c_time (TIME_MILLIS);
>   optional int64 c_timestamp (TIMESTAMP_MILLIS);
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{100, 10252 [ColumnMetaData{SNAPPY 
> [c_varchar] BINARY  [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{SNAPPY 
> [c_integer] INT32  [BIT_PACKED, RLE, PLAIN], 446}, ColumnMetaData{SNAPPY 
> [c_bigint] INT64  [BIT_PACKED, RLE, PLAIN], 598}, ColumnMetaData{SNAPPY 
> [c_float] FLOAT  [BIT_PACKED, RLE, PLAIN], 811}, ColumnMetaData{SNAPPY 
> [c_double] DOUBLE  [BIT_PACKED, RLE, PLAIN], 962}, ColumnMetaData{SNAPPY 
> [c_date] INT32  [BIT_PACKED, RLE, PLAIN], 1203}, ColumnMetaData{SNAPPY 
> [c_time] INT32  [BIT_PACKED, RLE, PLAIN], 1344}, ColumnMetaData{SNAPPY 
> [c_timestamp] INT64  [BIT_PACKED, RLE, PLAIN], 1495}, ColumnMetaData{SNAPPY 
> [c_boolean] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 1710}, ColumnMetaData{SNAPPY 
> 

[jira] [Updated] (DRILL-2293) CTAS does not clean up when it fails

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2293:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> CTAS does not clean up when it fails
> 
>
> Key: DRILL-2293
> URL: https://issues.apache.org/jira/browse/DRILL-2293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> git.commit.id.abbrev=6676f2d
> Data Set :
> {code}
> {
>   "id" : 1,
>   "map":{"rm": [
> {"mapid":"m1","mapvalue":{"col1":1,"col2":[0,1,2,3,4,5]},"rptd": [{ "a": 
> "foo"},{"b":"boo"}]},
> {"mapid":"m2","mapvalue":{"col1":0,"col2":[]},"rptd": [{ "a": 
> "bar"},{"c":1},{"d":4.5}]}
>   ]}
> }
> {code}
> The below query fails :
> {code}
> create table rep_map as select d.map from `temp.json` d;
> Query failed: Query stopped., index: -4, length: 4 (expected: range(0, 
> 16384)) [ d76e3f74-7e2c-406f-a7fd-5efc68227e75 on qa-node190.qa.lab:31010 ]
> {code}
> However, Drill created a folder 'rep_map' that contained a broken 
> parquet file. 
> {code}
> create table rep_map as select d.map from `temp.json` d;
> +++
> | ok |  summary   |
> +++
> | false  | Table 'rep_map' already exists. |
> {code}
> Drill should clean up properly in case of a failure.
> I raised a different issue for the actual failure.





[jira] [Updated] (DRILL-3361) CTAS Auto Partitionning : Fails when we use boolean as the partition type

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3361:
---
Component/s: (was: Storage - Parquet)
 (was: Query Planning & Optimization)
 Execution - Data Types

> CTAS Auto Partitionning : Fails when we use boolean as the partition type
> -
>
> Key: DRILL-3361
> URL: https://issues.apache.org/jira/browse/DRILL-3361
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Rahul Challapalli
>Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
> Attachments: DRILL-3361.patch, error.log
>
>
> git.commit.id.abbrev=5a34d81
> The below query fails :
> {code}
> create table region partition by (r_bool) as select r.*, true r_bool from 
> cp.`tpch/region.parquet` r;
> Error: SYSTEM ERROR: 
> Fragment 0:0
> [Error Id: 0b6baadc-034b-47d2-9edd-f2c8752de571 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the log file





[jira] [Updated] (DRILL-3361) CTAS Auto Partitionning : Fails when we use boolean as the partition type

2015-06-24 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3361:
---
Attachment: DRILL-3361.patch

> CTAS Auto Partitionning : Fails when we use boolean as the partition type
> -
>
> Key: DRILL-3361
> URL: https://issues.apache.org/jira/browse/DRILL-3361
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
> Attachments: DRILL-3361.patch, error.log
>
>
> git.commit.id.abbrev=5a34d81
> The below query fails :
> {code}
> create table region partition by (r_bool) as select r.*, true r_bool from 
> cp.`tpch/region.parquet` r;
> Error: SYSTEM ERROR: 
> Fragment 0:0
> [Error Id: 0b6baadc-034b-47d2-9edd-f2c8752de571 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the log file





[jira] [Commented] (DRILL-3361) CTAS Auto Partitionning : Fails when we use boolean as the partition type

2015-06-24 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600255#comment-14600255
 ] 

Steven Phillips commented on DRILL-3361:


There are two problems:

1. There is no implementation of the "newPartitionValue" function for the Bit 
type. This actually shouldn't be a problem, because we can use an implicit cast 
to a type for which there is an implementation. However,

2. The workspace variables don't get properly initialized when an implicit cast 
is used, which results in the NPE. This appears to be a more general problem.

I have filed DRILL-3362 for #2. Solving #1 will fix this issue as it pertains 
to CTAS.

> CTAS Auto Partitionning : Fails when we use boolean as the partition type
> -
>
> Key: DRILL-3361
> URL: https://issues.apache.org/jira/browse/DRILL-3361
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=5a34d81
> The below query fails :
> {code}
> create table region partition by (r_bool) as select r.*, true r_bool from 
> cp.`tpch/region.parquet` r;
> Error: SYSTEM ERROR: 
> Fragment 0:0
> [Error Id: 0b6baadc-034b-47d2-9edd-f2c8752de571 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> I attached the log file





[jira] [Created] (DRILL-3362) Implicit cast causes workspace variable to not be initialized

2015-06-24 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3362:
--

 Summary: Implicit cast causes workspace variable to not be 
initialized
 Key: DRILL-3362
 URL: https://issues.apache.org/jira/browse/DRILL-3362
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Codegen
Reporter: Steven Phillips
Assignee: Chris Westin


In DRILL-3361, the "newPartitionValue" function is missing the implementation 
for the Bit type. 

When an implicit cast is not needed, the workspace variable is initialized in 
the __DRILL_INIT__() function of the generated class. But when an implicit cast 
is added, this initialization does not happen, which results in an NPE.
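The initialization gap can be illustrated with a hand-written stand-in for the generated class. The `__DRILL_INIT__`/`doEval` names follow the convention mentioned above, but the bodies are illustrative, not Drill's real generated code: if the init method is never invoked, the workspace holder stays null and the first evaluation throws an NPE.

```java
public class GeneratedEvalSketch {
    // Workspace slot carrying state between evaluations.
    Object[] previousHolder;

    // In the generated code this is meant to run once, before any evaluation.
    void __DRILL_INIT__() {
        previousHolder = new Object[1];
    }

    // Evaluates one value; dereferences the workspace unconditionally,
    // so it NPEs if __DRILL_INIT__() was never called.
    boolean doEval(Object current) {
        boolean changed = !current.equals(previousHolder[0]);
        previousHolder[0] = current;
        return changed;
    }

    public static void main(String[] args) {
        GeneratedEvalSketch initialized = new GeneratedEvalSketch();
        initialized.__DRILL_INIT__();
        System.out.println("initialized: " + initialized.doEval("a"));

        GeneratedEvalSketch skippedInit = new GeneratedEvalSketch();
        try {
            skippedInit.doEval("a"); // previousHolder is still null
        } catch (NullPointerException e) {
            System.out.println("skipped init: NPE");
        }
    }
}
```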







[jira] [Commented] (DRILL-3361) CTAS Auto Partitionning : Fails when we use boolean as the partition type

2015-06-24 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600170#comment-14600170
 ] 

Steven Phillips commented on DRILL-3361:


Here is the actual stack trace:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 

Fragment 0:0

[Error Id: a7205659-1b40-4ad1-95cf-3ce1b9e42234 on localhost:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523)
 ~[drill-common-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:325)
 [drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:181)
 [drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:294)
 [drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_21]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_21]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_21]
Caused by: java.lang.NullPointerException
at 
org.apache.drill.exec.test.generated.ProjectorGen81.doEval(ProjectorTemplate.java:67)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.ProjectorGen81.projectRecords(ProjectorTemplate.java:62)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:172)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:92)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) 
~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:79)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) 
~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:260)
 ~[drill-java-exec-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1

[jira] [Commented] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-23 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598649#comment-14598649
 ] 

Steven Phillips commented on DRILL-:


Updated reviewboard https://reviews.apache.org/r/35739/


> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-
> URL: https://issues.apache.org/jira/browse/DRILL-
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Attachments: DRILL-.patch, DRILL-.patch, 
> DRILL-_2015-06-22_15:22:11.patch, DRILL-_2015-06-23_17:38:32.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file when it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if 
> the filter contains a partition column. And unlike directory-based 
> partitioning, no view is required, nor is it necessary to reference the dir* 
> column names.





[jira] [Updated] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-23 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-:
---
Attachment: DRILL-_2015-06-23_17:38:32.patch

> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-
> URL: https://issues.apache.org/jira/browse/DRILL-
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Attachments: DRILL-.patch, DRILL-.patch, 
> DRILL-_2015-06-22_15:22:11.patch, DRILL-_2015-06-23_17:38:32.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file when it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if 
> the filter contains a partition column. And unlike directory-based 
> partitioning, no view is required, nor is it necessary to reference the dir* 
> column names.





[jira] [Updated] (DRILL-2906) Json reader with extended json adds extra column

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2906:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> Json reader with extended json adds extra column
> 
>
> Key: DRILL-2906
> URL: https://issues.apache.org/jira/browse/DRILL-2906
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON, Storage - Writer
>Reporter: Mehant Baid
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> Performing a CTAS with 'store.format' = 'json' and querying the table results 
> in projecting an additional field '*' with null values. Below is a simple repro:
> 0: jdbc:drill:zk=local> create table t as select timestamp '1980-10-01 
> 00:00:00' from cp.`employee.json` limit 1;
> ++---+
> |  Fragment  | Number of records written |
> ++---+
> | 0_0| 1 |
> ++---+
> 1 row selected (0.314 seconds)
> 0: jdbc:drill:zk=local> select * from t;
> +++
> |   EXPR$0   | *  |
> +++
> | 1980-10-01 00:00:00.0 | null   |
> +++
> Notice in the above result set we get an extra column '*' with null value.





[jira] [Updated] (DRILL-3214) Config option to cast empty string to null does not cast empty string to null

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3214:
---
Assignee: Sean Hsuan-Yi Chu  (was: Steven Phillips)

> Config option to cast empty string to null does not cast empty string to null
> -
>
> Key: DRILL-3214
> URL: https://issues.apache.org/jira/browse/DRILL-3214
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.0.0
> Environment: faec150598840c40827e6493992d81209aa936da
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.1.0
>
>
> Config option drill.exec.functions.cast_empty_string_to_null does not seem to 
> be working as designed.
> Disable casting of empty strings to null. 
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `drill.exec.functions.cast_empty_string_to_null` = false;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +---+--+
> 1 row selected (0.078 seconds)
> {code}
> In this query we see empty strings are retained in query output in columns[1].
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from 
> `threeColsDouble.csv`;
> +--+-+-+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +--+-+-+
> | 156  | 234 | 1   |
> | 2653543  | 434 | 0   |
> | 367345   | 567567  | 23  |
> | 34554| 1234| 45  |
> | 4345 | 567678  | 19876   |
> | 34556| 0   | 1109|
> | 5456 | -1  | 1098|
> | 6567 | | 34534   |
> | 7678 | 1   | 6   |
> | 8798 | 456 | 243 |
> | 265354   | 234 | 123 |
> | 367345   | | 234 |
> | 34554| 1   | 2   |
> | 4345 | 0   | 10  |
> | 34556| -1  | 19  |
> | 5456 | 23423   | 345 |
> | 6567 | 0   | 2348|
> | 7678 | 1   | 2   |
> | 8798 | | 45  |
> | 099  | 19  | 17  |
> +--+-+-+
> 20 rows selected (0.13 seconds)
> {code}
> Casting empty strings to integer leads to NumberFormatException
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), 
> columns[2] from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: b08f4247-263a-460d-b37b-91a70375f7ba on centos-03.qa.lab:31010] 
> (state=,code=0)
> {code}
> Enable casting empty string to null.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `drill.exec.functions.cast_empty_string_to_null` = true;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +---+--+
> 1 row selected (0.077 seconds)
> {code}
> Run query
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), 
> columns[2] from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: de633399-15f9-4a79-a21f-262bd5551207 on centos-03.qa.lab:31010] 
> (state=,code=0)
> {code}
> Note from the output of the below query that the empty strings are not cast to 
> null, although drill.exec.functions.cast_empty_string_to_null was set to true.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from 
> `threeColsDouble.csv`;
> +--+-+-+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +--+-+-+
> | 156  | 234 | 1   |
> | 2653543  | 434 | 0   |
> | 367345   | 567567  | 23  |
> | 34554| 1234| 45  |
> | 4345 | 567678  | 19876   |
> | 34556| 0   | 1109|
> | 5456 | -1  | 1098|
> | 6567 | | 34534   |
> | 7678 | 1   | 6   |
> | 8798 | 456 | 243 |
> | 265354   | 234 | 123 |
> | 367345   | | 234 |
> | 34554| 1   | 2   |
> | 4345 | 0   | 10  |
> | 34556| -1  | 19  |
> | 5456 | 23423   | 345 |
> | 6567 | 0   | 2348|
> | 7678 | 1   | 2   |
> | 8798 | | 45  |
> | 099  | 19  | 17  |
> +--+-+-+
> 20 rows selected (0.125 seconds)
> {code}




[jira] [Updated] (DRILL-3214) Config option to cast empty string to null does not cast empty string to null

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3214:
---
Component/s: (was: Storage - Text & CSV)
 Functions - Drill

> Config option to cast empty string to null does not cast empty string to null
> -
>
> Key: DRILL-3214
> URL: https://issues.apache.org/jira/browse/DRILL-3214
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.0.0
> Environment: faec150598840c40827e6493992d81209aa936da
>Reporter: Khurram Faraaz
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
>
> Config option drill.exec.functions.cast_empty_string_to_null does not seem to 
> be working as designed.
> Disable casting of empty strings to null. 
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `drill.exec.functions.cast_empty_string_to_null` = false;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +---+--+
> 1 row selected (0.078 seconds)
> {code}
> In this query we see empty strings are retained in query output in columns[1].
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from 
> `threeColsDouble.csv`;
> +--+-+-+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +--+-+-+
> | 156  | 234 | 1   |
> | 2653543  | 434 | 0   |
> | 367345   | 567567  | 23  |
> | 34554| 1234| 45  |
> | 4345 | 567678  | 19876   |
> | 34556| 0   | 1109|
> | 5456 | -1  | 1098|
> | 6567 | | 34534   |
> | 7678 | 1   | 6   |
> | 8798 | 456 | 243 |
> | 265354   | 234 | 123 |
> | 367345   | | 234 |
> | 34554| 1   | 2   |
> | 4345 | 0   | 10  |
> | 34556| -1  | 19  |
> | 5456 | 23423   | 345 |
> | 6567 | 0   | 2348|
> | 7678 | 1   | 2   |
> | 8798 | | 45  |
> | 099  | 19  | 17  |
> +--+-+-+
> 20 rows selected (0.13 seconds)
> {code}
> Casting empty strings to integer leads to NumberFormatException
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), 
> columns[2] from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: b08f4247-263a-460d-b37b-91a70375f7ba on centos-03.qa.lab:31010] 
> (state=,code=0)
> {code}
> Enable casting empty string to null.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `drill.exec.functions.cast_empty_string_to_null` = true;
> +---+--+
> |  ok   | summary  |
> +---+--+
> | true  | drill.exec.functions.cast_empty_string_to_null updated.  |
> +---+--+
> 1 row selected (0.077 seconds)
> {code}
> Run query
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], cast(columns[1] as int), 
> columns[2] from `threeColsDouble.csv`;
> Error: SYSTEM ERROR: java.lang.NumberFormatException: 
> Fragment 0:0
> [Error Id: de633399-15f9-4a79-a21f-262bd5551207 on centos-03.qa.lab:31010] 
> (state=,code=0)
> {code}
> Note from the output of the below query that the empty strings are not cast to 
> null, although drill.exec.functions.cast_empty_string_to_null was set to true.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[0], columns[1], columns[2] from 
> `threeColsDouble.csv`;
> +--+-+-+
> |  EXPR$0  | EXPR$1  | EXPR$2  |
> +--+-+-+
> | 156  | 234 | 1   |
> | 2653543  | 434 | 0   |
> | 367345   | 567567  | 23  |
> | 34554| 1234| 45  |
> | 4345 | 567678  | 19876   |
> | 34556| 0   | 1109|
> | 5456 | -1  | 1098|
> | 6567 | | 34534   |
> | 7678 | 1   | 6   |
> | 8798 | 456 | 243 |
> | 265354   | 234 | 123 |
> | 367345   | | 234 |
> | 34554| 1   | 2   |
> | 4345 | 0   | 10  |
> | 34556| -1  | 19  |
> | 5456 | 23423   | 345 |
> | 6567 | 0   | 2348|
> | 7678 | 1   | 2   |
> | 8798 | | 45  |
> | 099  | 19  | 17  |
> +--+-+-+
> 20 rows selected (0.125 seconds)
>

[jira] [Updated] (DRILL-2873) CTAS reports error when timestamp values in CSV file are quoted

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2873:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> CTAS reports error when timestamp values in CSV file are quoted
> ---
>
> Key: DRILL-2873
> URL: https://issues.apache.org/jira/browse/DRILL-2873
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 0.9.0
> Environment: 64e3ec52b93e9331aa5179e040eca19afece8317 | DRILL-2611: 
> value vectors should report valid value count | 16.04.2015 @ 13:53:34 EDT
>Reporter: Khurram Faraaz
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> When timestamp values are enclosed in double quotes (") in a CSV data file, a 
> CTAS statement reports an error.
> Failing CTAS
> {code}
> 0: jdbc:drill:> create table prqFrmCSV02 as select cast(columns[0] as int) 
> col_int, cast(columns[1] as bigint) col_bgint, cast(columns[2] as char(10)) 
> col_char, cast(columns[3] as varchar(18)) col_vchar, cast(columns[4] as 
> timestamp) col_tmstmp, cast(columns[5] as date) col_date, cast(columns[6] as 
> boolean) col_boln, cast(columns[7] as double) col_dbl from `csvToPrq.csv`;
> Query failed: SYSTEM ERROR: Invalid format: ""2015-04-23 23:47:00.124""
> [a601a66a-b305-4a92-9836-f39edcdc8fe8 on centos-02.qa.lab:31010]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> Stack trace from drillbit.log
> {code}
> 2015-04-24 18:41:09,721 [2ac571ba-778f-f3d5-c60f-af2e536905a3:frag:0:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure -- 
> Fragment: 0:0
> org.apache.drill.common.exceptions.DrillUserException: SYSTEM ERROR: Invalid 
> format: ""2015-04-23 23:47:00.124""
> [a601a66a-b305-4a92-9836-f39edcdc8fe8 on centos-02.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.DrillUserException$Builder.build(DrillUserException.java:115)
>  ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.common.exceptions.ErrorHelper.wrap(ErrorHelper.java:39) 
> ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:151) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:131)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:74) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:76)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:64) 
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:164)
>  ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPool

[jira] [Updated] (DRILL-2743) Parquet file metadata caching

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2743:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> Parquet file metadata caching
> -
>
> Key: DRILL-2743
> URL: https://issues.apache.org/jira/browse/DRILL-2743
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
> Attachments: DRILL-2743.patch, drill.parquet_metadata
>
>
> To run a query against parquet files, we have to first recursively search the 
> directory tree for all of the files, get the block locations for each file, 
> and read the footer from each file, all during the planning phase. When there 
> are many files, this can result in a very large delay in running the query, 
> and it does not scale.
> However, there isn't really any need to read the footers during planning. If 
> we instead treat each parquet file as a single work unit, all we need to know 
> are the block locations for the file, the number of rows, and the columns. We 
> should store only the information we need for planning in a file located in 
> the top directory for a given parquet table; reading of the footers can then 
> be delayed until execution time, where it can be done in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2686) Move writeJson() methods from PhysicalPlanReader to corresponding classes

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2686:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> Move writeJson() methods from PhysicalPlanReader to corresponding classes
> -
>
> Key: DRILL-2686
> URL: https://issues.apache.org/jira/browse/DRILL-2686
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0
>Reporter: Sudheesh Katkam
>Assignee: Steven Phillips
> Fix For: 1.2.0
>
>
> From Chris's comment https://reviews.apache.org/r/32795/
> It would have been better to have a writeJson(ObjectMapper) method added to 
> each of OptionList, PhysicalOperator, -and ExecutionControls-, and for 
> PhysicalPlanReader just to have a getMapper() that is used to get the 
> argument needed for those. In that form, we don't have to add a new method to 
> PhysicalPlanReader for each thing that we want to add to it. We just get its 
> mapper and write whatever it is to it.
> We'd have
> {code}
> final ObjectMapper mapper = reader.getMapper();
> options.writeJson(mapper);
> executionControls.writeJson(mapper);
> {code}
> So as we add more things to the plan, we don't have to add more methods to 
> it. Each object knows how to write itself, given the mapper. And if we ever 
> need to add them to anything else, that object just needs to expose its 
> mapper in a similar way, rather than having a method per item.
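The pattern proposed in the comment above can be sketched as follows. To stay self-contained, a plain StringBuilder stands in for Jackson's ObjectMapper, and all class names are illustrative rather than Drill's real ones: each plan object knows how to serialize itself given the shared mapper, so the reader only needs a getMapper() accessor.

```java
// Sketch of the writeJson()/getMapper() pattern: each object writes itself,
// and the reader no longer needs one writeX() method per plan element.
// A StringBuilder stands in for Jackson's ObjectMapper; names are illustrative.
public class WriteJsonSketch {

    interface JsonWritable {
        void writeJson(StringBuilder mapper);
    }

    static final class OptionList implements JsonWritable {
        public void writeJson(StringBuilder mapper) {
            mapper.append("{\"options\":[]}");
        }
    }

    static final class ExecutionControls implements JsonWritable {
        public void writeJson(StringBuilder mapper) {
            mapper.append("{\"controls\":[]}");
        }
    }

    // The reader just hands out its mapper instead of growing a new
    // write method for every thing added to the plan.
    static final class PhysicalPlanReader {
        private final StringBuilder mapper = new StringBuilder();
        StringBuilder getMapper() { return mapper; }
    }

    public static void main(String[] args) {
        PhysicalPlanReader reader = new PhysicalPlanReader();
        StringBuilder mapper = reader.getMapper();
        new OptionList().writeJson(mapper);
        new ExecutionControls().writeJson(mapper);
        System.out.println(mapper); // prints {"options":[]}{"controls":[]}
    }
}
```

Adding a new serializable plan element then means implementing the interface, with no change to the reader.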





[jira] [Updated] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3333:
---
Attachment: DRILL-3333_2015-06-22_15:22:11.patch

> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Aman Sinha
> Attachments: DRILL-3333.patch, DRILL-3333.patch, 
> DRILL-3333_2015-06-22_15:22:11.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.
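The writer behavior described above, starting a new file whenever the (pre-sorted) partition key changes, can be sketched as follows. This is a simplified illustration, not Drill's actual writer API: a list of lists stands in for parquet files, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of auto-partitioning: input rows arrive sorted on the partition
// key, and a new output "file" starts whenever the key's value changes.
// Names are illustrative, not Drill's real writer classes.
public class PartitionWriterSketch {

    static List<List<String>> partitionSortedRows(List<String[]> sortedRows, int keyIndex) {
        List<List<String>> files = new ArrayList<>();
        String previousKey = null;
        List<String> current = null;
        for (String[] row : sortedRows) {
            String key = row[keyIndex];
            if (!key.equals(previousKey)) {  // new partition value -> new file
                current = new ArrayList<>();
                files.add(current);
                previousKey = key;
            }
            current.add(String.join(",", row));
        }
        return files;
    }

    public static void main(String[] args) {
        // Two distinct key values -> two files.
        List<String[]> rows = List.of(
            new String[]{"ASIA", "r1"},
            new String[]{"ASIA", "r2"},
            new String[]{"EUROPE", "r3"});
        System.out.println(partitionSortedRows(rows, 0).size()); // prints 2
    }
}
```

Because the rows are sorted first, each partition value lands in a contiguous run, so one pass over the data is enough.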





[jira] [Commented] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596757#comment-14596757
 ] 

Steven Phillips commented on DRILL-3333:


Updated reviewboard https://reviews.apache.org/r/35739/


> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Aman Sinha
> Attachments: DRILL-3333.patch, DRILL-3333.patch, 
> DRILL-3333_2015-06-22_15:22:11.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.





[jira] [Commented] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596464#comment-14596464
 ] 

Steven Phillips commented on DRILL-3333:


Created reviewboard https://reviews.apache.org/r/35739/


> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Aman Sinha
> Attachments: DRILL-3333.patch, DRILL-3333.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.





[jira] [Updated] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3333:
---
Attachment: DRILL-3333.patch

> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Aman Sinha
> Attachments: DRILL-3333.patch, DRILL-3333.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.





[jira] [Updated] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3333:
---
Attachment: DRILL-3333.patch

> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
> Attachments: DRILL-3333.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.





[jira] [Updated] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3333:
---
Assignee: Aman Sinha

> Add support for auto-partitioning in parquet writer
> ---
>
> Key: DRILL-3333
> URL: https://issues.apache.org/jira/browse/DRILL-3333
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Aman Sinha
> Attachments: DRILL-3333.patch
>
>
> When a table is created with a partition by clause, the parquet writer will 
> create separate files for the different partition values. The data will first 
> be sorted by the partition keys, and the parquet writer will create a new 
> file whenever it encounters a new value for the partition columns.
> When the data created this way is queried, partition pruning will work if the 
> filter contains a partition column. And unlike directory-based partitioning, 
> no view is required, nor is it necessary to reference the dir* column names.





[jira] [Created] (DRILL-3333) Add support for auto-partitioning in parquet writer

2015-06-22 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3333:
--

 Summary: Add support for auto-partitioning in parquet writer
 Key: DRILL-
 URL: https://issues.apache.org/jira/browse/DRILL-
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips


When a table is created with a partition by clause, the parquet writer will 
create separate files for the different partition values. The data will first 
be sorted by the partition keys, and the parquet writer will create a new 
file whenever it encounters a new value for the partition columns.

When the data created this way is queried, partition pruning will work if the 
filter contains a partition column. And unlike directory-based partitioning, 
no view is required, nor is it necessary to reference the dir* column names.





[jira] [Commented] (DRILL-3324) CTAS broken with the new auto partition feature ( Not in master yet)

2015-06-19 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593695#comment-14593695
 ] 

Steven Phillips commented on DRILL-3324:


I think we should hold off on filing bugs until the feature is in master. If 
you are working on my branch, just let me know, and I will make sure I address 
it before merging my changes.

> CTAS broken with the new auto partition feature ( Not in master yet)
> 
>
> Key: DRILL-3324
> URL: https://issues.apache.org/jira/browse/DRILL-3324
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Writer
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Blocker
> Fix For: 1.1.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=1f02105
> I tried running a simple CTAS query from Steven's branch 
> (https://github.com/StevenMPhillips/incubator-drill/tree/partitioning3), which 
> contains the auto partition feature, and it failed with an IOOB exception:
> {code}
> create table l as select l_orderkey, l_linenumber from 
> cp.`tpch/lineitem.parquet`;
> Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> [Error Id: a6696e99-f1c6-4ee8-abf0-a869a829a0a9 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> This is just plain old CTAS, without the new auto partition feature, that is 
> broken.
> Attached the log file.





[jira] [Commented] (DRILL-3301) ILIKE does not support escape characters.

2015-06-16 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588412#comment-14588412
 ] 

Steven Phillips commented on DRILL-3301:


Escape is supported, e.g.:

ilike(a, 'abc#t', '#')

I don't think we are planning on supporting the LIKE-style syntax for ILIKE, 
since that would probably require modifying Calcite, and ILIKE is not standard 
SQL.

> ILIKE does not support escape characters.
> -
>
> Key: DRILL-3301
> URL: https://issues.apache.org/jira/browse/DRILL-3301
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Patrick Toole
>
> The like operator properly supports escaping characters. Because the ILIKE is 
> implemented as a function, it does not support escaping.
> The grammar needs to be updated to accept ILIKE in the same locations as LIKE.





[jira] [Commented] (DRILL-3246) Query planning support for partition by clause in Drill's CTAS statement

2015-06-05 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575188#comment-14575188
 ] 

Steven Phillips commented on DRILL-3246:


+1

> Query planning support for partition by clause in Drill's CTAS statement
> 
>
> Key: DRILL-3246
> URL: https://issues.apache.org/jira/browse/DRILL-3246
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.1.0
>
>
> We are going to add "PARTITION BY" clause in Drill's CTAS statement. The 
> "PARTITION BY" clause will specify the list of columns out of the result 
> table's column list that will be used to partition the data.  
> CREATE TABLE table_name [ (col_name, ...) ]
> [PARTITION BY (col_name, ...)]
> AS SELECT_STATEMENT;
> Semantic restrictions for the PARTITION BY clause:
>  - All the columns in the PARTITION BY clause have to be in the table's 
> column list, or the SELECT_STATEMENT must have a * column when the base table 
> in the SELECT_STATEMENT is schema-less. Otherwise, a query validation error 
> is raised.
>  - When the partition column is resolved to a * column in a schema-less 
> query, this * column cannot be the result of a join operation. This 
> restriction is added because, for a * produced by a join operation, the query 
> planner would not know which table produces the partition column.
> Example :
> {code}
> create table mytable1  partition by (r_regionkey) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable2  partition by (r_regionkey) as 
>   select * from cp.`tpch/region.parquet`
> {code}
> {code}
> create table mytable3  partition by (r_regionkey) as
>   select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}
> Invalid case 1: Partition column is not in table's column list. 
> {code}
> create table mytable4  partition by (r_regionkey2) as 
>   select r_regionkey, r_name from cp.`tpch/region.parquet`
> {code}
> Invalid case 2: Partition column is resolved to * out of a join operator.
> {code}
> create table mytable5  partition by (r_regionkey) as
>   select * 
>   from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
>   where n.n_regionkey = r.r_regionkey
> {code}





[jira] [Updated] (DRILL-2686) Move writeJson() methods from PhysicalPlanReader to corresponding classes

2015-05-26 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2686:
---
Component/s: (was: Storage - JSON)
 Query Planning & Optimization

> Move writeJson() methods from PhysicalPlanReader to corresponding classes
> -
>
> Key: DRILL-2686
> URL: https://issues.apache.org/jira/browse/DRILL-2686
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0
>Reporter: Sudheesh Katkam
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
>
> From Chris's comment https://reviews.apache.org/r/32795/
> It would have been better to have a writeJson(ObjectMapper) method added to 
> each of OptionList, PhysicalOperator, -and ExecutionControls-, and for 
> PhysicalPlanReader just to have a getMapper() that is used to get the 
> argument needed for those. In that form, we don't have to add a new method to 
> PhysicalPlanReader for each thing that we want to add to it. We just get its 
> mapper and write whatever it is to it.
> We'd have
> {code}
> final ObjectMapper mapper = reader.getMapper();
> options.writeJson(mapper);
> executionControls.writeJson(mapper);
> {code}
> So as we add more things to the plan, we don't have to add more methods to 
> it. Each object knows how to write itself, given the mapper. And if we ever 
> need to add them to anything else, that object just needs to expose its 
> mapper in a similar way, rather than having a method per item.





[jira] [Commented] (DRILL-3169) gz files cannot be accessed without gz formats extension definition

2015-05-22 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556721#comment-14556721
 ] 

Steven Phillips commented on DRILL-3169:


The compression extension works by being added in addition to the format 
extension. So, for example,

googlebooks.tsv.gz should work.

You shouldn't define the compression extension as one of the format extensions. 
I might add code to prevent a user from doing this.
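The resolution described above, where the compression suffix is stripped first and the remaining extension selects the format, can be sketched as follows. This is an illustration of the naming convention only, not Drill's actual code; the method name is hypothetical.

```java
// Sketch of resolving a format for a compressed file: strip the compression
// suffix (.gz here), then the remaining extension picks the format. This
// illustrates the naming convention, not Drill's real implementation.
public class ExtensionSketch {

    static String formatExtension(String fileName) {
        String name = fileName;
        if (name.endsWith(".gz")) {  // strip the compression suffix first
            name = name.substring(0, name.length() - 3);
        }
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(formatExtension("googlebooks.tsv.gz")); // prints tsv
    }
}
```

With this convention, only "tsv" needs to appear in the plugin's format extensions, and googlebooks.tsv.gz still resolves to the tsv format.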

> gz files cannot be accessed without gz formats extension definition
> ---
>
> Key: DRILL-3169
> URL: https://issues.apache.org/jira/browse/DRILL-3169
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Mac OS X
>Reporter: Kristine Hahn
>
> To reproduce the problem:
> 1. Put a gz file on the file system.
> 2. Define a plugin with and without a gz extension. For example:
> {noformat}
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "file:///",
>   "workspaces": {
> "ngram": {
>   "location": "/Users/khahn/drill/apache-drill-1.0.0",
>   "writable": false,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv",
> "gz"
>   ],
>   "delimiter": "\t"
> }
>   }
> }
> {noformat}
> 3. Try to query the gz file. 
> Expected results: success with and without the gz extension
> Actual results: error without the gz extension defined in formats.
> *Output--no gz extension in formats*
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *  FROM ngram.`/googlebooks.gz`;
> May 22, 2015 6:06:51 AM org.apache.calcite.sql.validate.SqlValidatorException 
> 
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 
> 'ngram./googlebooks.gz' not found
> May 22, 2015 6:06:51 AM org.apache.calcite.runtime.CalciteException 
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, 
> column 16 to line 1, column 20: Table 'ngram./googlebooks.gz' not found
> Error: PARSE ERROR: From line 1, column 16 to line 1, column 20: Table 
> 'ngram./googlebooks.gz' not found
> [Error Id: 28f38441-81a0-4167-afad-86a8169d383b on 172.30.1.90:31010] 
> (state=,code=0)
> {noformat}
> *Output with gz extension defined in formats*
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *  FROM ngram.`/googlebooks.gz`;
> +---+
> |columns|
> +---+
> | ["ZOCOR should be taken with","2002","7","5"] |
> | ["ZOCOR should be taken with","2003","12","12"]   |
> . . .
> {noformat}





[jira] [Commented] (DRILL-3139) Query against yelp academic dataset causes exception

2015-05-18 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549486#comment-14549486
 ] 

Steven Phillips commented on DRILL-3139:


That file contains a record which Drill is unable to parse.

If you run the current release candidate, you will see this message:

java.lang.RuntimeException: java.sql.SQLException: DATA_READ ERROR: Error 
parsing JSON - You tried to start when you are using a ValueWriter of type 
NullableBitWriterImpl.

File  /Users/sphillips/yelp/yelp_academic_dataset_business.json
Record  10597
Fragment 0:0

[Error Id: d3f9eb54-970a-4bc8-9bfc-487c27a1d619 on localhost:31010]
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
at sqlline.SqlLine.print(SqlLine.java:1583)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:738)
at sqlline.SqlLine.begin(SqlLine.java:612)
at sqlline.SqlLine.start(SqlLine.java:366)
at sqlline.SqlLine.main(SqlLine.java:259)

The error message could still be improved, but at least now it shows which 
record is failing to parse.

> Query against yelp academic dataset causes exception
> 
>
> Key: DRILL-3139
> URL: https://issues.apache.org/jira/browse/DRILL-3139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 0.9.0
> Environment: OSX
>Reporter: Chris Westin
>Assignee: Daniel Barclay (Drill)
>
> I was following along with the tutorial for "Analyzing the Yelp Academic Dataset."
> I tried the first query "Querying Yelp Business Data" WITHOUT the limit 
> clause:
> select * from
>  
> dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json`;
> (Adjust for your own download path).
> A bunch of data comes out, followed by an exception:
> "Dietary Restrictions":{}} | business   | null  |
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> 0: jdbc:drill:zk=local>
> Note that this was tried against the 0.9.0 tarball referred to in the 
> tutorials and download links.





[jira] [Commented] (DRILL-3123) Dir0 has issues when we have a '/' at the beginning of the path

2015-05-17 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547397#comment-14547397
 ] 

Steven Phillips commented on DRILL-3123:


I think this is kind of a strange query, and the partition column feature was 
not really designed to be used this way. I am inclined to return no "directory" 
columns at all when doing globbing.

> Dir0 has issues when we have a '/' at the beginning of the path
> ---
>
> Key: DRILL-3123
> URL: https://issues.apache.org/jira/browse/DRILL-3123
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.0.0
>Reporter: Rahul Challapalli
>Assignee: Daniel Barclay (Drill)
>
> Follow the below steps :
> {code}
> hadoop fs -mkdir /drill/testdata/repro1/20150120
> hadoop fs -mkdir /drill/testdata/repro1/20150121
> 1. Add the below workspace :
> "repro1": {
>   "location": "/drill/testdata/repro1",
>   "writable": true,
>   "defaultInputFormat": "parquet"
> }
> 2. Now copy a sample json file into both the above directories
> {code}
> The below query returns incorrect results :
> {code}
> select * from dfs.repro1.`/*/sample.json` limit 1;
> +---+-+---+-+--+
> |   dir0|  dir1   |   dir2| id  | val  |
> +---+-+---+-+--+
> | testdata  | repro1  | 20150121  | 1   | 1|
> +---+-+---+-+--+
> {code}
> The same query worked from an older build (commit # 
> d10769f478900ff1868d206086874bdd67a45e7d)
> {code}
> select * from dfs.repro1.`/*/sample.json` limit 1;
> ++++
> |dir0| id |val |
> ++++
> | 20150121   | 1  | 1  |
> ++++
> {code}





[jira] [Updated] (DRILL-3099) FileSelection's selectionRoot does not include the scheme and authority

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3099:
---
Fix Version/s: 1.0.0

> FileSelection's selectionRoot does not include the scheme and authority
> ---
>
> Key: DRILL-3099
> URL: https://issues.apache.org/jira/browse/DRILL-3099
> Project: Apache Drill
>  Issue Type: Bug
> Environment: This will result in erroneous partition columns if the 
> original root URI includes these.
> This also results in {{TestDirectoryExplorerUDFs}} to fail on Windows.
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.0.0
>
>






[jira] [Updated] (DRILL-3085) In ExternalSortBatch, Memory Leak in Runtime Generation Code

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3085:
---
Fix Version/s: 1.0.0

> In ExternalSortBatch, Memory Leak in Runtime Generation Code
> 
>
> Key: DRILL-3085
> URL: https://issues.apache.org/jira/browse/DRILL-3085
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
>Priority: Critical
> Fix For: 1.0.0
>
>
> This case is related to DRILL-3065.
> In ExternalSortBatch, we have an MSorter to do the sorting. In this class, 
> there are two SelectionVector4 instances, vector4 and aux. If a failure 
> occurs just after either one is allocated new memory, the close() method 
> fails to clean up their allocated memory properly.
> To reproduce this problem, inject an exception at the last step of 
> MSortTemplate.setup()
> Detailed Information:
> 1. Query: 
> select n_name from cp.`tpch/nation.parquet` order by n_name
> *. This query alone does not reproduce the issue; the exception must still 
> be injected at the right place.
> 2. Data:
> cp.`tpch/nation.parquet`
> 3. Log:
> java.lang.IllegalStateException: Failure while closing accountor.  Expected 
> private and shared pools to be set to initial values.  However, one or more 
> were not.  Stats are
>   zone    init    allocated   delta 
>   private 0   0   0 
>   shared  3221225472  3195686243  25539229.
>   at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:200)
>   at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386)
>   at 
> org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:175)
>   at 
> org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75)
>   at com.google.common.io.Closeables.close(Closeables.java:77)
>   at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
>   at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:292)
>   at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:238)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Exception in thread "Drillbit-ShutdownHook#0" java.lang.RuntimeException: 
> Caught exception closing Drillbit started from
> org.apache.drill.common.StackTrace.:36
> org.apache.drill.exec.server.Drillbit.run:250
> org.apache.drill.BaseTestQuery.openClient:180
> org.apache.drill.BaseTestQuery.setupDefaultTestCluster:116
> sun.reflect.NativeMethodAccessorImpl.invoke0:-2
> sun.reflect.NativeMethodAccessorImpl.invoke:57
> sun.reflect.DelegatingMethodAccessorImpl.invoke:43
> java.lang.reflect.Method.invoke:606
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall:47
> org.junit.internal.runners.model.ReflectiveCallable.run:12
> org.junit.runners.model.FrameworkMethod.invokeExplosively:44
> org.junit.internal.runners.statements.RunBefores.evaluate:24
> org.junit.internal.runners.statements.RunAfters.evaluate:27
> org.junit.runners.ParentRunner.run:309
> org.junit.runner.JUnitCore.run:160
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs:74
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart:211
> com.intellij.rt.execution.junit.JUnitStarter.main:67
> sun.reflect.NativeMethodAccessorImpl.invoke0:-2
> sun.reflect.NativeMethodAccessorImpl.invoke:57
> sun.reflect.DelegatingMethodAccessorImpl.invoke:43
> java.lang.reflect.Method.invoke:606
>   at 
> org.apache.drill.exec.s
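The leak pattern DRILL-3085 describes, two buffers where a failure between the allocations skips cleanup of the first, is typically guarded by closing every non-null resource and suppressing intermediate failures. A hypothetical sketch of that pattern (`SafeClose`, `Buffer`, and `closeAll` are illustrative names, not Drill's actual API):

```java
public class SafeClose {
    interface Buffer extends AutoCloseable {
        @Override void close(); // narrowed: no checked exception
    }

    // Close every non-null resource, even if an earlier close() throws;
    // rethrow the first failure after all closes have been attempted.
    static void closeAll(Buffer... buffers) {
        RuntimeException first = null;
        for (Buffer b : buffers) {
            if (b == null) continue; // never allocated: nothing to release
            try {
                b.close();
            } catch (RuntimeException e) {
                if (first == null) first = e;
            }
        }
        if (first != null) throw first;
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        Buffer vector4 = () -> closed[0] = true;
        Buffer aux = null; // simulate failure before the second allocation
        closeAll(vector4, aux);
        System.out.println(closed[0]); // the first buffer is still released
    }
}
```

The point of the design is that close() tolerates a partially constructed state: a null (never-allocated) buffer is skipped instead of aborting the cleanup of its sibling.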

[jira] [Updated] (DRILL-3089) Revert to 4 forked test and allow override from command line

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3089:
---
Fix Version/s: 1.0.0

> Revert to 4 forked test and allow override from command line
> 
>
> Key: DRILL-3089
> URL: https://issues.apache.org/jira/browse/DRILL-3089
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.0.0
>
> Attachments: 
> 0001-DRILL-3089-Revert-to-4-forked-test-and-allow-overrid.patch
>
>
> The current default, one forked test per CPU core, can be quite 
> resource-intensive on modern machines with 8-16 cores (see discussion on 
> DRILL-2039).
> We should revert to 4 forked tests by default and allow overriding this from 
> the command line.





[jira] [Updated] (DRILL-3098) Set Unix style "line.separator" for tests

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3098:
---
Fix Version/s: 1.0.0

> Set Unix style "line.separator" for tests
> -
>
> Key: DRILL-3098
> URL: https://issues.apache.org/jira/browse/DRILL-3098
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.0.0
>
>
> Both Calcite and Jackson's ObjectMapper use this property to format SQL and 
> JSON. If left to the platform setting, some tests break on Windows.



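The DRILL-3098 fix pins the "line.separator" system property for the test run. A hypothetical setup sketch of that idea (the class and method names are illustrative, not the actual patch), restoring the platform default afterwards so later code is unaffected:

```java
public class UnixLineSeparator {
    // Force Unix-style line separators; returns the previous value so the
    // caller can restore it in a finally block.
    static String force() {
        String original = System.getProperty("line.separator");
        System.setProperty("line.separator", "\n");
        return original;
    }

    public static void main(String[] args) {
        String original = force();
        try {
            // ... run tests that format SQL/JSON here ...
            System.out.println("\n".equals(System.getProperty("line.separator")));
        } finally {
            System.setProperty("line.separator", original); // restore default
        }
    }
}
```

Because Calcite and Jackson read this property when emitting formatted SQL and JSON, pinning it makes golden-file comparisons platform-independent.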


[jira] [Updated] (DRILL-3100) TestImpersonationDisabledWithMiniDFS fails on Windows

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3100:
---
Fix Version/s: 1.0.0

> TestImpersonationDisabledWithMiniDFS fails on Windows
> -
>
> Key: DRILL-3100
> URL: https://issues.apache.org/jira/browse/DRILL-3100
> Project: Apache Drill
>  Issue Type: Bug
> Environment: {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> java.lang.IllegalArgumentException: Pathname 
> /Q:/git/apache-drill/exec/java-exec/target/1431653578758-0 from 
> hdfs://127.0.0.1:30538/Q:/git/apache-drill/exec/java-exec/target/1431653578758-0
>  is not a valid DFS filename.
> [Error Id: 4f100f1c-4071-4ef0-8b77-ea5c605f7d76 on 127.0.0.1:31013]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:1)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:218)
>   at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:1)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> {noformat}
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Fix For: 1.0.0
>
>






[jira] [Updated] (DRILL-3093) Leaking RawBatchBuffer

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3093:
---
Fix Version/s: 1.0.0

> Leaking RawBatchBuffer
> --
>
> Key: DRILL-3093
> URL: https://issues.apache.org/jira/browse/DRILL-3093
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-3093.patch
>
>






[jira] [Resolved] (DRILL-3093) Leaking RawBatchBuffer

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3093.

Resolution: Fixed

fixed in 7f575df

> Leaking RawBatchBuffer
> --
>
> Key: DRILL-3093
> URL: https://issues.apache.org/jira/browse/DRILL-3093
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Steven Phillips
> Attachments: DRILL-3093.patch
>
>






[jira] [Resolved] (DRILL-3063) TestQueriesOnLargeFile leaks memory with 16M limit

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3063.

Resolution: Fixed

fixed in e58a306

> TestQueriesOnLargeFile leaks memory with 16M limit
> --
>
> Key: DRILL-3063
> URL: https://issues.apache.org/jira/browse/DRILL-3063
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Chris Westin
>Assignee: Chris Westin
>Priority: Critical
> Fix For: 1.0.0
>
>
> I ran the TestQueriesOnLargeFile unit test in a limited-memory environment, 
> capping direct memory at 16M. At the end of the test, the shutdown hook 
> reports a memory leak.
> Here is the test launch configuration:
> -Xms512m
> -Xmx3g
> -Ddrill.exec.http.enabled=false
> -Ddrill.exec.sys.store.provider.local.write=false
> -Dorg.apache.drill.exec.server.Drillbit.system_options="org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
> -XX:MaxPermSize=256M -XX:MaxDirectMemorySize=3072M
> -XX:+CMSClassUnloadingEnabled -ea
> -Ddrill.exec.memory.top.max=16777216
> Here's what I see at the end:
>   at 
> org.apache.drill.exec.server.Drillbit$ShutdownThread.run(Drillbit.java:333)
> Caused by: java.lang.IllegalStateException: Failure while closing accountor.  
> Expected private and shared pools to be set to initial values.  However, one 
> or more were not.  Stats are
>   zone    init    allocated   delta 
>   private 0   0   0 
>   shared  1677721613777216300.
>   at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:200)
>   at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386)
>   at 
> org.apache.drill.exec.memory.TopLevelAllocator.close(TopLevelAllocator.java:171)
>   at 
> org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:75)
>   at com.google.common.io.Closeables.close(Closeables.java:77)
>   at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
>   at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:292)
>   at 
> org.apache.drill.exec.server.Drillbit$ShutdownThread.run(Drillbit.java:330)





[jira] [Resolved] (DRILL-3066) AtomicRemainder - Tried to close remainder, but it has already been closed.

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3066.

Resolution: Fixed

fixed in aaf9fb8

> AtomicRemainder - Tried to close remainder, but it has already been closed.
> ---
>
> Key: DRILL-3066
> URL: https://issues.apache.org/jira/browse/DRILL-3066
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
> Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726 
>Reporter: Khurram Faraaz
>Assignee: Sudheesh Katkam
>Priority: Minor
> Fix For: 1.0.0
>
>
> I see the below stack trace in drillbit.log when I try to query a corrupt 
> parquet file. The test was run on a 4-node cluster on CentOS.
> AtomicRemainder - Tried to close remainder, but it has already been closed.
> {code}
> 2015-05-13 20:42:58,893 [2aac48ac-82d3-0f5a-2bac-537e82b3ac02:frag:0:0] WARN  
> o.a.d.exec.memory.AtomicRemainder - Tried to close remainder, but it has 
> already been closed
> java.lang.Exception: null
> at 
> org.apache.drill.exec.memory.AtomicRemainder.close(AtomicRemainder.java:196) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at org.apache.drill.exec.memory.Accountor.close(Accountor.java:386) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.TopLevelAllocator$ChildAllocator.close(TopLevelAllocator.java:310)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:405)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:399) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:312)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cancel(FragmentExecutor.java:135)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager.cancelExecutingFragments(QueryManager.java:202)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:836)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:780)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:782)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:891) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:107) 
> [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1161)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:481)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:461)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:90)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:86)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:291)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:255)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 

[jira] [Resolved] (DRILL-3089) Revert to 4 forked test and allow override from command line

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3089.

Resolution: Fixed

fixed in 7c78244

> Revert to 4 forked test and allow override from command line
> 
>
> Key: DRILL-3089
> URL: https://issues.apache.org/jira/browse/DRILL-3089
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Attachments: 
> 0001-DRILL-3089-Revert-to-4-forked-test-and-allow-overrid.patch
>
>
> The current default, one forked test per CPU core, can be quite 
> resource-intensive on modern machines with 8-16 cores (see discussion on 
> DRILL-2039).
> We should revert to 4 forked tests by default and allow overriding this from 
> the command line.





[jira] [Resolved] (DRILL-3098) Set Unix style "line.separator" for tests

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3098.

Resolution: Fixed

Resolved in 984ee01

> Set Unix style "line.separator" for tests
> -
>
> Key: DRILL-3098
> URL: https://issues.apache.org/jira/browse/DRILL-3098
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>
> Both Calcite and Jackson's ObjectMapper use this property to format SQL and 
> JSON. If left to the platform setting, some tests break on Windows.





[jira] [Resolved] (DRILL-3099) FileSelection's selectionRoot does not include the scheme and authority

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3099.

Resolution: Fixed

Fixed in 36ff259

> FileSelection's selectionRoot does not include the scheme and authority
> ---
>
> Key: DRILL-3099
> URL: https://issues.apache.org/jira/browse/DRILL-3099
> Project: Apache Drill
>  Issue Type: Bug
> Environment: This will result in erroneous partition columns if the 
> original root URI includes these.
> This also causes {{TestDirectoryExplorerUDFs}} to fail on Windows.
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>






[jira] [Resolved] (DRILL-3100) TestImpersonationDisabledWithMiniDFS fails on Windows

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3100.

Resolution: Fixed

Resolved in d8b1975

> TestImpersonationDisabledWithMiniDFS fails on Windows
> -
>
> Key: DRILL-3100
> URL: https://issues.apache.org/jira/browse/DRILL-3100
> Project: Apache Drill
>  Issue Type: Bug
> Environment: {noformat}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> java.lang.IllegalArgumentException: Pathname 
> /Q:/git/apache-drill/exec/java-exec/target/1431653578758-0 from 
> hdfs://127.0.0.1:30538/Q:/git/apache-drill/exec/java-exec/target/1431653578758-0
>  is not a valid DFS filename.
> [Error Id: 4f100f1c-4071-4ef0-8b77-ea5c605f7d76 on 127.0.0.1:31013]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:1)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:218)
>   at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:1)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> {noformat}
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>






[jira] [Commented] (DRILL-3093) Leaking RawBatchBuffer

2015-05-14 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544764#comment-14544764
 ] 

Steven Phillips commented on DRILL-3093:


+1

> Leaking RawBatchBuffer
> --
>
> Key: DRILL-3093
> URL: https://issues.apache.org/jira/browse/DRILL-3093
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Steven Phillips
> Attachments: DRILL-3093.patch
>
>






[jira] [Updated] (DRILL-2875) IllegalStateException when querying the public yelp json dataset

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2875:
---
Assignee: Jacques Nadeau  (was: Steven Phillips)

> IllegalStateException when querying the public yelp json dataset
> 
>
> Key: DRILL-2875
> URL: https://issues.apache.org/jira/browse/DRILL-2875
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Rahul Challapalli
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2875.patch, error.log
>
>
> git.commit.id.abbrev=5cd36c5
> The below query fails from sqlline after displaying a few results
> {code}
>  select attributes from 
> `json_kvgenflatten/yelp_academic_dataset_business.json`;
> ... after displaying a few records.
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> I attached the error log and the data set. Let me know if you need anything 
> else





[jira] [Updated] (DRILL-3088) IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3088:
---
Fix Version/s: 1.0.0

> IllegalStateException: Cleanup before finished. 0 out of 1 strams have 
> finished
> ---
>
> Key: DRILL-3088
> URL: https://issues.apache.org/jira/browse/DRILL-3088
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Assignee: Mehant Baid
> Fix For: 1.0.0
>
> Attachments: DRILL-3088.patch, error.log, j2.tar.gz, j6.tar.gz
>
>
> git.commit.id.abbrev=d10769f
> Query :
> {code}
> select * from j2 where c_bigint not in ( select cast(c_integer as bigint) 
> from j6 ) and c_varchar not in ( '', '', '', '0008 397933 38800', 
> ' 00 0') and c_boolean in ( 'true' ) and c_date not in ( select 
> distinct c_date from j6)
> {code}
> Error from the logs :
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 strams 
> have finished
> Fragment 5:30
> [Error Id: 593d62dd-f509-4c22-ba5b-8d11cf85ecc0 on atsqa6c85.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:315)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 strams have finished
>   at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.cleanup(BaseRawBatchBuffer.java:116)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.close(UnorderedReceiverBatch.java:217)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122) 
> ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:333)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:278)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}
> Attached the dataset and more information from the logs





[jira] [Updated] (DRILL-3088) IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3088:
---
Assignee: Mehant Baid  (was: Steven Phillips)

> IllegalStateException: Cleanup before finished. 0 out of 1 strams have 
> finished
> ---
>
> Key: DRILL-3088
> URL: https://issues.apache.org/jira/browse/DRILL-3088
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Assignee: Mehant Baid
> Attachments: DRILL-3088.patch, error.log, j2.tar.gz, j6.tar.gz
>
>
> git.commit.id.abbrev=d10769f
> Query :
> {code}
> select * from j2 where c_bigint not in ( select cast(c_integer as bigint) 
> from j6 ) and c_varchar not in ( '', '', '', '0008 397933 38800', 
> ' 00 0') and c_boolean in ( 'true' ) and c_date not in ( select 
> distinct c_date from j6)
> {code}
> Error from the logs :
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 strams 
> have finished
> Fragment 5:30
> [Error Id: 593d62dd-f509-4c22-ba5b-8d11cf85ecc0 on atsqa6c85.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:315)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 strams have finished
>   at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.cleanup(BaseRawBatchBuffer.java:116)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.close(UnorderedReceiverBatch.java:217)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122) 
> ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:333)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:278)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}
> Attached the dataset and more information from the logs





[jira] [Updated] (DRILL-3088) IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3088:
---
Attachment: DRILL-3088.patch

> IllegalStateException: Cleanup before finished. 0 out of 1 strams have 
> finished
> ---
>
> Key: DRILL-3088
> URL: https://issues.apache.org/jira/browse/DRILL-3088
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
> Attachments: DRILL-3088.patch, error.log, j2.tar.gz, j6.tar.gz
>
>
> git.commit.id.abbrev=d10769f
> Query :
> {code}
> select * from j2 where c_bigint not in ( select cast(c_integer as bigint) 
> from j6 ) and c_varchar not in ( '', '', '', '0008 397933 38800', 
> ' 00 0') and c_boolean in ( 'true' ) and c_date not in ( select 
> distinct c_date from j6)
> {code}
> Error from the logs :
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 strams 
> have finished
> Fragment 5:30
> [Error Id: 593d62dd-f509-4c22-ba5b-8d11cf85ecc0 on atsqa6c85.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:315)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 strams have finished
>   at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.cleanup(BaseRawBatchBuffer.java:116)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.close(UnorderedReceiverBatch.java:217)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122) 
> ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:333)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:278)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}
> Attached the dataset and more information from the logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3088) IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished

2015-05-14 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544538#comment-14544538
 ] 

Steven Phillips commented on DRILL-3088:


Created reviewboard https://reviews.apache.org/r/34239/


> IllegalStateException: Cleanup before finished. 0 out of 1 strams have 
> finished
> ---
>
> Key: DRILL-3088
> URL: https://issues.apache.org/jira/browse/DRILL-3088
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
> Attachments: DRILL-3088.patch, error.log, j2.tar.gz, j6.tar.gz
>
>
> git.commit.id.abbrev=d10769f
> Query :
> {code}
> select * from j2 where c_bigint not in ( select cast(c_integer as bigint) 
> from j6 ) and c_varchar not in ( '', '', '', '0008 397933 38800', 
> ' 00 0') and c_boolean in ( 'true' ) and c_date not in ( select 
> distinct c_date from j6)
> {code}
> Error from the logs :
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 strams 
> have finished
> Fragment 5:30
> [Error Id: 593d62dd-f509-4c22-ba5b-8d11cf85ecc0 on atsqa6c85.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:315)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 strams have finished
>   at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.cleanup(BaseRawBatchBuffer.java:116)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.close(UnorderedReceiverBatch.java:217)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122) 
> ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:333)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:278)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}
> Attached the dataset and more information from the logs





[jira] [Commented] (DRILL-3088) IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished

2015-05-14 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544522#comment-14544522
 ] 

Steven Phillips commented on DRILL-3088:


The NestedLoopJoin operator does not kill upstream or consume the remaining 
batches on the left side when it reaches the end of the right side.
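The failure mode can be illustrated with a minimal Java sketch (hypothetical names, not Drill's actual operator API): a consumer that stops early must still drain its upstream, otherwise the upstream's cleanup check ("0 out of 1 streams have finished") fires.

```java
import java.util.Iterator;
import java.util.List;

public class DrainSketch {
    /**
     * Consumes items up to a limit, then drains the remainder so the
     * upstream source is seen as fully finished at cleanup time.
     */
    static int consumeWithDrain(Iterator<String> upstream, int limit) {
        int consumed = 0;
        while (upstream.hasNext() && consumed < limit) {
            upstream.next();   // process the batch
            consumed++;
        }
        // The fix: exhaust remaining batches instead of abandoning them.
        while (upstream.hasNext()) {
            upstream.next();   // discard, but acknowledge receipt
        }
        return consumed;
    }

    public static void main(String[] args) {
        Iterator<String> left = List.of("b1", "b2", "b3").iterator();
        System.out.println(consumeWithDrain(left, 1)); // 1
        System.out.println(left.hasNext());            // false: fully drained
    }
}
```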

> IllegalStateException: Cleanup before finished. 0 out of 1 strams have 
> finished
> ---
>
> Key: DRILL-3088
> URL: https://issues.apache.org/jira/browse/DRILL-3088
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
> Attachments: error.log, j2.tar.gz, j6.tar.gz
>
>
> git.commit.id.abbrev=d10769f
> Query :
> {code}
> select * from j2 where c_bigint not in ( select cast(c_integer as bigint) 
> from j6 ) and c_varchar not in ( '', '', '', '0008 397933 38800', 
> ' 00 0') and c_boolean in ( 'true' ) and c_date not in ( select 
> distinct c_date from j6)
> {code}
> Error from the logs :
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> java.lang.IllegalStateException: Cleanup before finished. 0 out of 1 strams 
> have finished
> Fragment 5:30
> [Error Id: 593d62dd-f509-4c22-ba5b-8d11cf85ecc0 on atsqa6c85.qa.lab:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:315)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 strams have finished
>   at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.cleanup(BaseRawBatchBuffer.java:116)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.close(UnorderedReceiverBatch.java:217)
>  ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.BaseRootExec.close(BaseRootExec.java:122) 
> ~[drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:333)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:278)
>  [drill-java-exec-1.0.0-SNAPSHOT-rebuffed.jar:1.0.0-SNAPSHOT]
>   ... 4 common frames omitted
> {code}
> Attached the dataset and more information from the logs





[jira] [Resolved] (DRILL-2780) java.lang.IllegalStateException files open exceptions in drillbit.out

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-2780.

Resolution: Fixed

Fixed in ed200e2

> java.lang.IllegalStateException files open exceptions in drillbit.out
> -
>
> Key: DRILL-2780
> URL: https://issues.apache.org/jira/browse/DRILL-2780
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> There are many stacktraces like this in drillbit.out 
> {code}
> Exception in thread "Thread-2" java.lang.IllegalStateException: Not all files 
> opened using this FileSystem are closed. There are still [84] files open.
> File '/drill/testdata/tpch100/parquet/lineitem/part-m-00045.parquet' opened 
> at callstack:
> 
> org.apache.drill.exec.store.parquet.columnreaders.PageReader.<init>(PageReader.java:105)
> 
> org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.<init>(ColumnReader.java:88)
> 
> org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumn.<init>(VarLengthColumn.java:39)
> 
> org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.<init>(VarLengthValuesColumn.java:43)
> 
> org.apache.drill.exec.store.parquet.columnreaders.NullableVarLengthValuesColumn.<init>(NullableVarLengthValuesColumn.java:39)
> 
> org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarBinaryColumn.<init>(VarLengthColumnReaders.java:303)
> 
> org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getReader(ColumnReaderFactory.java:176)
> 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:303)
> 
> org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:100)
> 
> org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:157)
> 
> org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39)
> 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:126)
> 
> org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:107)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39)
> 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:214)
> 
> org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62)
> 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39)
> {code}
> We should not be getting any exceptions in drillbit.out.





[jira] [Reopened] (DRILL-3049) Increase sort spooling threshold

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reopened DRILL-3049:


> Increase sort spooling threshold
> 
>
> Key: DRILL-3049
> URL: https://issues.apache.org/jira/browse/DRILL-3049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Resolved] (DRILL-3049) Increase sort spooling threshold

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3049.

Resolution: Fixed

> Increase sort spooling threshold
> 
>
> Key: DRILL-3049
> URL: https://issues.apache.org/jira/browse/DRILL-3049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Reopened] (DRILL-3051) Integer overflow in TimedRunnable

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reopened DRILL-3051:


> Integer overflow in TimedRunnable
> -
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)
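The overflow is easy to reproduce in isolation. Below is a minimal sketch with illustrative numbers, not Drill's actual constants: multiplying two int values before widening to long wraps around, so a computed timeout can go negative once the task count is large enough.

```java
public class TimeoutOverflow {
    // Broken: int * int wraps modulo 2^32 before being widened to long.
    static long timeoutMillisBroken(int numTasks, int perTaskMillis) {
        return numTasks * perTaskMillis;
    }

    // Fixed: widen one operand first so the multiply happens in long.
    static long timeoutMillisFixed(int numTasks, int perTaskMillis) {
        return (long) numTasks * perTaskMillis;
    }

    public static void main(String[] args) {
        int tasks = 150_000;   // ~150K files, as in the report
        int perTask = 15_000;  // 15 s per task (illustrative value)
        System.out.println(timeoutMillisBroken(tasks, perTask)); // -2044967296
        System.out.println(timeoutMillisFixed(tasks, perTask));  // 2250000000
    }
}
```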





[jira] [Resolved] (DRILL-3050) Increase query context max memory

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3050.

Resolution: Fixed

> Increase query context max memory
> -
>
> Key: DRILL-3050
> URL: https://issues.apache.org/jira/browse/DRILL-3050
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Reopened] (DRILL-3050) Increase query context max memory

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reopened DRILL-3050:


> Increase query context max memory
> -
>
> Key: DRILL-3050
> URL: https://issues.apache.org/jira/browse/DRILL-3050
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Resolved] (DRILL-3051) Integer overflow in TimedRunnable

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3051.

Resolution: Fixed

> Integer overflow in TimedRunnable
> -
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)





[jira] [Resolved] (DRILL-3048) Disable assertions by default

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3048.

Resolution: Fixed

Fixed in 20b3688

> Disable assertions by default
> -
>
> Key: DRILL-3048
> URL: https://issues.apache.org/jira/browse/DRILL-3048
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Resolved] (DRILL-3049) Increase sort spooling threshold

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3049.

Resolution: Pending Closed

01a36f1

> Increase sort spooling threshold
> 
>
> Key: DRILL-3049
> URL: https://issues.apache.org/jira/browse/DRILL-3049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Resolved] (DRILL-3050) Increase query context max memory

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3050.

Resolution: Pending Closed

Fixed in b3d097b

> Increase query context max memory
> -
>
> Key: DRILL-3050
> URL: https://issues.apache.org/jira/browse/DRILL-3050
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Resolved] (DRILL-3051) Integer overflow in TimedRunnable

2015-05-14 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-3051.

Resolution: Pending Closed

Fixed in 83d8ebe

> Integer overflow in TimedRunnable
> -
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)





[jira] [Commented] (DRILL-3084) Add drill-* convenience methods for common cli startup commands

2015-05-14 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544156#comment-14544156
 ] 

Steven Phillips commented on DRILL-3084:


I think we should call "drill-conf" simply "drill".

> Add drill-* convenience methods for common cli startup commands
> ---
>
> Key: DRILL-3084
> URL: https://issues.apache.org/jira/browse/DRILL-3084
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.0.0
>
> Attachments: DRILL-3084.patch
>
>






[jira] [Updated] (DRILL-2875) IllegalStateException when querying the public yelp json dataset

2015-05-13 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2875:
---
Attachment: DRILL-2875.patch

> IllegalStateException when querying the public yelp json dataset
> 
>
> Key: DRILL-2875
> URL: https://issues.apache.org/jira/browse/DRILL-2875
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2875.patch, error.log
>
>
> git.commit.id.abbrev=5cd36c5
> The below query fails from sqlline after displaying a few results
> {code}
>  select attributes from 
> `json_kvgenflatten/yelp_academic_dataset_business.json`;
> ...after displaying fefw records .
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> I attached the error log and the data set. Let me know if you need anything 
> else





[jira] [Commented] (DRILL-2875) IllegalStateException when querying the public yelp json dataset

2015-05-13 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543125#comment-14543125
 ] 

Steven Phillips commented on DRILL-2875:


Record number 10597 is this:

{"business_id": "2jXXBLPA6Qk1j6vOUXV9sQ", "full_address": "365 Convention 
Center Dr\nEastside\nLas Vegas, NV 89109", "hours": {}, "open": false, 
"categories": ["Nightlife"], "city": "Las Vegas", "review_count": 7, "name": 
"The Beach", "neighborhoods": ["Eastside"], "longitude": -115.155494, "state": 
"NV", "stars": 3.5, "latitude": 36.13176390003, "attributes": {"Accepts 
Credit Cards": {}, "Music": {"dj": true, "background_music": false, "jukebox": 
false, "live": true, "video": false, "karaoke": false}, "Alcohol": "full_bar"}, 
"type": "business"}

Note specifically: "Accepts Credit Cards": {}

In the other records in this file, the "Accepts Credit Cards" field is a 
boolean, but here it is an empty map. 

The error message here should be clearer. I have a patch that at least reports 
the correct record number; right now it displays the number relative to the 
current batch, which is not useful for locating the record in the file.
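The underlying conflict can be sketched in a few lines of Java (hypothetical names, not Drill's reader API): a field whose value type changes between records is a schema conflict, and the report should carry the absolute record number rather than a batch-relative one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaCheck {
    /**
     * Returns the 1-based absolute index of the first record whose field
     * type conflicts with an earlier record, or -1 if none conflict.
     */
    static int firstConflict(List<Map<String, Object>> records) {
        Map<String, Class<?>> seen = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            for (Map.Entry<String, Object> e : records.get(i).entrySet()) {
                Class<?> prev = seen.putIfAbsent(e.getKey(), e.getValue().getClass());
                if (prev != null && !prev.equals(e.getValue().getClass())) {
                    return i + 1; // absolute record number, not batch-relative
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = List.of(
            Map.of("Accepts Credit Cards", Boolean.TRUE),
            Map.of("Accepts Credit Cards", Map.of())); // empty map, like record 10597
        System.out.println(firstConflict(records));    // 2
    }
}
```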

> IllegalStateException when querying the public yelp json dataset
> 
>
> Key: DRILL-2875
> URL: https://issues.apache.org/jira/browse/DRILL-2875
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=5cd36c5
> The below query fails from sqlline after displaying a few results
> {code}
>  select attributes from 
> `json_kvgenflatten/yelp_academic_dataset_business.json`;
> ...after displaying fefw records .
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> I attached the error log and the data set. Let me know if you need anything 
> else





[jira] [Updated] (DRILL-2875) IllegalStateException when querying the public yelp json dataset

2015-05-13 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2875:
---
Priority: Critical  (was: Major)

> IllegalStateException when querying the public yelp json dataset
> 
>
> Key: DRILL-2875
> URL: https://issues.apache.org/jira/browse/DRILL-2875
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: error.log
>
>
> git.commit.id.abbrev=5cd36c5
> The below query fails from sqlline after displaying a few results
> {code}
>  select attributes from 
> `json_kvgenflatten/yelp_academic_dataset_business.json`;
> ...after displaying fefw records .
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing 
> query.
>   at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>   at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>   at sqlline.SqlLine.print(SqlLine.java:1809)
>   at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>   at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>   at sqlline.SqlLine.dispatch(SqlLine.java:889)
>   at sqlline.SqlLine.begin(SqlLine.java:763)
>   at sqlline.SqlLine.start(SqlLine.java:498)
>   at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> I attached the error log and the data set. Let me know if you need anything 
> else





[jira] [Commented] (DRILL-3069) Wrong result for aggregate query with filter on SF100

2015-05-13 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543025#comment-14543025
 ] 

Steven Phillips commented on DRILL-3069:


+1

> Wrong result for aggregate query with filter  on SF100 
> ---
>
> Key: DRILL-3069
> URL: https://issues.apache.org/jira/browse/DRILL-3069
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.0.0
>Reporter: Aman Sinha
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-3069.patch
>
>
> Wrong result on TPCH sf100: 
> {code}
> 0: jdbc:drill:zk=10.10.103.32:5181> select max(l_suppkey) from lineitem where 
> l_suppkey = 3872;
> ++
> |   EXPR$0   |
> ++
> | 991683 |
> ++
> 1 row selected
> {code}
> Plan looks correct: 
> {code}
> +++
> |text|json|
> +++
> | 00-00Screen
> 00-01  StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 00-02UnionExchange
> 01-01  StreamAgg(group=[{}], EXPR$0=[MAX($0)])
> 01-02Filter(condition=[=($0, 3872)])
> 01-03  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:/data/parquet/tpch/scale100/lineitem]], 
> selectionRoot=/data/parquet/tpch/scale100/lineitem, numFiles=1, 
> columns=[`l_suppkey`]]])
> {code}





[jira] [Updated] (DRILL-3051) Integer overflow in TimedRunnable

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3051:
---
Summary: Integer overflow in TimedRunnable  (was: Integer overflow in query 
time runnable)

> Integer overflow in TimedRunnable
> -
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)





[jira] [Updated] (DRILL-3048) Disable assertions by default

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3048:
---
Fix Version/s: 1.0.0

> Disable assertions by default
> -
>
> Key: DRILL-3048
> URL: https://issues.apache.org/jira/browse/DRILL-3048
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Assigned] (DRILL-3049) Increase sort spooling threshold

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reassigned DRILL-3049:
--

Assignee: Steven Phillips

> Increase sort spooling threshold
> 
>
> Key: DRILL-3049
> URL: https://issues.apache.org/jira/browse/DRILL-3049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Updated] (DRILL-3050) Increase query context max memory

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3050:
---
Fix Version/s: 1.0.0

> Increase query context max memory
> -
>
> Key: DRILL-3050
> URL: https://issues.apache.org/jira/browse/DRILL-3050
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Assigned] (DRILL-3050) Increase query context max memory

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reassigned DRILL-3050:
--

Assignee: Steven Phillips

> Increase query context max memory
> -
>
> Key: DRILL-3050
> URL: https://issues.apache.org/jira/browse/DRILL-3050
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Assigned] (DRILL-3048) Disable assertions by default

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reassigned DRILL-3048:
--

Assignee: Steven Phillips

> Disable assertions by default
> -
>
> Key: DRILL-3048
> URL: https://issues.apache.org/jira/browse/DRILL-3048
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>






[jira] [Updated] (DRILL-3049) Increase sort spooling threshold

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3049:
---
Fix Version/s: 1.0.0

> Increase sort spooling threshold
> 
>
> Key: DRILL-3049
> URL: https://issues.apache.org/jira/browse/DRILL-3049
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>






[jira] [Created] (DRILL-3051) Integer overflow in query time runnable

2015-05-12 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3051:
--

 Summary: Integer overflow in query time runnable
 Key: DRILL-3051
 URL: https://issues.apache.org/jira/browse/DRILL-3051
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips


This can cause the timeout to become negative. Causes query to fail.

Only see this when querying a large number of files (e.g. ~150K)





[jira] [Assigned] (DRILL-3051) Integer overflow in query time runnable

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reassigned DRILL-3051:
--

Assignee: Steven Phillips

> Integer overflow in query time runnable
> ---
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)





[jira] [Updated] (DRILL-3051) Integer overflow in query time runnable

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-3051:
---
Fix Version/s: 1.0.0

> Integer overflow in query time runnable
> ---
>
> Key: DRILL-3051
> URL: https://issues.apache.org/jira/browse/DRILL-3051
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> This can cause the timeout to become negative. Causes query to fail.
> Only see this when querying a large number of files (e.g. ~150K)





[jira] [Created] (DRILL-3050) Increase query context max memory

2015-05-12 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3050:
--

 Summary: Increase query context max memory
 Key: DRILL-3050
 URL: https://issues.apache.org/jira/browse/DRILL-3050
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips








[jira] [Created] (DRILL-3048) Disable assertions by default

2015-05-12 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3048:
--

 Summary: Disable assertions by default
 Key: DRILL-3048
 URL: https://issues.apache.org/jira/browse/DRILL-3048
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips








[jira] [Created] (DRILL-3049) Increase sort spooling threshold

2015-05-12 Thread Steven Phillips (JIRA)
Steven Phillips created DRILL-3049:
--

 Summary: Increase sort spooling threshold
 Key: DRILL-3049
 URL: https://issues.apache.org/jira/browse/DRILL-3049
 Project: Apache Drill
  Issue Type: Bug
Reporter: Steven Phillips








[jira] [Updated] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-12 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2936:
---
Attachment: DRILL-2936_2015-05-12_17:34:27.patch

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2936.patch, DRILL-2936_2015-05-12_17:34:27.patch, 
> Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-12 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541104#comment-14541104
 ] 

Steven Phillips commented on DRILL-2936:


Updated reviewboard https://reviews.apache.org/r/34037/


> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2936.patch, DRILL-2936_2015-05-12_17:34:27.patch, 
> Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Commented] (DRILL-2957) Netty Memory Manager doesn't move empty chunks between lists

2015-05-11 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538676#comment-14538676
 ] 

Steven Phillips commented on DRILL-2957:


What's the explanation for why this is not a problem?

> Netty Memory Manager doesn't move empty chunks between lists
> 
>
> Key: DRILL-2957
> URL: https://issues.apache.org/jira/browse/DRILL-2957
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
>
> I'm seeing a pattern in the memory allocator and I need you to take a look at 
> it.  Here are the basic concepts:
> 1) We use an extension of PooledByteBufAllocator [1] called 
> PooledByteBufAllocatorL.
> 2) We use many Direct Arenas (generally one per core)
> 3) Each arena has chunk lists for different occupancies (chunks that are 
> empty, chunks 25% full, chunks 50% full, etc) [2]
> 4) Each of these chunk lists maintains a list of chunks.  The chunks move 
> from list to list as they get more or less full.
> 5) When no memory is being used, chunks move back to the empty list.
> 6) If there are excessive empty chunks, they are released back to the OS. (I 
> don't remember the exact trigger here and I'm only seeing this sometimes 
> right now.)
> We're running on Netty 4.0.27.  
> What I'm seeing is that we don't seem to be moving the chunks back to the 
> empty list as they are vacated.  You can see an example output from my memory 
> logging [3] that is enabled by [4].  I haven't replicated this at small scale 
> but at large scale I see it consistently (30 node cluster, large group by 
> query [5]).
> I want to understand this behavior better, determine if it is a bug or not 
> and determine whether or not this hurts memory for subsequent queries.
> One other note, Netty will cache small amounts of memory that is allocated 
> and released on the same thread for that thread.  I don't believe this is a 
> large amount of memory but be aware of it. It should be possible to control 
> this using these settings [6].
> [1] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java
> [2] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PoolArena.java#L67
> [3] Memory log output at idle after large query (one example arena out of 32 
> on perf cluster, see logs on those nodes for more info):
> ::snip::
> Chunk(s) at 0~25%:
> none
> Chunk(s) at 0~50%:
> Chunk(62194b16: 0%, 0/16777216)
> Chunk(35983868: 1%, 8192/16777216)
> Chunk(5bbfb16a: 1%, 163840/16777216)
> Chunk(1c6d277e: 1%, 8192/16777216)
> Chunk(2897b6bf: 2%, 204800/16777216)
> Chunk(287d5c71: 0%, 0/16777216)
> Chunk(s) at 25~75%:
> Chunk(61bad0ee: 0%, 0/16777216)
> Chunk(s) at 50~100%:
> Chunk(2d79a032: 0%, 0/16777216)
> Chunk(42415f4e: 0%, 0/16777216)
> Chunk(33a3bade: 0%, 0/16777216)
> Chunk(1ce7ca63: 0%, 0/16777216)
> Chunk(531e1888: 0%, 0/16777216)
> Chunk(54786a09: 0%, 0/16777216)
> Chunk(5cdcb359: 0%, 0/16777216)
> Chunk(3e40137b: 0%, 0/16777216)
> Chunk(534f0fb3: 0%, 0/16777216)
> Chunk(6301ee8a: 0%, 0/16777216)
> Chunk(6a90c3aa: 0%, 0/16777216)
> Chunk(s) at 75~100%:
> none
> Chunk(s) at 100%:
> none
> ::snip::
> [4] Enable the memory logger by enabling trace level debugging for the 
> "drill.allocator" logger like this:
>   (logback XML configuration was stripped from the archived message)
> [5] On perf cluster
> # sqllineTPCDS
> ALTER SESSION SET `exec.errors.verbose` = true;
> ALTER SESSION SET `planner.enable_multiphase_agg` = false;
> ALTER SESSION SET `store.parquet.block-size` = 134217728;
> ALTER SESSION SET `planner.enable_mux_exchange` = false;
> ALTER SESSION SET `exec.min_hash_table_size` = 67108864;
> ALTER SESSION SET `planner.enable_hashagg` = true;
> ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 29205777612;
> ALTER SESSION SET `planner.width.max_per_node` = 23;
> create table dfs.tmp.agg33 as
> select ss_sold_date_sk , ss_sold_time_sk , ss_item_sk , ss_customer_sk , 
> ss_cdemo_sk, count(*) from `store_sales_dri3`
>  group by ss_sold_date_sk , ss_sold_time_sk , ss_item_sk , ss_customer_sk , 
> ss_cdemo_sk;
> [6] 
> https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L98
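To make points 3-5 concrete, here is a toy model of occupancy-keyed chunk lists (the ranges are copied from the log output above; everything else is illustrative, not Netty's PoolChunkList implementation). A fully vacated chunk should map back to the lowest-occupancy list, which is exactly the movement the log shows not happening:

```java
// Toy model of arena chunk lists keyed by occupancy ranges. Ranges mirror
// the log headings above (0~25%, 0~50%, 25~75%, ...); real Netty tracks
// this with PoolChunkList minUsage/maxUsage thresholds.
public class ChunkListModel {
    static final int[][] RANGES =
            {{0, 25}, {0, 50}, {25, 75}, {50, 100}, {75, 100}, {100, 100}};

    // The list a chunk should live in for a given usage percentage:
    // the first range that contains it.
    static int listFor(int usagePercent) {
        for (int i = 0; i < RANGES.length; i++) {
            if (usagePercent >= RANGES[i][0] && usagePercent <= RANGES[i][1]) {
                return i;
            }
        }
        throw new IllegalArgumentException("usage " + usagePercent);
    }

    public static void main(String[] args) {
        // A 60%-full chunk belongs in the 25~75% list; once fully vacated
        // it should migrate back to the 0~25% list. The log above instead
        // shows 0% chunks still parked in the 50~100% list.
        System.out.println(listFor(60)); // 2
        System.out.println(listFor(0));  // 0
    }
}
```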





[jira] [Updated] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-11 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2936:
---
Attachment: DRILL-2936.patch

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2936.patch, Screen Shot 2015-05-01 at 2.40.36 
> PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-11 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537866#comment-14537866
 ] 

Steven Phillips commented on DRILL-2936:


Created reviewboard https://reviews.apache.org/r/34037/


> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2936.patch, Screen Shot 2015-05-01 at 2.40.36 
> PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Updated] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-11 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2936:
---
Assignee: Jacques Nadeau  (was: Steven Phillips)

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Jacques Nadeau
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: DRILL-2936.patch, Screen Shot 2015-05-01 at 2.40.36 
> PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Updated] (DRILL-2086) mapr profile - use MapR 4.0.2

2015-05-09 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2086:
---
Fix Version/s: (was: 1.0.0)
   1.1.0

> mapr profile - use MapR 4.0.2
> -
>
> Key: DRILL-2086
> URL: https://issues.apache.org/jira/browse/DRILL-2086
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Patrick Wong
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
> Attachments: DRILL-2086.1.patch.txt
>
>
> This will greatly simplify some other things.





[jira] [Resolved] (DRILL-2425) Wrong results when identifier change cases within the same data file

2015-05-09 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips resolved DRILL-2425.

Resolution: Duplicate

> Wrong results when identifier change cases within the same data file
> 
>
> Key: DRILL-2425
> URL: https://issues.apache.org/jira/browse/DRILL-2425
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
>
> #Fri Mar 06 16:51:10 EST 2015
> git.commit.id.abbrev=fb293ba
> I have the following JSON file, in which one of the identifiers changes case:
> {code}
> [root@qa-node120 md-83]# hadoop fs -cat 
> /drill/testdata/complex_type/json/schema/a.json
> {"SOURCE": "ebm","msAddressIpv6Array": null}
> {"SOURCE": "ebm","msAddressIpv6Array": {"msAddressIpv6_1":"99.111.222.0", 
> "msAddressIpv6_2":"88.222.333.0"}}
> {"SOURCE": "ebm","msAddressIpv6Array": {"msAddressIpv6_1":"99.111.222.1", 
> "msAddressIpv6_2":"88.222.333.1"}}
> {"SOURCE": "ebm","msAddressIpv6Array": {"msaddressipv6_1":"99.111.222.2", 
> "msAddressIpv6_2":"88.222.333.2"}}
> {code}
> Query this file through drill gives wrong results:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select 
> t.msAddressIpv6Array.msAddressIpv6_1 as msAddressIpv6_1 from `schema/a.json` 
> t;
> +-----------------+
> | msAddressIpv6_1 |
> +-----------------+
> | null            |
> | null            |
> | null            |
> | 99.111.222.2    |
> +-----------------+
> {code}
> plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select 
> t.msAddressIpv6Array.msAddressIpv6_1 as msAddressIpv6_1 from `schema/a.json` 
> t;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(msAddressIpv6_1=[ITEM($0, 'msAddressIpv6_1')])
> 00-02        Scan(groupscan=[EasyGroupScan 
> [selectionRoot=/drill/testdata/complex_type/json/schema/a.json, numFiles=1, 
> columns=[`msAddressIpv6Array`.`msAddressIpv6_1`], 
> files=[maprfs:/drill/testdata/complex_type/json/schema/a.json]]])
> {code}





[jira] [Commented] (DRILL-2476) Handle IterOutcome.STOP in buildSchema()

2015-05-09 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536327#comment-14536327
 ] 

Steven Phillips commented on DRILL-2476:


+1

> Handle IterOutcome.STOP in buildSchema()
> 
>
> Key: DRILL-2476
> URL: https://issues.apache.org/jira/browse/DRILL-2476
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.7.0
>Reporter: Sudheesh Katkam
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-2476.1.patch.txt
>
>
> There are some {{RecordBatch}} implementations like {{HashAggBatch}} that 
> override the {{buildSchema()}} function. The overriding functions do not 
> handle {{IterOutcome.STOP}}. This causes the {{FragmentContext}} to receive 
> two failures in some cases (linked JIRAs).





[jira] [Updated] (DRILL-2743) Parquet file metadata caching

2015-05-09 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2743:
---
Fix Version/s: (was: 1.0.0)
   1.1.0

> Parquet file metadata caching
> -
>
> Key: DRILL-2743
> URL: https://issues.apache.org/jira/browse/DRILL-2743
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Parquet
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.1.0
>
> Attachments: DRILL-2743.patch, drill.parquet_metadata
>
>
> To run a query against parquet files, we first have to recursively search the 
> directory tree for all of the files, get the block locations for each file, 
> and read the footer from each file, all during the planning phase. When there 
> are many files, this can add a very large delay to the query, and it does not 
> scale.
> However, there isn't really any need to read the footers during planning. If 
> we instead treat each parquet file as a single work unit, all we need to know 
> are the block locations for the file, the number of rows, and the columns. We 
> should store only the information we need for planning in a file located in 
> the top directory of a given parquet table, and then delay reading of the 
> footers until execution time, where it can be done in parallel.
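A sketch of what such a planning-time summary might hold (field names and shapes are hypothetical, not the format of the attached drill.parquet_metadata): one lightweight record per file, so planning touches a single cache file while footer reads shift to execution time, one per work unit.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shape of a planning-time metadata cache: per file, only
// what the planner needs (path, row count, columns, block locations).
public class MetadataCacheSketch {
    record FileSummary(String path, long rowCount, List<String> columns,
                       Map<String, long[]> blockHostOffsets) {}

    // With the cache, planning-time aggregates like total row count come
    // from the summaries alone; no footer is opened until execution.
    static long totalRows(List<FileSummary> cache) {
        return cache.stream().mapToLong(FileSummary::rowCount).sum();
    }

    public static void main(String[] args) {
        List<FileSummary> cache = List.of(
            new FileSummary("/table/part-0.parquet", 1_000, List.of("a", "b"),
                            Map.of("node1", new long[]{0, 134_217_728})),
            new FileSummary("/table/part-1.parquet", 2_500, List.of("a", "b"),
                            Map.of("node2", new long[]{0, 134_217_728})));
        System.out.println(totalRows(cache)); // 3500
    }
}
```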





[jira] [Updated] (DRILL-2941) Update RPC layer to avoid writing local data messages to socket

2015-05-09 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2941:
---
Assignee: Jacques Nadeau  (was: Steven Phillips)

> Update RPC layer to avoid writing local data messages to socket
> ---
>
> Key: DRILL-2941
> URL: https://issues.apache.org/jira/browse/DRILL-2941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.0.0
>
> Attachments: DRILL-2941.patch, DRILL-2941.patch
>
>
> Right now, if we send a fragment record batch to localhost, we still traverse 
> the RPC layer.   We should short-circuit this path.  This is especially 
> important in light of the mux and demux exchanges.





[jira] [Updated] (DRILL-2971) If Bit<>Bit connection is unexpectedly closed and we were already blocked on writing to socket, we'll stay forever in ResettableBarrier.await()

2015-05-09 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2971:
---
Assignee: Jacques Nadeau  (was: Steven Phillips)

> If Bit<>Bit connection is unexpectedly closed and we were already blocked on 
> writing to socket, we'll stay forever in ResettableBarrier.await()
> ---
>
> Key: DRILL-2971
> URL: https://issues.apache.org/jira/browse/DRILL-2971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.0.0
>
> Attachments: DRILL-2971.patch
>
>
> We need to reset the ResettableBarrier if the connection dies so that the 
> message can be failed.





[jira] [Commented] (DRILL-2971) If Bit<>Bit connection is unexpectedly closed and we were already blocked on writing to socket, we'll stay forever in ResettableBarrier.await()

2015-05-09 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536322#comment-14536322
 ] 

Steven Phillips commented on DRILL-2971:


+1

> If Bit<>Bit connection is unexpectedly closed and we were already blocked on 
> writing to socket, we'll stay forever in ResettableBarrier.await()
> ---
>
> Key: DRILL-2971
> URL: https://issues.apache.org/jira/browse/DRILL-2971
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Reporter: Jacques Nadeau
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-2971.patch
>
>
> We need to reset the ResettableBarrier if the connection dies so that the 
> message can be failed.





[jira] [Commented] (DRILL-2941) Update RPC layer to avoid writing local data messages to socket

2015-05-09 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536302#comment-14536302
 ] 

Steven Phillips commented on DRILL-2941:


+1


> Update RPC layer to avoid writing local data messages to socket
> ---
>
> Key: DRILL-2941
> URL: https://issues.apache.org/jira/browse/DRILL-2941
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Reporter: Jacques Nadeau
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
> Attachments: DRILL-2941.patch, DRILL-2941.patch
>
>
> Right now, if we send a fragment record batch to localhost, we still traverse 
> the RPC layer.   We should short-circuit this path.  This is especially 
> important in light of the mux and demux exchanges.





[jira] [Commented] (DRILL-2849) Difference in query results over CSV file created by CTAS, compared to results over original CSV file

2015-05-07 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533322#comment-14533322
 ] 

Steven Phillips commented on DRILL-2849:


I think we should close this bug. The query is failing due to the malformed 
data.

> Difference in query results over CSV file created by CTAS, compared to 
> results over original CSV file 
> --
>
> Key: DRILL-2849
> URL: https://issues.apache.org/jira/browse/DRILL-2849
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 0.9.0
> Environment: 64e3ec52b93e9331aa5179e040eca19afece8317 | DRILL-2611: 
> value vectors should report valid value count | 16.04.2015 @ 13:53:34 EDT
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Critical
> Fix For: 1.0.0
>
>
> Different results are seen for the same query over CSV data file and another 
> CSV data file created by CTAS using the same CSV file.
> Tests were executed on 4 node cluster on CentOS.
> I got rid of the header information that is written by CTAS into the new CSV 
> file that CTAS creates, and then ran my queries over CTAS' CSV file.
> query over uncompressed CSV file, deletions/deletions-0-of-00020.csv
> {code}
> > select count(cast(columns[0] as double)),max(cast(columns[0] as 
> > double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> > columns[7] from `deletions/deletions-0-of-00020.csv` group by 
> > columns[7];
> 88 rows selected (6.893 seconds)
> =
> {code}
> query over CSV file that was created by CTAS. (input to CTAS was 
> deletions/deletions-0-of-00020.csv)
> Notice there is one more record returned.
> {code}
> > select count(cast(columns[0] as double)),max(cast(columns[0] as 
> > double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> > columns[7] from `csvToCSV_0_of_00020/0_0_0.csv` group by columns[7];
>  
> 89 rows selected (6.623 seconds)
> ==
> {code}
> query over compressed CSV file
> {code}
> > select count(cast(columns[0] as double)),max(cast(columns[0] as 
> > double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> > columns[7] from `deletions-0-of-00020.csv.gz` group by columns[7];
> 88 rows selected (10.526 seconds)
> ==
> {code}
> In the cases below, the count and sum results are different when the query is 
> executed over the CSV file that was created by CTAS. (This may explain why we 
> see the difference in results in the above queries.)
> {code}
> 0: jdbc:drill:> select count(cast(columns[0] as double)),max(cast(columns[0] 
> as double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> columns[7] from `deletions/deletions-0-of-00020.csv` where columns[7] is 
> null group by columns[7];
> +--------+-------------------+-------------------+----------------------+--------+
> | EXPR$0 | EXPR$1            | EXPR$2            | EXPR$3               | EXPR$4 |
> +--------+-------------------+-------------------+----------------------+--------+
> | 252    | 1.362983396001E12 | 1.165768779027E12 | 1.293794515595635E12 | null   |
> +--------+-------------------+-------------------+----------------------+--------+
> 1 row selected (6.013 seconds)
> 0: jdbc:drill:> select count(cast(columns[0] as double)),max(cast(columns[0] 
> as double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> columns[7] from `deletions-0-of-00020.csv.gz` where columns[7] is null 
> group by columns[7];
> +--------+-------------------+-------------------+----------------------+--------+
> | EXPR$0 | EXPR$1            | EXPR$2            | EXPR$3               | EXPR$4 |
> +--------+-------------------+-------------------+----------------------+--------+
> | 252    | 1.362983396001E12 | 1.165768779027E12 | 1.293794515595635E12 | null   |
> +--------+-------------------+-------------------+----------------------+--------+
> 1 row selected (8.899 seconds)
> {code}
> Notice that the count and sum results are different (from those above) when the 
> query is executed over the CSV file created by CTAS.
> {code}
> 0: jdbc:drill:> select count(cast(columns[0] as double)),max(cast(columns[0] 
> as double)),min(cast(columns[0] as double)),avg(cast(columns[0] as double)), 
> columns[7] from `csvToCSV_0_of_00020/0_0_0.csv` where columns[7] is null 
> group by columns[7];
> +--------+----------------+-------------------+-----------------------+--------+
> | EXPR$0 | EXPR$1         | EXPR$2            | EXPR$3                | EXPR$4 |
> +--------+----------------+-------------------+-----------------------+--------+
> | 245    | 1.349670663E12 | 1.165768779027E12 | 1.2930281335065144E12 | null   |
> +--------+----------------+-------------------+-----------------------+--------+

[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-07 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533036#comment-14533036
 ] 

Steven Phillips commented on DRILL-2936:


I agree. I am working on it right now. My current thought is to use it when 
there is a HashToMerge exchange.

Are there any other situations where we would want to use it? My other thought 
was that it might be useful in the case of a sort: where there is a 
HashToRandom exchange before the sort and the ExternalSort starts spilling to 
disk, performance can become very poor because all of the senders get blocked 
as soon as any one sort fragment starts spilling. Using the spooling buffer 
would prevent this from happening.

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-05 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529841#comment-14529841
 ] 

Steven Phillips commented on DRILL-2936:


The reason this particular query hits this problem is that there is a 
HashToRandom exchange followed by a HashToMerge exchange, both distributed on 
the same key, with the same number of fragments. 

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH query 04, the query hangs. From the profiles page it does not 
> look like any fragments are making progress; the last progress timestamps were 
> from some time back. 
> Attached is the logical plan. 





[jira] [Commented] (DRILL-2936) TPCH 4 and 18 SF100 hangs when hash agg is turned off

2015-05-05 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529839#comment-14529839
 ] 

Steven Phillips commented on DRILL-2936:


It turns out this is caused by a sort of deadlock condition that can arise 
with the hash-to-merge exchange. The hash-to-merge exchange consists of a 
partition sender and a merging receiver. The partition sender has outgoing 
buckets it sends to the different downstream minor fragments, and each merging 
receiver has an incoming buffer for each of the sending minor fragments.

The merging receiver cannot proceed without data from each of the sending 
fragments. If data from any one of the sending fragments is unavailable, it 
will block until it receives some data from that fragment, or a message 
indicating there is no more data from that fragment.

If there is some skew in the data, it's possible that a partition sender may 
not send any data to a particular receiver. That receiver will end up blocking 
because it is waiting to receive that data. Since it is blocked, it is unable 
to consume the data that it does receive from other senders. After a few 
batches, the sender also blocks due to backpressure, because the receiver is 
unable to consume.

Once we reach this state, the query hangs indefinitely.
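The blocking cycle above can be sketched as a toy predicate (names and the capacity threshold are hypothetical; the real interplay is between Drill's partition sender, merging receiver, and RPC-level backpressure):

```java
// Toy model of the hash-to-merge hang: a receiver blocks while any sender
// has given it nothing; a sender blocks once it has filled any receiver's
// incoming buffer. When both happen for the same receiver, nobody moves.
public class MergeDeadlockSketch {
    // batchesSent[s][r] = batches sender s has routed to receiver r.
    static boolean hangs(int[][] batchesSent, int bufferCapacity) {
        int receivers = batchesSent[0].length;
        for (int r = 0; r < receivers; r++) {
            boolean starved = false;    // some sender never reached r
            boolean saturated = false;  // some sender filled r's buffer
            for (int[] sender : batchesSent) {
                if (sender[r] == 0) starved = true;
                if (sender[r] >= bufferCapacity) saturated = true;
            }
            // r waits on the silent sender and cannot drain its other
            // buffers, so the saturated senders block on backpressure.
            if (starved && saturated) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Skew: sender 1 routes nothing to receiver 0 while sender 0
        // floods it; with even routing the exchange drains normally.
        System.out.println(hangs(new int[][]{{3, 1}, {0, 2}}, 3)); // true
        System.out.println(hangs(new int[][]{{2, 2}, {1, 1}}, 3)); // false
    }
}
```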

> TPCH 4 and 18 SF100 hangs when hash agg is turned off
> -
>
> Key: DRILL-2936
> URL: https://issues.apache.org/jira/browse/DRILL-2936
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Steven Phillips
>Priority: Critical
> Fix For: 1.0.0
>
> Attachments: Screen Shot 2015-05-01 at 2.40.36 PM.png
>
>
> sys options:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.memory.max_query_memory_per_node` = 29205777612;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashjoin`=false;
> 0: jdbc:drill:schema=dfs.drillTestDirTpch100P> alter system set 
> `planner.enable_hashagg`=false;
> {code}
> On executing TPCH 04 query hangs. From the profiles page does not look like 
> any fragments are making progress, the last progress time stamps were 
> sometime back. 
> Attached is the logical plan. 





[jira] [Updated] (DRILL-2906) Json reader with extended json adds extra column

2015-05-05 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-2906:
---
Summary: Json reader with extended json adds extra column  (was: CTAS with 
store.format = 'json' returns incorrect results)

> Json reader with extended json adds extra column
> 
>
> Key: DRILL-2906
> URL: https://issues.apache.org/jira/browse/DRILL-2906
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON, Storage - Writer
>Reporter: Mehant Baid
>Assignee: Steven Phillips
> Fix For: 1.0.0
>
>
> Performing a CTAS with 'store.format' = 'json' and then querying the table 
> results in projecting an additional field '*' with null values. Below is a simple repro:
> 0: jdbc:drill:zk=local> create table t as select timestamp '1980-10-01 
> 00:00:00' from cp.`employee.json` limit 1;
> +----------+---------------------------+
> | Fragment | Number of records written |
> +----------+---------------------------+
> | 0_0      | 1                         |
> +----------+---------------------------+
> 1 row selected (0.314 seconds)
> 0: jdbc:drill:zk=local> select * from t;
> +-----------------------+------+
> | EXPR$0                | *    |
> +-----------------------+------+
> | 1980-10-01 00:00:00.0 | null |
> +-----------------------+------+
> Notice in the above result set we get an extra column '*' with null value.




