[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348039#comment-16348039
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165263759
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

I can think of three reasons to use the sizer:

* Type logic is complex: we have multiple sets of rules depending on the 
data type. Best to encapsulate the logic in a single place. So, either 1) use 
the "sizer", or 2) move the logic from the "sizer" to a common utility.
* Column size is tricky as it depends on `DataMode`. The size of a 
`Required INT` is 4 bytes. The (total memory) size of an `Optional INT` is 5 
bytes. For a `Repeated INT`? You need to know the average array cardinality, 
which the "sizer" provides (by analyzing an input batch).
* As discussed, variable-width columns (`VARCHAR`, `VARBINARY` for HBase) 
have no known size. We really have to completely forget about that awful "50" 
estimate. We can only estimate size from input, which is, again, what the 
"sizer" does.

Of course, all the above only works if you actually sample the input.

A current limitation (and good enhancement) is that the Sizer is aware of 
just one batch. The sort (the first user of the "sizer") needed only aggregate 
row size, so it just kept track of the widest row ever seen. If you need 
detailed column information, you may want another layer: one that aggregates 
information across batches. (For arrays and variable-width columns, you can 
take the weighted average or the maximum depending on your needs.)

Remember, if the purpose of this number is to estimate memory use, then you 
have to add a 33% (average) allowance for internal fragmentation. (Each vector 
is, on average, 75% full.)
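As a minimal illustration of that allowance (the class and method names here are hypothetical, not Drill's actual API):

```java
// Hypothetical helper: scale an estimated data size by the internal
// fragmentation allowance. Vectors are on average 75% full, so the
// allocated memory is about 4/3 (~33% more) of the estimated data size.
class MemoryEstimate {
    private static final double FRAGMENTATION_ALLOWANCE = 4.0 / 3.0;

    static long estimateAllocation(long estimatedDataSize) {
        return (long) Math.ceil(estimatedDataSize * FRAGMENTATION_ALLOWANCE);
    }
}
```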


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348024#comment-16348024
 ] 

ASF GitHub Bot commented on DRILL-6129:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1106
  
Note that a similar bug was recently fixed in (as I recall) the Merge 
Receiver. As part of this fix, it would be good to either:

1. Determine whether we have more copies of this logic besides the Merge 
Receiver (previously fixed) and the client code (fixed here), or
2. Refactor the code so that all use cases share a common implementation of 
this task.

In any event, it would be good to compare this code with what was done in the 
Merge Receiver, to ensure that we use a common approach. See 
`exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java` in 
PR #968.

The two sets of code appear similar, depending on what `isSameSchema()` 
does with a list of `MaterializedField`s. But, please take a look.


> Query fails on nested data type schema change
> -
>
> Key: DRILL-6129
> URL: https://issues.apache.org/jira/browse/DRILL-6129
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 1.10.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from ;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The 
> field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds which seems to indicate the 
> issue has to do with the schema change logic. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347965#comment-16347965
 ] 

ASF GitHub Bot commented on DRILL-6124:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1103
  
You are right, @arina-ielchiieva. Thanks for catching this; I will close 
the PR and mark the JIRA as invalid.


> testCountDownLatch can be null in PartitionerDecorator depending on user's 
> injection controls config
> 
>
> Key: DRILL-6124
> URL: https://issues.apache.org/jira/browse/DRILL-6124
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> In PartitionerDecorator we get a latch from the injector with the following 
> code.
> testCountDownLatch = injector.getLatch(context.getExecutionControls(), 
> "partitioner-sender-latch");
> However, if there is no injection site defined in the user's drill 
> configuration, then testCountDownLatch will be null, so we must null-check 
> it to avoid NPEs.
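A minimal sketch of the guarded pattern the description calls for (the class and method names below are illustrative stand-ins, not Drill's actual injection API):

```java
import java.util.concurrent.CountDownLatch;

class PartitionerSketch {
    // Stand-in for injector.getLatch(...): returns null when the user's
    // drill configuration defines no injection site.
    static CountDownLatch getLatch(boolean injectionSiteConfigured) {
        return injectionSiteConfigured ? new CountDownLatch(1) : null;
    }

    // Guarded use: only touch the latch when one was actually injected,
    // avoiding the NPE described in this issue.
    static boolean countDownIfPresent(CountDownLatch latch) {
        if (latch == null) {
            return false;
        }
        latch.countDown();
        return true;
    }
}
```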



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347966#comment-16347966
 ] 

ASF GitHub Bot commented on DRILL-6124:
---

Github user ilooner closed the pull request at:

https://github.com/apache/drill/pull/1103





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6128) Wrong Result with Nested Loop Join

2018-01-31 Thread Sorabh Hamirwasia (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347960#comment-16347960
 ] 

Sorabh Hamirwasia edited comment on DRILL-6128 at 2/1/18 3:36 AM:
--

I did some more investigation on this, and it looks like we recently added 
generated code for the *doEval* method of NestedLoopJoin as part of DRILL-5375. 
The right shift happens inside the *doEval* method: it identifies the 
right-side container as a HyperContainer and right-shifts the index to get the 
batchIndex. This only works if the creator of the ExpandableHyperContainer 
ensures that the value vectors inside it are fully packed. So far, the other 
operators using ExpandableHyperContainers are HashJoin, MergingRecordBatch, 
and Sort/TopN (using PriorityQueue).

From discussion with [~amansinha100] it looks like while building the 
HashTable we use HyperContainers of BatchHolders, but we make sure that each 
BatchHolder is fully filled before adding another one to the container. Hence 
it works fine with respect to the generated code accessing records from it. It 
would be good to make sure PriorityQueue is also doing something like this.

*Current Nested Loop Behavior:*

NestedLoop Join adds the right side input batches inside HyperContainer 
(rightContainer) without ensuring it's fully packed. It also maintains a list 
of record counts in each batch in rightCounts. Later these are passed to 
generated code using BatchReference. During EvaluationVisitor, when it sees the 
rightContainer as HyperContainer it does right shift on passed batchIndex.

*There are currently two ways to fix this issue:*
 1) Make sure all the operators using a HyperContainer always fully pack it 
(which looks to be the case today). In that case, Nested Loop Join has to do 
something similar: while adding batches to rightContainer, make sure each 
batch is fully packed.

2) Operators that do not fully pack the HyperContainer should use a 
BatchReference for generated code and should also keep track of the list of 
per-batch record counts along with the HyperContainer. Whenever only the index 
of a record is passed to the generated code, it should generate the index as 
rightIndex = (batchIndex << 16) + recordWithinBatchIndex. When the batch index 
and record index are passed separately, it should generate batchIndex = 
batchIndex << 16 and pass recordWithinBatchIndex separately. The latter is the 
case for the current NestedLoopJoin implementation.
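The compound-index scheme in option 2 can be sketched as follows (a minimal illustration of the bit layout, not Drill's actual generated code):

```java
// Illustrative encoding for hyper-batch addressing: the upper 16 bits
// select the batch, the lower 16 bits select the record within that batch.
class HyperIndex {
    static int encode(int batchIndex, int recordWithinBatchIndex) {
        return (batchIndex << 16) | (recordWithinBatchIndex & 0xFFFF);
    }

    // The right shift performed in the generated doEval to recover the batch.
    static int batchOf(int compoundIndex) {
        return compoundIndex >>> 16;
    }

    // The "& 65535" mask in the generated doEval to recover the record.
    static int recordOf(int compoundIndex) {
        return compoundIndex & 0xFFFF;
    }
}
```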



> Wrong Result with Nested Loop Join
> --
>
> Key: DRILL-6128


[jira] [Updated] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6129:
--
Reviewer: Aman Sinha




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347950#comment-16347950
 ] 

ASF GitHub Bot commented on DRILL-6129:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1106
  
@amansinha100 can you please review it?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347948#comment-16347948
 ] 

ASF GitHub Bot commented on DRILL-6129:
---

GitHub user sachouche opened a pull request:

https://github.com/apache/drill/pull/1106

DRILL-6129: Fixed query failure due to nested column data type change

Problem Description -
- The Drillbit was able to successfully send batches containing different 
metadata (for nested columns)
- This was the case when one or multiple scanners were involved
- The issue happened within the client, where value vectors are cached 
across batches
- The load(...) API is responsible for updating value vectors when a new 
batch arrives
- The RecordBatchLoader class is used to detect schema changes; if a change is 
detected, the previous value vectors are discarded and new ones are created
- There is a bug in the current implementation where only first-level 
columns are compared

Fix -
- The fix is to improve the schema diff logic by including nested columns
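A minimal sketch of the kind of recursive comparison such a fix implies (the `Field` type below is an illustrative stand-in for Drill's `MaterializedField`, not its actual API):

```java
import java.util.List;

// Illustrative stand-in for Drill's MaterializedField: a name, a simplified
// type tag, and nested children (empty for leaf columns).
class Field {
    final String name;
    final String type;
    final List<Field> children;

    Field(String name, String type, List<Field> children) {
        this.name = name;
        this.type = type;
        this.children = children;
    }
}

class SchemaDiff {
    // A top-level match is not enough: recurse into nested columns,
    // which is the comparison a first-level-only diff would miss.
    static boolean isSame(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)
                || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!isSame(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```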

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachouche/drill DRILL-6129

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1106


commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche 
Date:   2018-02-01T02:59:58Z

DRILL-6129: Fixed query failure due to nested column data type change







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread salim achouche (JIRA)
salim achouche created DRILL-6129:
-

 Summary: Query fails on nested data type schema change
 Key: DRILL-6129
 URL: https://issues.apache.org/jira/browse/DRILL-6129
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 1.10.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.13.0





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347848#comment-16347848
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1099


> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
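For illustration, the standard Java boxing behavior behind this change (the demo class is hypothetical; the caching guarantee for `Integer.valueOf` is part of the Java specification):

```java
// Demo of why valueOf is preferred over the (now deprecated) constructor:
// Integer.valueOf caches boxes for values in [-128, 127], so repeated calls
// can return the same object instead of allocating a new one each time.
class ValueOfDemo {
    static boolean sameBox(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }
}
```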



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6128) Wrong Result with Nested Loop Join

2018-01-31 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6128:


 Summary: Wrong Result with Nested Loop Join
 Key: DRILL-6128
 URL: https://issues.apache.org/jira/browse/DRILL-6128
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


Nested Loop Join produces wrong results if there are multiple batches on the 
right side. It builds an ExpandableHyperContainer to hold all of the right-side 
batches. Then, for each record of the left-side input, it evaluates the 
condition against all records on the right side and emits the output if the 
condition is satisfied. The main loop inside 
[populateOutgoingBatch|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java#L106]
 calls *doEval* with the correct indexes to evaluate records on both sides. In 
the generated code of *doEval*, for some reason there is a right shift by 16 
on the rightBatchIndex (sample shared below).
{code:java}
public boolean doEval(int leftIndex, int rightBatchIndex,
                      int rightRecordIndexWithinBatch)
    throws SchemaChangeException
{
  {
    IntHolder out3 = new IntHolder();
    {
      out3.value = vv0.getAccessor().get((leftIndex));
    }
    IntHolder out7 = new IntHolder();
    {
      out7.value = vv4[((rightBatchIndex) >>> 16)]
          .getAccessor().get(((rightRecordIndexWithinBatch) & 65535));
    }
    ..
    ..
}{code}
 

When the actual loop is processing the second batch, inside the eval method the 
right-shifted index becomes 0, and it ends up evaluating the condition against 
the first right batch again. So if there is more than one batch (up to 65535) 
on the right side, doEval will always consider the first batch for condition 
evaluation. But the output data will be based on the correct batch, so there 
will be issues like OutOfBound and WrongData. Cases can be:

Let's say: *rightBatchIndex*: index of right batch to consider, 
*rightRecordIndexWithinBatch*: index of record in right batch at rightBatchIndex

1) The first right batch comes with zero data and OK_NEW_SCHEMA (let's say 
because of a filter in the operator tree), and the next right batch has > 0 
data. When we call doEval for the second batch (*rightBatchIndex = 1*) and the 
first record in it (i.e. *rightRecordIndexWithinBatch = 0*), the actual 
evaluation happens using the first batch (since *rightBatchIndex >>> 16 = 0*). 
On accessing the record at *rightRecordIndexWithinBatch* in the first batch it 
throws *IndexOutOfBoundsException*, since the first batch has no records.

2) Let's say there are 2 batches on the right side, the first with 3 records 
(with id_right=1/2/3) and the second also with 3 records (with 
id_right=10/20/30), and 1 batch on the left side with 3 records (with 
id_left=1/2/3). In this case the NestedLoopJoin (with an equality condition) 
ends up producing 6 records instead of 3. It produces the first 3 records 
based on matches between the left records and the first right batch. But while 
processing the second right batch it evaluates id_left = id_right against the 
first batch again, finds matches again, and produces another 3 records. 
*Example:*

*Left Batch Data:*

 
{code:java}
Batch1:

{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31"
}{code}
 

*Right Batch Data:*

 
{code:java}
Batch 1:
{
 "id_right": 1,
 "cost_right": 10,
 "name_right": "item1"
}
{
 "id_right": 2,
 "cost_right": 20,
 "name_right": "item2"
}
{
 "id_right": 3,
 "cost_right": 30,
 "name_right": "item3"
}
{code}
 

 
{code:java}
Batch 2:
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}{code}
 

*Produced output:*
{code:java}
{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11",
 "id_right": 1,
 "cost_right": 10,
 "name_right": "item1"
}
{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11",
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21"
 "id_right": 2, 
 "cost_right": 20,
 "name_right": "item2"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21"
 "id_right": 4, 
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31"
 "id_right": 3, 
 "cost_right": 30,
 "name_right": "item3"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31"
 "id_right": 4, 
 "cost_right": 40,
 "name_right": "item4"
}{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6111) NullPointerException with Kafka Storage Plugin

2018-01-31 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-6111:
---

Assignee: Bhallamudi Venkata Siva Kamesh

> NullPointerException with Kafka Storage Plugin
> --
>
> Key: DRILL-6111
> URL: https://issues.apache.org/jira/browse/DRILL-6111
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.12.0
>Reporter: Jared Stehler
>Assignee: Bhallamudi Venkata Siva Kamesh
>Priority: Major
>
> I'm unable to query using the kafka storage plugin; queries are failing with 
> an NPE that *seems* to be caused by a JSON typo:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 1:2
> [Error Id: 49d5f72f-0187-480b-8b29-6eeeb5adc88f on 10.80.53.16:31820]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0.jar:1.12.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_131]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
> of [simple type, class org.apache.drill.exec.store.kafka.KafkaSubScan] value 
> failed (java.lang.NullPointerException): null
> at [Source: {
> "pop" : "single-sender",
> "@id" : 0,
> "receiver-major-fragment" : 0,
> "receiver-minor-fragment" : 0,
> "child" : {
> "pop" : "selection-vector-remover",
> "@id" : 1,
> "child" : {
> "pop" : "limit",
> "@id" : 2,
> "child" : {
> "pop" : "kafka-partition-scan",
> "@id" : 3,
> "userName" : "",
> "columns" : [ "`*`" ],
> "partitionSubScanSpecList" : [ {
> "topicName" : "ingest-prime",
> "partitionId" : 5,
> "startOffset" : 8824294,
> "endOffset" : 8874172
> }, {
> "topicName" : "ingest-prime",
> "partitionId" : 1,
> "startOffset" : 8826346,
> "endOffset" : 8874623
> }, {
> "topicName" : "ingest-prime",
> "partitionId" : 6,
> "startOffset" : 8824744,
> "endOffset" : 8874617
> } ],
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "KafkaStoragePluginConfig" : {
> "type" : "kafka",
> "kafkaConsumerProps" : {
> "key.deserializer" : 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer",
> "auto.offset.reset" : "earliest",
> "bootstrap.servers" : 
> "kafkas.dev3.master.us-west-2.prod.aws.intellify.io:9092",
> "enable.auto.commit" : "true",
> "group.id" : "drill-query-consumer-1",
> "value.deserializer" : 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer",
> "session.timeout.ms" : "3"
> },
> "enabled" : true
> },
> "cost" : 0.0
> },
> "first" : 0,
> "last" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> },
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> },
> "destination" : "CgsxMC44MC41My4xNhDM+AEYzfgBIM74ATIGMS4xMi4wOAA=",
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> }; line: 49, column: 7] (through reference chain: 
> org.apache.drill.exec.physical.config.SingleSender["child"]->org.apache.drill.exec.physical.config.SelectionVectorRemover["child"]->org.apache.drill.exec.physical.config.Limit["child"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:263)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:453)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:472)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:258)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:135)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:444)
>  

[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347506#comment-16347506
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1099
  
@reudismam Travis fails in other PRs as well. See #1105.


> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
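The caching behavior this issue relies on can be demonstrated directly. `Integer.valueOf` is guaranteed by the JLS (§5.1.7) to cache values in [-128, 127], so repeated calls return the same object, while the constructor (deprecated since Java 9) always allocates a new instance:

```java
public class ValueOfDemo {
    public static void main(String[] args) {
        // valueOf returns a cached instance for small values.
        Integer a = Integer.valueOf(42);
        Integer b = Integer.valueOf(42);
        System.out.println(a == b); // true: same cached object

        // The constructor always allocates, defeating the cache.
        @SuppressWarnings("deprecation")
        Integer c = new Integer(42);
        System.out.println(a == c); // false: distinct allocation
    }
}
```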



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347504#comment-16347504
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165168294
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -84,13 +85,6 @@
 public abstract class HashAggTemplate implements HashAggregator {
   protected static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashAggregator.class);
 
-  private static final int VARIABLE_MAX_WIDTH_VALUE_SIZE = 50;
-  private static final int VARIABLE_MIN_WIDTH_VALUE_SIZE = 8;
-
-  private static final boolean EXTRA_DEBUG_1 = false;
--- End diff --

Oh but there is! slf4j and logback have a feature called markers, which 
allows you to associate a tag with a log statement. When printing logs you can 
filter by level and by marker. There is a working example here 
https://examples.javacodegeeks.com/enterprise-java/slf4j/slf4j-markers-example/ 
. I will update the log statements to use markers in this PR.
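For reference, the marker API mentioned above looks like this. A minimal sketch: the marker name `HASH_AGG_DEBUG` is hypothetical (Drill's actual marker names may differ), and slf4j plus a binding such as logback must be on the classpath.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;

public class MarkerDemo {
    private static final Logger logger = LoggerFactory.getLogger(MarkerDemo.class);

    // Hypothetical marker name for illustration only.
    private static final Marker HASH_AGG_DEBUG = MarkerFactory.getMarker("HASH_AGG_DEBUG");

    public static void main(String[] args) {
        // Logback can filter on this marker, e.g. with a
        // ch.qos.logback.classic.turbo.MarkerFilter in logback.xml,
        // independently of the debug/trace level.
        logger.debug(HASH_AGG_DEBUG, "spilled partition {} to disk", 3);
    }
}
```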


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347492#comment-16347492
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r164617150
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -84,13 +85,6 @@
 public abstract class HashAggTemplate implements HashAggregator {
   protected static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashAggregator.class);
 
-  private static final int VARIABLE_MAX_WIDTH_VALUE_SIZE = 50;
-  private static final int VARIABLE_MIN_WIDTH_VALUE_SIZE = 8;
-
-  private static final boolean EXTRA_DEBUG_1 = false;
--- End diff --

The logging framework only gives error/warning/debug/trace ... there is no 
option for a user-configurable level.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347493#comment-16347493
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165166234
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() 
throws SchemaChangeException,
   groupByOutFieldIds[i] = container.add(vv);
 }
 
-int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra 
bigint column
--- End diff --

Maybe do this work as a separate PR (for DRILL-5728) ?  Else it would delay 
this PR, and overload it ...


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347447#comment-16347447
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165161146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() 
throws SchemaChangeException,
   groupByOutFieldIds[i] = container.add(vv);
 }
 
-int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra 
bigint column
--- End diff --

Thanks for catching this. Then we should fix the underlying problem instead 
of passing around additional parameters to work around the issue. I will work 
on fixing the codegen for the BatchHolder as part of this PR.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347420#comment-16347420
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165156589
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

@ilooner That is the point. If we know the exact value, why do we need 
RecordBatchSizer? We should use RecordBatchSizer when we need to get sizing 
information for a batch (in most cases, incoming batch). In this case, you are 
allocating memory for value vectors for the batch you are building. For fixed 
width columns, you can get the column width size for each type you are 
allocating memory for using TypeHelper.getSize. For variable width columns, 
TypeHelper.getSize assumes it is 50 bytes.  If you want to adjust memory you 
are allocating for variable width columns for outgoing batch based on incoming 
batch, that's when you use RecordBatchSizer on actual incoming batch to figure 
out the average size of that column.  You can also use RecordBatchSizer on 
incoming batch if you want to figure out how many values you want to allocate 
memory for in the outgoing batch. Note that, with your change, for just-created 
value vectors, variable-width columns will return an estSize of 1, which is not 
what you want. 
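The averaging idea described above can be sketched independently of Drill's classes. This is illustrative only (not the RecordBatchSizer implementation); the fallback constant 50 mirrors the legacy default mentioned in the thread.

```java
public class ColumnSizeEstimate {
    // Estimates the average width of a variable-width column from an actual
    // sample of incoming values, instead of assuming a fixed 50 bytes.
    static int estimateAvgWidth(byte[][] incomingValues) {
        if (incomingValues.length == 0) {
            return 50; // legacy default when no sample is available
        }
        long totalBytes = 0;
        for (byte[] v : incomingValues) {
            totalBytes += v.length;
        }
        // Round up so allocations based on the estimate are not undersized.
        return (int) ((totalBytes + incomingValues.length - 1) / incomingValues.length);
    }

    public static void main(String[] args) {
        byte[][] sample = { "item1".getBytes(), "item21".getBytes(), "ab".getBytes() };
        // 5 + 6 + 2 = 13 bytes over 3 values, rounded up.
        System.out.println(estimateAvgWidth(sample)); // 5
    }
}
```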


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6125:
--
Fix Version/s: 1.13.0

> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.
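A minimal sketch of the fix implied by the description above (not Drill's actual class, field, or method names): close() must synchronize on the same monitor used when the partitioner is created, so it always sees the latest reference.

```java
public class SenderSketch {
    // Guards the partitioner field; creation and the finished-callback
    // already take this lock in the scenario described above.
    private final Object lock = new Object();
    private AutoCloseable partitioner;

    void createPartitioner(AutoCloseable p) {
        synchronized (lock) {
            partitioner = p;
        }
    }

    // The fix: take the same lock before reading the field, so the event
    // processor thread cannot close a stale reference and leak the current one.
    void close() throws Exception {
        synchronized (lock) {
            if (partitioner != null) {
                partitioner.close();
                partitioner = null;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SenderSketch s = new SenderSketch();
        boolean[] closed = {false};
        s.createPartitioner(() -> closed[0] = true);
        s.close();
        System.out.println(closed[0]); // true
    }
}
```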



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6124:
--
Affects Version/s: (was: 1.12.0)
   1.13.0

> testCountDownLatch can be null in PartitionerDecorator depending on user's 
> injection controls config
> 
>
> Key: DRILL-6124
> URL: https://issues.apache.org/jira/browse/DRILL-6124
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> In PartitionerDecorator we get a latch from the injector with the following 
> code.
> testCountDownLatch = injector.getLatch(context.getExecutionControls(), 
> "partitioner-sender-latch");
> However, if there is no injection site defined in the user's drill 
> configuration, then testCountDownLatch will be null. So we have to check 
> whether it is null in order to avoid NPEs.
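The suggested null check can be sketched as follows (illustrative only; the helper name `awaitIfPresent` is hypothetical, not from Drill's code):

```java
import java.util.concurrent.CountDownLatch;

public class LatchGuard {
    // The injected latch may be null when no injection site is configured,
    // so every use must be guarded to avoid the NPE described above.
    static void awaitIfPresent(CountDownLatch testCountDownLatch) throws InterruptedException {
        if (testCountDownLatch != null) {
            testCountDownLatch.await();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        awaitIfPresent(null);                     // no-op instead of an NPE
        awaitIfPresent(new CountDownLatch(0));    // returns immediately
        System.out.println("ok");
    }
}
```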



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6125:
--
Affects Version/s: 1.13.0

> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6125:
--
Reviewer: Arina Ielchiieva

> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6125:
--
Priority: Minor  (was: Major)

> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347395#comment-16347395
 ] 

ASF GitHub Bot commented on DRILL-6125:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@sachouche @arina-ielchiieva 


> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347393#comment-16347393
 ] 

ASF GitHub Bot commented on DRILL-6125:
---

GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1105

DRILL-6125: Fix possible memory leak when query is cancelled.

A detailed description of the problem and solution can be found here: 

https://issues.apache.org/jira/browse/DRILL-6125

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6125

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1105


commit 1d1725a276c058e8c09e456963bac928d1f062ed
Author: Timothy Farkas 
Date:   2018-01-30T23:55:41Z

DRILL-6125: Fix possible memory leak when query is cancelled.




> PartitionSenderRootExec can leak memory because close method is not 
> synchronized
> 
>
> Key: DRILL-6125
> URL: https://issues.apache.org/jira/browse/DRILL-6125
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the 
> *partitioner* field. The partitioner is created in the createPartitioner 
> method, which is called by the main fragment thread. The partitioner field 
> is accessed by the fragment thread during normal execution, but it can also 
> be accessed by the receivingFragmentFinished method, a callback executed by 
> the event processor thread. Because multiple threads can access the 
> partitioner field, synchronization is done on creation and when 
> receivingFragmentFinished is called. However, the close method can also be 
> called by the event processor thread, and it does not synchronize before 
> accessing the partitioner field. Without that synchronization, the event 
> processor thread may hold an old reference to the partitioner when a query 
> is cancelled, so the current partitioner may not be cleared and a memory 
> leak may occur.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347366#comment-16347366
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
It only passes Travis CI after removing the edits to SSLConfigClient.java.


> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347292#comment-16347292
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165137630
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
 ---
@@ -232,9 +251,8 @@ else if (width > 0) {
 }
   }
 
-  public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; 
// 16 MiB
-
  private List<ColumnSize> columnSizes = new ArrayList<>();
+  private Map<String, ColumnSize> columnSizeMap = 
CaseInsensitiveMap.newHashMap();
--- End diff --

Thanks for the explanation here and on the dev list @paul-rogers. 


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347288#comment-16347288
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165136635
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -397,11 +384,9 @@ private void delayedSetup() {
 }
 numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in 
case not a power of 2
 
-if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = 
estMaxBatchSize = 0; } // incoming was an empty batch
--- End diff --

All the unit and functional tests passed without an NPE. The null check was 
redundant because the code in **doWork** that calls **delayedSetup** sets the 
schema if it is null.

```
  // This would be called only once - first time actual data arrives on 
incoming
  if ( schema == null && incoming.getRecordCount() > 0 ) {
this.schema = incoming.getSchema();
currentBatchRecordCount = incoming.getRecordCount(); // initialize 
for first non empty batch
// Calculate the number of partitions based on actual incoming data
delayedSetup();
  }
```

So schema will never be null when delayed setup is called


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSize to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6127) NullPointerException happens when submitting physical plan to the Hive storage plugin

2018-01-31 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6127:
---

 Summary: NullPointerException happens when submitting physical 
plan to the Hive storage plugin
 Key: DRILL-6127
 URL: https://issues.apache.org/jira/browse/DRILL-6127
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Anton Gozhiy


*Prerequisites:*
*1.* Create some test table in Hive:
{code:sql}
create external table if not exists hive_storage.test (key string, value 
string) stored as parquet
location '/hive_storage/test';
insert into table test values ("key", "value");
{code}
*2.* Hive plugin config:

{code:json}
{
  "type": "hive",
  "enabled": true,
  "configProps": {
"hive.metastore.uris": "thrift://localhost:9083",
"fs.default.name": "maprfs:///",
"hive.metastore.sasl.enabled": "false"
  }
}
{code}

*Steps:*
*1.* From the Drill web UI, run the following query:
{code:sql}
explain plan for select * from hive.hive_storage.`test`
{code}

*2.* Copy the JSON part of the plan
*3.* On the Query page, set the query type checkbox to PHYSICAL
*4.* Submit the copied plan

*Expected result:*
Drill should return the normal result: "key", "value"

*Actual result:*
NPE happens:
{noformat}
[Error Id: 8b45c27e-bddd-4552-b7ea-e5af6f40866a on node1:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException


[Error Id: 8b45c27e-bddd-4552-b7ea-e5af6f40866a on node1:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:761)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:327)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:223)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:279) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.exec.work.foreman.ForemanSetupException: Failure 
while parsing physical plan.
at 
org.apache.drill.exec.work.foreman.Foreman.parseAndRunPhysicalPlan(Foreman.java:393)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
... 3 common frames omitted
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
of [simple type, class org.apache.drill.exec.store.hive.HiveScan] value failed 
(java.lang.NullPointerException): null
 at [Source: { "head" : { "version" : 1, "generator" : { "type" : 
"ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : 
[ ], "queue" : 0, "hasResourcePlan" : false, "resultMode" : "EXEC" }, "graph" : 
[ { "pop" : "hive-scan", "@id" : 2, "userName" : "mapr", "hive-table" : { 
"table" : { "tableName" : "test", "dbName" : "hive_storage", "owner" : "mapr", 
"createTime" : 1517417959, "lastAccessTime" : 0, "retention" : 0, "sd" : { 
"location" : "maprfs:/hive_storage/test", "inputFormat" : 
"org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", "outputFormat" 
: "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", 
"compressed" : false, "numBuckets" : -1, "serDeInfo" : { "name" : null, 
"serializationLib" : 
"org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", "parameters" : { 
"serialization.format" : "1" } }, "sortCols" : [ ], "parameters" : { } }, 
"partitionKeys" : [ ], "parameters" : { "totalSize" : "0", "EXTERNAL" : "TRUE", 
"numRows" : "1", "rawDataSize" : "2", "COLUMN_STATS_ACCURATE" : "true", 
"numFiles" : "0", "transient_lastDdlTime" : "1517418363" }, "viewOriginalText" 
: null, "viewExpandedText" : null, "tableType" : "EXTERNAL_TABLE", 
"columnsCache" : { "keys" : [ [ { "name" : "key", "type" : "string", "comment" 
: null }, { "name" : "value", "type" : "string", "comment" : null } ] ] } }, 
"partitions" : null }, "columns" : [ "`key`", "`value`" ], "cost" : 0.0 }, { 
"pop" : "project", "@id" : 1, "exprs" : [ { "ref" : "`key`", "expr" : "`key`" 
}, { "ref" : "`value`", "expr" : "`value`" } ], "child" : 2, "outputProj" : 
true, "initialAllocation" : 100, "maxAllocation" : 100, 

[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347279#comment-16347279
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165135291
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

@ppadma I thought estSize represented the estimated column width. For
fixed-width vectors we know the exact column width, so why can't we use the
exact value? Also, why are there two different mechanisms for measuring column
sizes? When do you use RecordBatchSizer and when do you use TypeHelper?
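
One reason estSize cannot simply be the fixed type width is that per-value memory depends on the DataMode: a Required INT is just the 4-byte value, an Optional INT adds a validity byte, and a Repeated INT needs the average array cardinality plus offset overhead. A toy illustration (the per-entry overheads here are assumptions for the sketch, not Drill's actual accounting):

```java
public class ColumnSizeDemo {
    // Rough per-value bytes for an INT column by DataMode. The +1 validity
    // byte and +4 offset entry are illustrative assumptions, not Drill code.
    static double estBytesPerValue(String mode, double avgCardinality) {
        switch (mode) {
            case "REQUIRED": return 4;                       // value only
            case "OPTIONAL": return 4 + 1;                   // value + validity byte
            case "REPEATED": return 4 * avgCardinality + 4;  // values + offset entry
            default: throw new IllegalArgumentException(mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(estBytesPerValue("REQUIRED", 0)); // 4.0
        System.out.println(estBytesPerValue("OPTIONAL", 0)); // 5.0
        System.out.println(estBytesPerValue("REPEATED", 3)); // 16.0
    }
}
```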


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the 
> partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347178#comment-16347178
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
I have squashed the commits, but I'm getting an error in Travis CI similar
to the previous one from when I reverted some changes:
Column a-offsets of type UInt4Vector: Offset (0) must be 0 but was 1



> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347010#comment-16347010
 ] 

ASF GitHub Bot commented on DRILL-5377:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/916
  
@arina-ielchiieva You are right.
According to the SQL spec, after resolving 
[CALCITE-2055](https://issues.apache.org/jira/browse/CALCITE-2055) and the 
Drill-Calcite upgrade, Drill and Calcite no longer support five-digit years. 
Please find more details in the jira description.


> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347011#comment-16347011
 ] 

ASF GitHub Bot commented on DRILL-5377:
---

Github user vdiravka closed the pull request at:

https://github.com/apache/drill/pull/916


> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-5377.

Resolution: Not A Problem

[~vvysotskyi] Thank you.
So for now, the test cases from the jira description will fail with:
{code}
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: Year out of range: [11356]
{code}
This is an expected exception; nothing needs to be fixed.

> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6111) NullPointerException with Kafka Storage Plugin

2018-01-31 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346983#comment-16346983
 ] 

Arina Ielchiieva commented on DRILL-6111:
-

[~akumarb2010] & [~kam_iitkgp] could you please take a look?

> NullPointerException with Kafka Storage Plugin
> --
>
> Key: DRILL-6111
> URL: https://issues.apache.org/jira/browse/DRILL-6111
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.12.0
>Reporter: Jared Stehler
>Priority: Major
>
> I'm unable to query using the kafka storage plugin; queries are failing with 
> a NPE which *seems* like a json typo:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 1:2
> [Error Id: 49d5f72f-0187-480b-8b29-6eeeb5adc88f on 10.80.53.16:31820]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
>  ~[drill-common-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.12.0.jar:1.12.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.12.0.jar:1.12.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_131]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
> of [simple type, class org.apache.drill.exec.store.kafka.KafkaSubScan] value 
> failed (java.lang.NullPointerException): null
> at [Source: {
> "pop" : "single-sender",
> "@id" : 0,
> "receiver-major-fragment" : 0,
> "receiver-minor-fragment" : 0,
> "child" : {
> "pop" : "selection-vector-remover",
> "@id" : 1,
> "child" : {
> "pop" : "limit",
> "@id" : 2,
> "child" : {
> "pop" : "kafka-partition-scan",
> "@id" : 3,
> "userName" : "",
> "columns" : [ "`*`" ],
> "partitionSubScanSpecList" : [ {
> "topicName" : "ingest-prime",
> "partitionId" : 5,
> "startOffset" : 8824294,
> "endOffset" : 8874172
> }, {
> "topicName" : "ingest-prime",
> "partitionId" : 1,
> "startOffset" : 8826346,
> "endOffset" : 8874623
> }, {
> "topicName" : "ingest-prime",
> "partitionId" : 6,
> "startOffset" : 8824744,
> "endOffset" : 8874617
> } ],
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "KafkaStoragePluginConfig" : {
> "type" : "kafka",
> "kafkaConsumerProps" : {
> "key.deserializer" : 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer",
> "auto.offset.reset" : "earliest",
> "bootstrap.servers" : 
> "kafkas.dev3.master.us-west-2.prod.aws.intellify.io:9092",
> "enable.auto.commit" : "true",
> "group.id" : "drill-query-consumer-1",
> "value.deserializer" : 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer",
> "session.timeout.ms" : "3"
> },
> "enabled" : true
> },
> "cost" : 0.0
> },
> "first" : 0,
> "last" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> },
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> },
> "destination" : "CgsxMC44MC41My4xNhDM+AEYzfgBIM74ATIGMS4xMi4wOAA=",
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 2.0
> }; line: 49, column: 7] (through reference chain: 
> org.apache.drill.exec.physical.config.SingleSender["child"]->org.apache.drill.exec.physical.config.SelectionVectorRemover["child"]->org.apache.drill.exec.physical.config.Limit["child"])
> at 
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:263)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:453)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:472)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:258)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:135)
>  ~[jackson-databind-2.7.9.1.jar:2.7.9.1]
> at 
> com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:444)
>  

[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346944#comment-16346944
 ] 

Volodymyr Vysotskyi commented on DRILL-5377:


After the changes made in CALCITE-1690, a date string must strictly match the 
pattern
{noformat}
[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
{noformat}
CALCITE-2055 added a check for the ranges of date elements.

More details connected with the SQL spec may be found in {{6.1 }}
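
The strict pattern above can be checked with an ordinary regex; a small sketch showing that a five-digit year no longer matches (plain java.util.regex, not Calcite's validator):

```java
import java.util.regex.Pattern;

public class DatePatternDemo {
    // Equivalent to the pattern quoted above: exactly four year digits.
    static final Pattern DATE = Pattern.compile("[0-9]{4}-[0-9]{2}-[0-9]{2}");

    public static void main(String[] args) {
        System.out.println(DATE.matcher("1992-01-01").matches());  // true
        System.out.println(DATE.matcher("11356-02-16").matches()); // false: five-digit year
    }
}
```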

> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346911#comment-16346911
 ] 

Arina Ielchiieva commented on DRILL-5377:
-

[~vitalii] after the upgrade to Calcite 1.15, years with more than 4 digits are 
disallowed according to the SQL standard.

[~vvysotskyi] please confirm.

> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346914#comment-16346914
 ] 

ASF GitHub Bot commented on DRILL-5377:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/916
  
It seems that this PR is no longer relevant after the Calcite upgrade.
@vdiravka please confirm and close the PR.


> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc.
> Below is the output I get from the test framework when I disable 
> auto-correction for date fields:
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--+
> | FUTURE_DATE  |
> +--+
> | 356-02-16   |
> +--+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6118) Handle item star columns during project / filter push down and directory pruning

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346884#comment-16346884
 ] 

ASF GitHub Bot commented on DRILL-6118:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1104
  
@chunhui-shi please review.


> Handle item star columns during project  /  filter push down and directory 
> pruning
> --
>
> Key: DRILL-6118
> URL: https://issues.apache.org/jira/browse/DRILL-6118
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Project push down, filter push down and partition pruning do not work with a 
> dynamically expanded column which is represented as a star in the ITEM 
> operator: _ITEM($0, 'column_name')_ where $0 is a star.
>  This often occurs when a view, sub-select or CTE with a star is issued.
>  To solve this issue we can create {{DrillFilterItemStarReWriterRule}}, which 
> will rewrite such ITEM operators before filter push down and directory 
> pruning. Project-into-scan push down logic will be handled separately in the 
> already existing rule {{DrillPushProjectIntoScanRule}}. Basically, we can 
> consider the following queries the same: 
>  {{select col1 from t}}
>  {{select col1 from (select * from t)}}
> *Use cases*
> Since item star columns were not considered during project / filter push 
> down and directory pruning, push down and pruning did not happen. This was 
> causing Drill to read all columns from a file (when only several are needed) 
> or to read all files instead. Views with a star query are the most common 
> example. Such behavior significantly degrades performance for item star 
> queries compared to queries without item star.
> *EXAMPLES*
> *Data set* 
> will create a table with three files, each in a dedicated sub-folder:
> {noformat}
> use dfs.tmp;
> create table `order_ctas/t1` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-01' and date '1992-01-03';
> create table `order_ctas/t2` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-04' and date '1992-01-06';
> create table `order_ctas/t3` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-07' and date '1992-01-09';
> {noformat}
> *Filter push down*
> {{select * from order_ctas where o_orderdate = date '1992-01-01'}} will read 
> only one file
> {noformat}
> 00-00Screen
> 00-01  Project(**=[$0])
> 00-02Project(T1¦¦**=[$0])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($1, 1992-01-01)])
> 00-05  Project(T1¦¦**=[$0], o_orderdate=[$1])
> 00-06Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/order_ctas/t1/0_0_0.parquet]], 
> selectionRoot=/tmp/order_ctas, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`**`]]])
> {noformat}
> {{select * from (select * from order_ctas) where o_orderdate = date 
> '1992-01-01'}} will read all three files
> {noformat}
> 00-00Screen
> 00-01  Project(**=[$0])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[=(ITEM($0, 'o_orderdate'), 1992-01-01)])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=/tmp/order_ctas/t1/0_0_0.parquet], ReadEntryWithPath 
> [path=/tmp/order_ctas/t2/0_0_0.parquet], ReadEntryWithPath 
> [path=/tmp/order_ctas/t3/0_0_0.parquet]], selectionRoot=/tmp/order_ctas, 
> numFiles=3, numRowGroups=3, usedMetadataFile=false, columns=[`**`]]])
> {noformat}
> *Directory pruning*
> {{select * from order_ctas where dir0 = 't1'}} will read data only from one 
> folder
> {noformat}
> 00-00Screen
> 00-01  Project(**=[$0])
> 00-02Project(**=[$0])
> 00-03  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=/tmp/order_ctas/t1/0_0_0.parquet]], selectionRoot=/tmporder_ctas, 
> numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`]]])
> {noformat}
> {{select * from (select * from order_ctas) where dir0 = 't1'}} will read 
> content of all three folders
> {noformat}
> 00-00Screen
> 00-01  Project(**=[$0])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[=(ITEM($0, 'dir0'), 't1')])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=/tmp/order_ctas/t1/0_0_0.parquet], ReadEntryWithPath 
> [path=/tmp/order_ctas/t2/0_0_0.parquet], ReadEntryWithPath 
> 

[jira] [Commented] (DRILL-6118) Handle item star columns during project / filter push down and directory pruning

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346883#comment-16346883
 ] 

ASF GitHub Bot commented on DRILL-6118:
---

GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1104

DRILL-6118: Handle item star columns during project / filter push dow…

…n and directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to 
regular field references.
2. Refactored DrillPushProjectIntoScanRule to handle item star fields, 
factored out helper classes and methods from the PrelUtil class.
3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of 
star was still present, replaced WILDCARD -> DYNAMIC_STAR  for clarity).
4. Added unit tests to check project / filter push down and directory 
pruning with item star.

Details in [DRILL-6118](https://issues.apache.org/jira/browse/DRILL-6118).
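
Conceptually, point 1 replaces an ITEM call whose target is a star with a direct field reference. A toy sketch of that shape (hypothetical expression classes, not Calcite's RexNode API):

```java
public class ItemStarRewriteDemo {
    // Toy expression nodes; Calcite's real expression hierarchy is far richer.
    interface Expr {}
    static final class Star implements Expr {}
    static final class FieldRef implements Expr {
        final String name;
        FieldRef(String name) { this.name = name; }
    }
    static final class Item implements Expr {
        final Expr target;
        final String field;
        Item(Expr target, String field) { this.target = target; this.field = field; }
    }

    // Rewrite ITEM(star, 'col') into the plain field reference `col`;
    // anything else is left untouched.
    static Expr rewrite(Expr e) {
        if (e instanceof Item && ((Item) e).target instanceof Star) {
            return new FieldRef(((Item) e).field);
        }
        return e;
    }

    public static void main(String[] args) {
        Expr rewritten = rewrite(new Item(new Star(), "o_orderdate"));
        System.out.println(((FieldRef) rewritten).name); // o_orderdate
    }
}
```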

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1104


commit 4673bfb593ca6422d58fa9e0e6eb281a69f1ed69
Author: Arina Ielchiieva 
Date:   2017-12-21T17:31:00Z

DRILL-6118: Handle item star columns during project / filter push down and 
directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to 
regular field references.
2. Refactored DrillPushProjectIntoScanRule to handle item star fields, 
factored out helper classes and methods from the PrelUtil class.
3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of 
star was still present, replaced WILDCARD -> DYNAMIC_STAR  for clarity).
4. Added unit tests to check project / filter push down and directory 
pruning with item star.




> Handle item star columns during project  /  filter push down and directory 
> pruning
> --
>
> Key: DRILL-6118
> URL: https://issues.apache.org/jira/browse/DRILL-6118
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Project push down, filter push down and partition pruning do not work with a 
> dynamically expanded column which is represented as a star in the ITEM 
> operator: _ITEM($0, 'column_name')_ where $0 is a star.
>  This often occurs when a view, sub-select or CTE with a star is issued.
>  To solve this issue we can create {{DrillFilterItemStarReWriterRule}}, which 
> will rewrite such ITEM operators before filter push down and directory 
> pruning. Project-into-scan push down logic will be handled separately in the 
> already existing rule {{DrillPushProjectIntoScanRule}}. Basically, we can 
> consider the following queries the same: 
>  {{select col1 from t}}
>  {{select col1 from (select * from t)}}
> *Use cases*
> Since item star columns were not considered during project / filter push 
> down and directory pruning, push down and pruning did not happen. This was 
> causing Drill to read all columns from a file (when only several are needed) 
> or to read all files instead. Views with a star query are the most common 
> example. Such behavior significantly degrades performance for item star 
> queries compared to queries without item star.
> *EXAMPLES*
> *Data set* 
> will create a table with three files, each in a dedicated sub-folder:
> {noformat}
> use dfs.tmp;
> create table `order_ctas/t1` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-01' and date '1992-01-03';
> create table `order_ctas/t2` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-04' and date '1992-01-06';
> create table `order_ctas/t3` as select cast(o_orderdate as date) as 
> o_orderdate from cp.`tpch/orders.parquet` where o_orderdate between date 
> '1992-01-07' and date '1992-01-09';
> {noformat}
> *Filter push down*
> {{select * from order_ctas where o_orderdate = date '1992-01-01'}} will read 
> only one file
> {noformat}
> 00-00Screen
> 00-01  Project(**=[$0])
> 00-02Project(T1¦¦**=[$0])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($1, 1992-01-01)])
> 00-05  Project(T1¦¦**=[$0], o_orderdate=[$1])
> 00-06Scan(groupscan=[ParquetGroupScan 
> 

[jira] [Updated] (DRILL-5978) Upgrade drill-hive library version to 2.1 or newer.

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5978:

Labels: doc-impacting  (was: )

> Upgrade drill-hive library version to 2.1 or newer.
> ---
>
> Key: DRILL-5978
> URL: https://issues.apache.org/jira/browse/DRILL-5978
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Currently Drill uses [Hive version 1.2.1 
> libraries|https://github.com/apache/drill/blob/master/pom.xml#L53] to perform 
> queries on Hive. This library version can be used with both Hive 1.x and 
> Hive 2.x, but some Hive 2.x features are broken (for example, ORC 
> transactional tables). To fix that, the drill-hive library version should be 
> updated to 2.1 or newer. 
> Tasks which should be done:
> - resolve dependency conflicts;
> - investigate backward compatibility of the newer drill-hive library with 
> older Hive versions (1.x);
> - update the drill-hive version for the 
> [MapR|https://github.com/apache/drill/blob/master/pom.xml#L1777] profile too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4185:

Issue Type: Improvement  (was: Bug)

> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Fix overview:*
> After resolving this issue Drill can query an empty directory; for now it 
> is treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN 
> and UNION (UNION ALL) operators.
> An empty directory containing parquet metadata cache files is a schemaless 
> Drill table as well. 
> It works similarly to empty files:
> - A query with star returns an empty result. 
> - If specific fields are referenced in the select statement, those fields 
> are returned as INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result, as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except when outer join 
> clauses are used and the outer table for a "right join" or the derived 
> table for a "left join" has data. In that case the data from the non-empty 
> table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at execution stage for interactions with other 
> operators and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used 
> instead.
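The intended UNION ALL behavior can be sketched with plain Java collections (illustrative only, not Drill's operator code): concatenating an empty input, standing in for the empty-directory table, with a non-empty one simply yields the non-empty side's rows instead of failing.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnionAllEmptySide {
  // UNION ALL concatenates both inputs, so an empty input (the
  // empty-directory table) must not fail the query: the result is
  // just the rows of the non-empty side.
  static List<Integer> unionAll(List<Integer> left, List<Integer> right) {
    return Stream.concat(left.stream(), right.stream())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> emptyDir = List.of();     // stands in for empty_DIR
    List<Integer> rows = List.of(1, 2, 3);  // stands in for testWindow.csv
    System.out.println(unionAll(emptyDir, rows)); // [1, 2, 3]
    System.out.println(unionAll(rows, emptyDir)); // [1, 2, 3]
  }
}
```

Either operand order gives the same result, matching the expectation that the empty side does not change the query output.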





[jira] [Updated] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4185:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Fix overview:*
> After resolving this issue Drill can query an empty directory; for now it 
> is treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN 
> and UNION (UNION ALL) operators.
> An empty directory containing parquet metadata cache files is a schemaless 
> Drill table as well. 
> It works similarly to empty files:
> - A query with star returns an empty result. 
> - If specific fields are referenced in the select statement, those fields 
> are returned as INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result, as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except when outer join 
> clauses are used and the outer table for a "right join" or the derived 
> table for a "left join" has data. In that case the data from the non-empty 
> table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at execution stage for interactions with other 
> operators and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used 
> instead.





[jira] [Updated] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4185:

Fix Version/s: 1.13.0

> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Fix overview:*
> After resolving this issue Drill can query an empty directory; for now it 
> is treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN 
> and UNION (UNION ALL) operators.
> An empty directory containing parquet metadata cache files is a schemaless 
> Drill table as well. 
> It works similarly to empty files:
> - A query with star returns an empty result. 
> - If specific fields are referenced in the select statement, those fields 
> are returned as INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result, as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except when outer join 
> clauses are used and the outer table for a "right join" or the derived 
> table for a "left join" has data. In that case the data from the non-empty 
> table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at execution stage for interactions with other 
> operators and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used 
> instead.





[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346662#comment-16346662
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1083
  
+1, LGTM. Thanks for making the changes.


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Fix overview:*
> After resolving this issue Drill can query an empty directory; for now it 
> is treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN 
> and UNION (UNION ALL) operators.
> An empty directory containing parquet metadata cache files is a schemaless 
> Drill table as well. 
> It works similarly to empty files:
> - A query with star returns an empty result. 
> - If specific fields are referenced in the select statement, those fields 
> are returned as INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result, as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except when outer join 
> clauses are used and the outer table for a "right join" or the derived 
> table for a "left join" has data. In that case the data from the non-empty 
> table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at execution stage for interactions with other 
> operators and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used 
> instead.





[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346641#comment-16346641
 ] 

ASF GitHub Bot commented on DRILL-4185:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r165023581
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final 
String query) throws Except
 .go();
   }
 
+  /** InnerJoin with empty dir table on nullable cols, MergeJoin */
+  // TODO: the same tests should be added for HashJoin operator, DRILL-6070
+  @Test
--- End diff --

The bug was found for NLJ and empty tables. I have resolved that issue.
A separate test class was added for empty dir tables and the different join 
operators.

Also I have refactored the TestHashJoinAdvanced, TestMergeJoinAdvanced and 
TestNestedLoopJoin classes.


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> 
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Khurram Faraaz
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting
>
> UNION ALL query that involves an empty directory on either side of UNION ALL 
> operator results in FAILED query. We should return the results for the 
> non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory, the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Fix overview:*
> After resolving this issue Drill can query an empty directory; for now it 
> is treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN 
> and UNION (UNION ALL) operators.
> An empty directory containing parquet metadata cache files is a schemaless 
> Drill table as well. 
> It works similarly to empty files:
> - A query with star returns an empty result. 
> - If specific fields are referenced in the select statement, those fields 
> are returned as INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result, as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except when outer join 
> clauses are used and the outer table for a "right join" or the derived 
> table for a "left join" has data. In that case the data from the non-empty 
> table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at execution stage for interactions with other 
> operators and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used 
> instead.





[jira] [Commented] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346585#comment-16346585
 ] 

ASF GitHub Bot commented on DRILL-6124:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1103
  
@ilooner it looks like if the latch is not found, execution controls will 
return a dummy latch [1]? If I am missing something, please explain.

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/testing/ExecutionControls.java#L206


> testCountDownLatch can be null in PartitionerDecorator depending on user's 
> injection controls config
> 
>
> Key: DRILL-6124
> URL: https://issues.apache.org/jira/browse/DRILL-6124
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> In PartitionerDecorator we get a latch from the injector with the following 
> code:
> testCountDownLatch = injector.getLatch(context.getExecutionControls(), 
> "partitioner-sender-latch");
> However, if there is no injection site defined in the user's Drill 
> configuration, then testCountDownLatch will be null, so we have to check 
> for null to avoid NPEs.
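A minimal sketch of the null-guard pattern described above (the class and method names here are illustrative, not Drill's actual PartitionerDecorator code): every use of the injected latch is wrapped in a null check so that a missing injection site is a no-op rather than an NPE.

```java
import java.util.concurrent.CountDownLatch;

public class LatchGuard {
  // The injected latch may be null when no injection site is configured
  // in the user's controls, so all uses must be null-guarded.
  private final CountDownLatch testCountDownLatch;

  LatchGuard(CountDownLatch latchOrNull) {
    this.testCountDownLatch = latchOrNull;
  }

  // Returns true if a latch was configured and counted down, false if the
  // call was a no-op because no latch was injected.
  boolean countDownIfConfigured() {
    if (testCountDownLatch != null) {  // avoid NPE when controls are absent
      testCountDownLatch.countDown();
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(new LatchGuard(null).countDownIfConfigured());   // false
    CountDownLatch latch = new CountDownLatch(1);
    System.out.println(new LatchGuard(latch).countDownIfConfigured());  // true
    System.out.println(latch.getCount());                               // 0
  }
}
```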





[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346581#comment-16346581
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
Maybe it has not worked as expected. It squashed the first commit, but since 
the branch mixes in commits from other people, they come along too. It may be 
a case of creating a patch file for the desired commit and applying that 
patch in a new pull request. 


> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
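The caching behavior that motivates this change can be seen directly with `Integer`: the cache range [-128, 127] is guaranteed by the Java Language Specification, so `valueOf` reuses boxed objects there, while the (deprecated) constructor always allocates.

```java
public class ValueOfVsConstructor {
  public static void main(String[] args) {
    // Integer.valueOf caches values in [-128, 127], so repeated requests
    // for small values reuse the same boxed object instead of allocating.
    Integer a = Integer.valueOf(100);
    Integer b = Integer.valueOf(100);
    System.out.println(a == b);        // true: both refer to the cached object

    // The deprecated constructor always allocates a fresh object.
    @SuppressWarnings("deprecation")
    Integer c = new Integer(100);
    @SuppressWarnings("deprecation")
    Integer d = new Integer(100);
    System.out.println(c == d);        // false: two distinct allocations
    System.out.println(c.equals(d));   // true: equal values regardless
  }
}
```

The same pattern applies to `Long.valueOf`, `Boolean.valueOf`, and similar wrapper factory methods.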





[jira] [Commented] (DRILL-6106) Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

2018-01-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346587#comment-16346587
 ] 

ASF GitHub Bot commented on DRILL-6106:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1099
  
Well, you can always force-push to override your previous changes, or even 
replace your remote branch with a new local one.


> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.
> 
>
> Key: DRILL-6106
> URL: https://issues.apache.org/jira/browse/DRILL-6106
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Reudismam Rolim de Sousa
>Assignee: Reudismam Rolim de Sousa
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Use valueOf method instead of constructor since valueOf has a higher 
> performance by caching frequently requested values.





[jira] [Updated] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6099:

Reviewer: Chunhui Shi

> Drill does not push limit past project (flatten) if it cannot be pushed into 
> scan
> -
>
> Key: DRILL-6099
> URL: https://issues.apache.org/jira/browse/DRILL-6099
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be useful to have pushdown occur past flatten(project). Here is an 
> example to illustrate the issue:
> {{explain plan without implementation for }}{{select name, 
> flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}}
> {{DrillScreenRel}}{{  }}
> {{  DrillLimitRel(fetch=[1])}}{{    }}
> {{    DrillProjectRel(name=[$0], category=[FLATTEN($1)])}}
> {{      DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, 
> `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}}
> = 
> Content of 0_0_0.json
> =
> {
>   "name" : "Eric Goldberg, MD",
>   "categories" : [ "Doctors", "Health & Medical" ]
> } {
>   "name" : "Pine Cone Restaurant",
>   "categories" : [ "Restaurants" ]
> } {
>   "name" : "Deforest Family Restaurant",
>   "categories" : [ "American (Traditional)", "Restaurants" ]
> } {
>   "name" : "Culver's",
>   "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", 
> "Restaurants" ]
> } {
>   "name" : "Chang Jiang Chinese Kitchen",
>   "categories" : [ "Chinese", "Restaurants" ]
> } 
>  
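As a rough illustration of the FLATTEN-plus-LIMIT semantics involved (plain Java streams rather than Drill operators; the row type and data mirror the JSON above): because `limit` short-circuits, only as much input as the limit requires is consumed, which is the benefit of pushing the limit past the project (flatten).

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlattenLimit {
  record Row(String name, List<String> categories) {}

  // FLATTEN expands each row into one output row per array element;
  // LIMIT then keeps only the first n expanded rows. Because limit()
  // short-circuits, later input rows need never be materialized.
  static List<String> flattenLimit(List<Row> rows, long n) {
    return rows.stream()
        .flatMap(r -> r.categories().stream()
            .map(c -> r.name() + " / " + c))
        .limit(n)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("Eric Goldberg, MD", List.of("Doctors", "Health & Medical")),
        new Row("Pine Cone Restaurant", List.of("Restaurants")));
    System.out.println(flattenLimit(rows, 1));
    // [Eric Goldberg, MD / Doctors]
  }
}
```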





[jira] [Updated] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6124:

Affects Version/s: 1.12.0

> testCountDownLatch can be null in PartitionerDecorator depending on user's 
> injection controls config
> 
>
> Key: DRILL-6124
> URL: https://issues.apache.org/jira/browse/DRILL-6124
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> In PartitionerDecorator we get a latch from the injector with the following 
> code:
> testCountDownLatch = injector.getLatch(context.getExecutionControls(), 
> "partitioner-sender-latch");
> However, if there is no injection site defined in the user's Drill 
> configuration, then testCountDownLatch will be null, so we have to check 
> for null to avoid NPEs.





[jira] [Updated] (DRILL-6124) testCountDownLatch can be null in PartitionerDecorator depending on user's injection controls config

2018-01-31 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6124:

Fix Version/s: 1.13.0

> testCountDownLatch can be null in PartitionerDecorator depending on user's 
> injection controls config
> 
>
> Key: DRILL-6124
> URL: https://issues.apache.org/jira/browse/DRILL-6124
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Minor
> Fix For: 1.13.0
>
>
> In PartitionerDecorator we get a latch from the injector with the following 
> code:
> testCountDownLatch = injector.getLatch(context.getExecutionControls(), 
> "partitioner-sender-latch");
> However, if there is no injection site defined in the user's Drill 
> configuration, then testCountDownLatch will be null, so we have to check 
> for null to avoid NPEs.


