[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466861#comment-16466861
 ] 

ASF GitHub Bot commented on DRILL-5913:
---

vvysotskyi commented on issue #1016: DRILL-5913:solve the mixed processing of 
same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387284020
 
 
   @weijietong, could you please check whether this bug still reproduces on current master? I tried the query from the Jira description and it finished successfully. I suppose it was fixed in the scope of the Calcite upgrade.




> DrillReduceAggregatesRule mixed the same functions of the same inputRef which 
> have different dataTypes 
> ---
>
> Key: DRILL-5913
> URL: https://issues.apache.org/jira/browse/DRILL-5913
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Priority: Major
>
> sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as 
> int)) as col2 from cp.`employee.json`
> {code}
> error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT 
> NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Internal error: Error while applying rule 
> DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.drill.exec.work.foreman.Foreman.run():294
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) Internal error: Error while applying 
> rule DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.calcite.util.Util.newInternal():792
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that stddev_samp(cast(employee_id as int)) is reduced to sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to sum0($0), by DrillReduceAggregatesRule's first match. The rule's second match then reduces stddev_samp's sum($0) to sum0($0) as well, but this sum0($0)'s data type differs from the first one's: one is integer, the other is bigint. Calcite's addAggCall method treats them as the same call because it ignores their data types, which leads to the bigint sum0($0) being replaced by the integer sum0($0).
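As a plain-Java illustration of the collision (hypothetical sketch, not the actual Calcite/Drill code; the AggKey class is invented for this example): if the dedup key for aggregate calls carries only the function and input ordinal, the INTEGER and BIGINT variants of $SUM0($0) map to the same entry; including the result type in the key keeps them distinct.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical dedup key for aggregate calls. Without the dataType field
// (as the buggy path effectively behaves), SUM0($0):INTEGER and
// SUM0($0):BIGINT collide in the map; with it, they stay distinct.
class AggKey {
  final String function;
  final int inputRef;
  final String dataType;

  AggKey(String function, int inputRef, String dataType) {
    this.function = function;
    this.inputRef = inputRef;
    this.dataType = dataType;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof AggKey)) {
      return false;
    }
    AggKey k = (AggKey) o;
    return inputRef == k.inputRef
        && function.equals(k.function)
        && dataType.equals(k.dataType);
  }

  @Override
  public int hashCode() {
    return Objects.hash(function, inputRef, dataType);
  }

  public static void main(String[] args) {
    Map<AggKey, Integer> calls = new HashMap<>();
    calls.put(new AggKey("$SUM0", 0, "INTEGER"), 1);
    calls.put(new AggKey("$SUM0", 0, "BIGINT"), 2);
    System.out.println(calls.size()); // 2: the two variants no longer collide
  }
}
{code}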





[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread weijie.tong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466845#comment-16466845
 ] 

weijie.tong commented on DRILL-6385:


[~amansinha100] sorry for missing your earlier proposal. My work is just a start-up. I once had the same plan to send the bloom filter across the exchange boundary, but it is difficult to handle the mass of RPC traffic between senders and receivers. Taking a deep look at what the Impala code does inspired me to simplify the RPC exchange mode by contacting the foreman node. In any case, I still appreciate your sharing and advice.

> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will try to push down a bloom filter from the HashJoin node to Parquet's scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equal-join-condition rows to construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator, and aggregates them to form a global bloom filter (a minimal sketch of this merge follows the description).
>  # The foreman node broadcasts the global bloom filter to all the probe-side scan nodes, which may already have sent out partial data to the hash join nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described below:
>  1. RuntimeFilter
> A filter container which may contain a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> Wraps the logic for sending the hash join's bloom filter to the foreman. The serialized bloom filter is sent through the data tunnel. This object is instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> Responsible for accepting a SendRuntimeFilterRequest RPC and stripping the actual BloomFilter from the network, then handing the filter to the WorkerBee's new registerRuntimeFilter interface. Another RPC type is BroadcastRuntimeFilterRequest: it registers the accepted global bloom filter with the WorkerBee via the registerRuntimeFilter method, and then propagates it to the FragmentContext, through which the probe-side scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman instantiates a RuntimeFilterManager. It indirectly obtains every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been accepted and aggregated, it broadcasts the aggregated bloom filter to all the probe-side scan nodes through the data tunnel by a BroadcastRuntimeFilterRequest RPC.
>  5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon as possible.
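For illustration, a minimal sketch of the merge in step 2 above (a hypothetical plain-Java BloomFilter; the actual RuntimeFilter/BloomFilter classes are the ones introduced by the PR). Filters built with the same size and hash functions can be combined with a bitwise OR:

{code:java}
import java.util.BitSet;

// Hypothetical bloom filter: the global filter is the bitwise OR of the
// per-fragment filters, assuming all use the same size and hash functions.
class BloomFilter {
  private final BitSet bits;
  private final int numBits;

  BloomFilter(int numBits) {
    this.numBits = numBits;
    this.bits = new BitSet(numBits);
  }

  void put(int hash) {
    bits.set(Math.floorMod(hash, numBits));
  }

  // Called on the foreman for each filter received from a fragment.
  void merge(BloomFilter other) {
    bits.or(other.bits);
  }

  // Probe-side check: false means the key is definitely not on the build side.
  boolean mightContain(int hash) {
    return bits.get(Math.floorMod(hash, numBits));
  }
}
{code}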





[jira] [Commented] (DRILL-5270) Improve loading of profiles listing in the WebUI

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466821#comment-16466821
 ] 

ASF GitHub Bot commented on DRILL-5270:
---

ilooner commented on issue #1250: DRILL-5270: Improve loading of profiles 
listing in the WebUI
URL: https://github.com/apache/drill/pull/1250#issuecomment-387275507
 
 
   @kkhatua Why not use the Guava Cache? http://www.baeldung.com/guava-cache . 
I think it would simplify the implementation.
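For reference, a minimal sketch of what a Guava-backed listing cache could look like (the ProfileListingCache class and the loadProfilesFromFileSystem() helper are hypothetical, not the actual WebUI code):

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class ProfileListingCache {
  // Single-entry cache: the listing is rebuilt at most every 10 seconds
  // instead of re-reading the file system on every page load.
  private final LoadingCache<String, List<String>> cache = CacheBuilder.newBuilder()
      .maximumSize(1)
      .expireAfterWrite(10, TimeUnit.SECONDS)
      .build(new CacheLoader<String, List<String>>() {
        @Override
        public List<String> load(String key) {
          return loadProfilesFromFileSystem();
        }
      });

  public List<String> getProfiles() {
    return cache.getUnchecked("profiles");
  }

  private List<String> loadProfilesFromFileSystem() {
    // Stand-in for the real FS scan of the profiles directory.
    return Collections.emptyList();
  }
}
{code}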




> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, as the number of profiles increases, we reload the same list of profiles from the FS.
> An ideal improvement would be to detect whether there are any new profiles and reload from disk only then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 32-core server. With caching, we can get it down to a few milliseconds.
> To invalidate the cache, we inspect the last-modified time of the directory to confirm whether a reload is needed.
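A minimal sketch of that invalidation check under the assumptions above (hypothetical class and directory path; plain java.io, not the actual WebUI code):

{code:java}
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CachedProfileListing {
  private final File profileDir = new File("/tmp/drill/profiles"); // assumed location
  private long cachedMtime = -1;
  private List<String> cachedListing;

  public synchronized List<String> getListing() {
    long mtime = profileDir.lastModified();
    if (cachedListing == null || mtime != cachedMtime) {
      // The directory changed (a profile was added or removed): reload from disk.
      String[] names = profileDir.list();
      cachedListing = names == null ? Collections.<String>emptyList() : Arrays.asList(names);
      cachedMtime = mtime;
    }
    return cachedListing; // otherwise the cached list is sufficient
  }
}
{code}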





[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466813#comment-16466813
 ] 

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of 
same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387274177
 
 
   @KulykRoman it seems you are familiar with this part of the code. Could you also take a look at this?




> DrillReduceAggregatesRule mixed the same functions of the same inputRef which 
> have different dataTypes 
> ---
>
> Key: DRILL-5913
> URL: https://issues.apache.org/jira/browse/DRILL-5913
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Priority: Major
>
> sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as 
> int)) as col2 from cp.`employee.json`
> {code}
> error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT 
> NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Internal error: Error while applying rule 
> DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.drill.exec.work.foreman.Foreman.run():294
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) Internal error: Error while applying 
> rule DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.calcite.util.Util.newInternal():792
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that stddev_samp(cast(employee_id as int)) is reduced to sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to sum0($0), by DrillReduceAggregatesRule's first match. The rule's second match then reduces stddev_samp's sum($0) to sum0($0) as well, but this sum0($0)'s data type differs from the first one's: one is integer, the other is bigint. Calcite's addAggCall method treats them as the same call because it ignores their data types, which leads to the bigint sum0($0) being replaced by the integer sum0($0).





[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466808#comment-16466808
 ] 

Aman Sinha commented on DRILL-6385:
---

Sending a link to a short design overview doc [2] I had proposed during the Drill design hackathon [1] in September 2017. The proposal was to send the bloom filter past the exchange boundary rather than sending it to the foreman. However, this was not implemented, so your contribution would be welcome. I think doing the hash-partitioned hash join first seems fine since that's the one that would benefit the most. Looking forward to your pull request!

[1] 
https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E

[2] 
https://docs.google.com/document/d/1cNznfv60wwuFJlbKwkVbCBNGSBlY5QbjYNgglPw8JQ0/edit?usp=sharing

 

> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will try to push down a bloom filter from the HashJoin node to Parquet's scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equal-join-condition rows to construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator, and aggregates them to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe-side scan nodes, which may already have sent out partial data to the hash join nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described below:
>  1. RuntimeFilter
> A filter container which may contain a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> Wraps the logic for sending the hash join's bloom filter to the foreman. The serialized bloom filter is sent through the data tunnel. This object is instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> Responsible for accepting a SendRuntimeFilterRequest RPC and stripping the actual BloomFilter from the network, then handing the filter to the WorkerBee's new registerRuntimeFilter interface. Another RPC type is BroadcastRuntimeFilterRequest: it registers the accepted global bloom filter with the WorkerBee via the registerRuntimeFilter method, and then propagates it to the FragmentContext, through which the probe-side scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman instantiates a RuntimeFilterManager. It indirectly obtains every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been accepted and aggregated, it broadcasts the aggregated bloom filter to all the probe-side scan nodes through the data tunnel by a BroadcastRuntimeFilterRequest RPC.
>  5. RuntimeFilterEnableOption
> A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon as possible.





[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466802#comment-16466802
 ] 

ASF GitHub Bot commented on DRILL-5913:
---

weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of 
same functions with same inputRefs but di…
URL: https://github.com/apache/drill/pull/1016#issuecomment-387271505
 
 
   @vvysotskyi @amansinha100 could you take a look at this PR? I contacted @julianhyde about this earlier. Since Calcite treats the stddev/stddev_samp input parameter data type as the original data type, no cast happens in its `AggregateReduceFunctionsRule` implementation, so this error does not occur in Calcite. This PR therefore changes Drill's own `DrillReduceAggregatesRule` implementation.




> DrillReduceAggregatesRule mixed the same functions of the same inputRef which 
> have different dataTypes 
> ---
>
> Key: DRILL-5913
> URL: https://issues.apache.org/jira/browse/DRILL-5913
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0, 1.11.0
>Reporter: weijie.tong
>Priority: Major
>
> sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as 
> int)) as col2 from cp.`employee.json`
> {code}
> error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT 
> NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Internal error: Error while applying rule 
> DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.drill.exec.work.foreman.Foreman.run():294
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.AssertionError) Internal error: Error while applying 
> rule DrillReduceAggregatesRule, args 
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.calcite.util.Util.newInternal():792
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that stddev_samp(cast(employee_id as int)) is reduced to sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to sum0($0), by DrillReduceAggregatesRule's first match. The rule's second match then reduces stddev_samp's sum($0) to sum0($0) as well, but this sum0($0)'s data type differs from the first one's: one is integer, the other is bigint. Calcite's addAggCall method treats them as the same call because it ignores their data types, which leads to the bigint sum0($0) being replaced by the integer sum0($0).





[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466725#comment-16466725
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so 
that Unordered Receiver reports its memory …
URL: https://github.com/apache/drill/pull/1237#discussion_r186595478
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/MergingReceiverCreator.java
 ##
 @@ -44,6 +44,11 @@ public MergingRecordBatch getBatch(ExecutorFragmentContext context,
     assert bufHolder != null : "IncomingBuffers must be defined for any place a receiver is declared.";
     RawBatchBuffer[] buffers = bufHolder.getBuffers(receiver.getOppositeMajorFragmentId());
 
-    return new MergingRecordBatch(context, receiver, buffers);
+    MergingRecordBatch mergeReceiver = new MergingRecordBatch(context, receiver, buffers);
+
+    // Register this operator's buffer allocator so that incoming buffers are owned by this allocator
+    bufHolder.setOprAllocator(receiver.getOppositeMajorFragmentId(), mergeReceiver.getOprAllocator());
 
 Review comment:
   Consider moving registration of the buffer allocator inside the `MergingRecordBatch` constructor (change the constructor to accept `ExchangeFragmentContext` and `MergingReceiverPOP` only).
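A rough sketch of the suggested constructor (hedged: it assumes ExchangeFragmentContext exposes the IncomingBuffers through a getBuffers() accessor; the other names are reused from the diff above):

{code:java}
// Sketch only, not the PR's code: the receiver looks up its buffers and
// registers its own allocator, so the creator passes just the context and
// the operator config instead of calling setOprAllocator() itself.
public MergingRecordBatch(ExchangeFragmentContext context, MergingReceiverPOP config) {
  IncomingBuffers bufHolder = context.getBuffers(); // assumed accessor
  this.buffers = bufHolder.getBuffers(config.getOppositeMajorFragmentId());
  // Incoming batches become owned by this operator's allocator.
  bufHolder.setOprAllocator(config.getOppositeMajorFragmentId(), getOprAllocator());
}
{code}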




> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions since we cannot account for all of a query's memory usage. This Jira 
> is to fix memory reporting for the Unordered Receiver operator.





[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466726#comment-16466726
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so 
that Unordered Receiver reports its memory …
URL: https://github.com/apache/drill/pull/1237#discussion_r186595821
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverCreator.java
 ##
 @@ -40,6 +40,11 @@ public UnorderedReceiverBatch getBatch(ExecutorFragmentContext context, Unordere
     RawBatchBuffer[] buffers = bufHolder.getBuffers(receiver.getOppositeMajorFragmentId());
     assert buffers.length == 1;
     RawBatchBuffer buffer = buffers[0];
-    return new UnorderedReceiverBatch(context, buffer, receiver);
+    UnorderedReceiverBatch receiverBatch = new UnorderedReceiverBatch(context, buffer, receiver);
 
 Review comment:
   The same as for `MergingRecordBatch`.




> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions since we cannot account for all of a query's memory usage. This Jira 
> is to fix memory reporting for the Unordered Receiver operator.





[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466724#comment-16466724
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so 
that Unordered Receiver reports its memory …
URL: https://github.com/apache/drill/pull/1237#discussion_r186597070
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/IncomingBuffers.java
 ##
 @@ -129,6 +134,10 @@ public int getRemainingRequired() {
     return collectorMap.get(senderMajorFragmentId).getBuffers();
   }
 
+  public void setOprAllocator(int senderMajorFragmentId, BufferAllocator oprAllocator) {
 
 Review comment:
   Consider introducing `getCollector(int senderMajorFragmentId)` instead of `setOprAllocator` and `getBuffers`.
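A small sketch of the suggested shape (the method name is from the review; the DataCollector type is assumed to carry the getBuffers()/setOprAllocator() pair):

{code:java}
// Sketch only: expose the per-sender collector instead of pass-through methods.
public DataCollector getCollector(int senderMajorFragmentId) {
  return collectorMap.get(senderMajorFragmentId);
}

// Caller side, e.g. in a receiver creator:
//   DataCollector collector = bufHolder.getCollector(receiver.getOppositeMajorFragmentId());
//   RawBatchBuffer[] buffers = collector.getBuffers();
//   collector.setOprAllocator(receiverBatch.getOprAllocator());
{code}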




> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions since we cannot account for all of a query's memory usage. This Jira 
> is to fix memory reporting for the Unordered Receiver operator.





[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread weijie.tong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466675#comment-16466675
 ] 

weijie.tong commented on DRILL-6385:


[~amansinha100] thanks for your advice. This Jira is just to inform the devs about the implementation. I have been working on our own storage layer for a long time, which delayed implementing this feature, though I had noticed this proposal on the dev list. Discussion is encouraged and welcome.

 

To your question: I think I did not describe it well in the message above. Since the partitioned hash join is the main operator in practice, my first PR will support the partitioned hash join, and the description above is about a partitioned hash join.

For the broadcast hash join case, my plan is that the build side still sends its bloom filter to the foreman. The difference is that the foreman broadcasts the bloom filter as soon as it accepts the first one to arrive; there is no need to wait for the bloom filters from all the other nodes (since the broadcast table acts as the build-side table, each node's filter is already complete). This way we follow the same workflow rule, though skipping the contact with the foreman would perform better.

" is a global bloom filter always needed or a local bloom filter will suffice 
in certain cases"  there's no evidence to definitely choose one strategy. To 
partitioned hash join a aggregated global bloom filter will filter more rows 
from the probe side scan .  This is also what the impala does.  Still  needs 
some heuristic statistics plan to choose whether we still need the runtime 
filter at the runtime, since the better filter scenario is the build side has 
low percentage joined rows according to its total table row numbers.

 

" does it mean that a 'global bloom filter' is a synchronization point in your 
proposal " there's no synchronization at the hash join node. To partitioned 
hash join , only the foreman needs to wait for all the bloom filter from all 
the partitioned nodes to aggregate to a global one. The hash join nodes has no 
relationship to each other ,they continue to work parallel.
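To make the two cases concrete, a hedged sketch of the foreman-side wait logic described above (class, fields, and method names are hypothetical, not the PR's code):

{code:java}
// Sketch only: for a partitioned hash join the foreman waits for every
// fragment's filter and ORs them together; for a broadcast hash join the
// first arriving filter is already complete, so it is broadcast immediately.
void onFilterArrived(BloomFilter filter) {
  if (broadcastJoin) {
    broadcastToProbeScans(filter); // first arrival wins; no waiting
  } else {
    globalFilter.merge(filter); // bitwise OR into the global filter
    receivedCount++;
    if (receivedCount == expectedFragments) {
      broadcastToProbeScans(globalFilter);
    }
  }
}
{code}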

 

> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will try to push down a bloom filter from the HashJoin node to Parquet's scan node. The proposed basic procedure is as follows:
>  # The HashJoin build side accumulates the equal-join-condition rows to construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator, and aggregates them to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe-side scan nodes, which may already have sent out partial data to the hash join nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described below:
>  1. RuntimeFilter
> A filter container which may contain a BloomFilter or a MinMaxFilter.
>  2. RuntimeFilterReporter
> Wraps the logic for sending the hash join's bloom filter to the foreman. The serialized bloom filter is sent through the data tunnel. This object is instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext.
>  3. RuntimeFilterRequestHandler
> Responsible for accepting a SendRuntimeFilterRequest RPC and stripping the actual BloomFilter from the network, then handing the filter to the WorkerBee's new registerRuntimeFilter interface. Another RPC type is BroadcastRuntimeFilterRequest: it registers the accepted global bloom filter with the WorkerBee via the registerRuntimeFilter method, and then propagates it to the FragmentContext, through which the probe-side scan node can fetch the aggregated bloom filter.
>  4. RuntimeFilterManager
> The foreman instantiates a RuntimeFilterManager. It will indirectly get

[jira] [Created] (DRILL-6389) Fix Javadoc Warnings In drill-rpc, drill-memory-base, drill-logical, and drill-common

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6389:
-

 Summary: Fix Javadoc Warnings In drill-rpc, drill-memory-base, 
drill-logical, and drill-common
 Key: DRILL-6389
 URL: https://issues.apache.org/jira/browse/DRILL-6389
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


There are many warnings when running 

{code}
mvn javadoc:javadoc
{code}

The goal is to eventually fix all the warnings and then fail the build if any new javadoc warnings or errors are introduced.
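For the fail-the-build part, recent versions of the maven-javadoc-plugin expose a switch for this; assuming maven-javadoc-plugin 3.0.1 or later, something like:

{code}
mvn javadoc:javadoc -Dmaven.javadoc.failOnWarnings=true
{code}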





[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466568#comment-16466568
 ] 

ASF GitHub Bot commented on DRILL-6348:
---

sachouche commented on issue #1237: DRILL-6348: Fixed code so that Unordered 
Receiver reports its memory …
URL: https://github.com/apache/drill/pull/1237#issuecomment-387229629
 
 
   Met with @parthchandra and @vrozov to discuss a more comprehensive fix:
   
   **Agreement**
   It was agreed that received batches should be owned by the associated receiver (not the fragment).
   This association is done at the framework level (Data Collector) so that the receiver doesn't have to perform any extra processing (such as explicit draining); this ensures that no side effects occur (e.g., in the acknowledgment logic, which is sensitive to operator record consumption).
   
   **Fix**
   - Modified the Unordered & Merge receivers to register their buffer allocators with the associated Data Collector
   - The IncomingBuffers class now uses the operator's buffer allocator instead of the fragment allocator




> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Flow
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> The Drill Profile functionality doesn't show any memory usage for the 
> Unordered Receiver operator. This is problematic when analyzing OOM 
> conditions since we cannot account for all of a query's memory usage. This Jira 
> is to fix memory reporting for the Unordered Receiver operator.





[jira] [Updated] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-07 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6386:
--
Reviewer: Kunal Khatua

> Disallow Unused Imports In Checkstyle
> -
>
> Key: DRILL-6386
> URL: https://issues.apache.org/jira/browse/DRILL-6386
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>






[jira] [Created] (DRILL-6388) Disallow indenting with more than 2 spaces.

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6388:
-

 Summary: Disallow indenting with more than 2 spaces.
 Key: DRILL-6388
 URL: https://issues.apache.org/jira/browse/DRILL-6388
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Enforce the two space indenting style guideline as specified here: 
http://drill.apache.org/docs/apache-drill-contribution-guidelines/







[jira] [Commented] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466543#comment-16466543
 ] 

ASF GitHub Bot commented on DRILL-6386:
---

ilooner commented on issue #1252: DRILL-6386: Disallowed unused imports and 
removed them.
URL: https://github.com/apache/drill/pull/1252#issuecomment-387224347
 
 
   @vrozov @kkhatua Please review.




> Disallow Unused Imports In Checkstyle
> -
>
> Key: DRILL-6386
> URL: https://issues.apache.org/jira/browse/DRILL-6386
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>






[jira] [Commented] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466542#comment-16466542
 ] 

ASF GitHub Bot commented on DRILL-6386:
---

ilooner opened a new pull request #1252: DRILL-6386: Disallowed unused imports 
and removed them.
URL: https://github.com/apache/drill/pull/1252
 
 
   




> Disallow Unused Imports In Checkstyle
> -
>
> Key: DRILL-6386
> URL: https://issues.apache.org/jira/browse/DRILL-6386
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>






[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466507#comment-16466507
 ] 

ASF GitHub Bot commented on DRILL-6242:
---

jiang-wu commented on issue #1247: DRILL-6242 Use 
java.time.Local{Date|Time|DateTime} for Drill Date, Time, and Timestamp types
URL: https://github.com/apache/drill/pull/1247#issuecomment-387215803
 
 
   @vvysotskyi rebased and updated the formatting to use 2 spaces.  Please take 
a look and see if things look right.  Thanks.




> Output format for nested date, time, timestamp values in an object hierarchy
> 
>
> Key: DRILL-6242
> URL: https://issues.apache.org/jira/browse/DRILL-6242
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.12.0
>Reporter: Jiang Wu
>Assignee: Jiang Wu
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Some storages (MapR DB, MongoDB, etc.) have hierarchical objects that contain nested fields of date, time, and timestamp types. When a query returns these objects, the output for the nested date, time, and timestamp fields shows the internal object (org.joda.time.DateTime) rather than the logical data value.
> For example, suppose in MongoDB we have a single object that looks like this:
> {code:java}
> > db.test.findOne();
> {
> "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
> "name" : "orange",
> "context" : {
> "date" : ISODate("2018-03-13T21:52:54.940Z"),
> "user" : "jack"
> }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query 
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> ++-+ 
> | EXPR$0 | context | 
> ++-+ 
> | 2018-03-13 | 
> {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"}
>  |
> {code}
> We can see from the above output that when the date field is retrieved as a top-level column, Drill outputs a logical date value, but when the same field is within an object hierarchy, Drill outputs the internal object used to hold the date value.
> The expected output is the same display whether the date field is shown as a top-level column or within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> ++-+ 
> | EXPR$0 | context | 
> ++-+ 
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}
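For reference, a small standalone sketch of the java.time mapping the PR title points to (illustrative only, not the PR's code; the millis value is taken from the joda DateTime dump above):

{code:java}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class NestedDateExample {
  public static void main(String[] args) {
    // 1520977974940L is the "millis" field from the internal DateTime above.
    LocalDateTime ldt = Instant.ofEpochMilli(1520977974940L)
        .atZone(ZoneOffset.UTC)
        .toLocalDateTime();
    System.out.println(ldt.toLocalDate()); // 2018-03-13, the logical value to render
  }
}
{code}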





[jira] [Created] (DRILL-6387) TestTpchDistributedConcurrent tests are ignored, they should be enabled.

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6387:
-

 Summary: TestTpchDistributedConcurrent tests are ignored, they 
should be enabled.
 Key: DRILL-6387
 URL: https://issues.apache.org/jira/browse/DRILL-6387
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Arina Ielchiieva


[~arina] I noticed that you disabled TestTpchDistributedConcurrent with your 
change for DRILL-5771





[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

2018-05-07 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466463#comment-16466463
 ] 

salim achouche commented on DRILL-5846:
---

[~parthc],

Can you please review this Jira's PR now that I have provided a detailed performance analysis (DRILL-6301)?

> Improve Parquet Reader Performance for Flat Data types 
> ---
>
> Key: DRILL-5846
> URL: https://issues.apache.org/jira/browse/DRILL-5846
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.11.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
>  Labels: performance
> Fix For: 1.14.0
>
> Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, 
> 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill
>
>
> The Parquet Reader is a key use-case for Drill. This JIRA is an attempt to further improve the Parquet Reader performance, as several users reported that Parquet parsing represents the lion's share of the overall query execution. It tracks flat data types only, as nested data types might involve functional and processing enhancements (e.g., a nested column can be seen as a document; a user might want to perform operations scoped at the document level, with no need to span all rows). Another JIRA will be created to handle the nested-columns use-case.





[jira] [Resolved] (DRILL-6301) Parquet Performance Analysis

2018-05-07 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-6301.
---
Resolution: Fixed
  Reviewer: Pritesh Maker

This is an analytical task.

> Parquet Performance Analysis
> 
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
>  Issue Type: Task
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> _*Description*_
>  * DRILL-5846 is meant to improve the Flat Parquet reader performance
>  * The associated implementation resulted in a 2x - 4x performance improvement
>  * Though during the review process ([pull request|https://github.com/apache/drill/pull/1060]) a few key questions arose
>  
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
>  * The main reasons for using byte arrays for intermediary processing are to a) avoid the high cost of the DrillBuf checks (especially the reference counting) and b) benefit from some observed Java optimizations when accessing byte arrays
>  * Starting with version 1.12.0, the DrillBuf enablement checks have been refined so that memory access and reference counting checks can be enabled independently
>  * Benchmarking Java's Direct Memory unsafe methods using JMH indicates that the performance gap between heap and direct memory is very narrow except for a few use-cases
>  * There are also concerns that the extra copy step (from direct memory into byte arrays) will have a negative effect on performance; note that this overhead was not observed using Intel's VTune, as the intermediary buffers were a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1 cache during columnar processing.
> _*Goal*_
>  * The Flat Parquet reader is among the few Drill columnar operators
>  * It is imperative that we agree on the most optimal processing pattern so that the decisions we take within this Jira are applied not only to Parquet but to all Drill columnar operators
> _*Methodology*_
>  # Assess the performance impact of using intermediary byte arrays (as described above)
>  # Prototype a solution using Direct Memory with DrillBuf checks off, access checks on, and all checks on
>  # Make an educated decision on which processing pattern should be adopted
>  # Decide whether it is ok to use Java's unsafe API (and through what mechanism) on byte arrays (when the use of byte arrays is a necessity)
>  
>  
>  
>  





[jira] [Commented] (DRILL-6301) Parquet Performance Analysis

2018-05-07 Thread salim achouche (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466455#comment-16466455
 ] 

salim achouche commented on DRILL-6301:
---

*Benchmark Results*
 * Updated the Drill JMH benchmark [here|https://github.com/sachouche/drill-jmh]
 * The benchmark results and conclusions have been published in this 
[document|https://docs.google.com/document/d/1BSNem_ItP-Vxlr6auSP_iwwOLM9rwWZYxGwCsXi-IE8/edit#heading=h.57coyirqkop6]

*In summary, it was concluded that*
 * The current Parquet flat reader's performance was negatively impacted by the DrillBuf APIs when accessing a few bytes at a time
 * Using intermediary buffers addresses these performance issues because the data access pattern becomes bulk
 * Bulk processing (within the reader) also had the advantage of minimizing per-value processing overhead
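As a rough illustration of the bulk pattern in question (plain NIO and hypothetical class names; the actual reader works against DrillBuf and Parquet pages): copy a run of bytes from direct memory into a small reused heap array, then process from the array.

{code:java}
import java.nio.ByteBuffer;

public class BulkCopyExample {
  private static final int BULK_SIZE = 1024; // small enough to stay in the L1 cache
  private final byte[] scratch = new byte[BULK_SIZE]; // reused intermediary buffer

  public long sum(ByteBuffer direct) {
    long total = 0;
    while (direct.hasRemaining()) {
      int n = Math.min(BULK_SIZE, direct.remaining());
      direct.get(scratch, 0, n); // one bulk copy instead of n single-byte reads
      for (int i = 0; i < n; i++) {
        total += scratch[i]; // heap-array access that the JIT optimizes well
      }
    }
    return total;
  }
}
{code}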

> Parquet Performance Analysis
> 
>
> Key: DRILL-6301
> URL: https://issues.apache.org/jira/browse/DRILL-6301
> Project: Apache Drill
>  Issue Type: Task
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Major
> Fix For: 1.14.0
>
>
> _*Description*_
>  * DRILL-5846 is meant to improve the Flat Parquet reader performance
>  * The associated implementation resulted in a 2x - 4x performance improvement
>  * Though during the review process ([pull request|https://github.com/apache/drill/pull/1060]) a few key questions arose
>  
> *_Intermediary Processing via Direct Memory vs Byte Arrays_*
>  * The main reasons for using byte arrays for intermediary processing are to a) avoid the high cost of the DrillBuf checks (especially the reference counting) and b) benefit from some observed Java optimizations when accessing byte arrays
>  * Starting with version 1.12.0, the DrillBuf enablement checks have been refined so that memory access and reference counting checks can be enabled independently
>  * Benchmarking Java's Direct Memory unsafe methods using JMH indicates that the performance gap between heap and direct memory is very narrow except for a few use-cases
>  * There are also concerns that the extra copy step (from direct memory into byte arrays) will have a negative effect on performance; note that this overhead was not observed using Intel's VTune, as the intermediary buffers were a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1 cache during columnar processing.
> _*Goal*_
>  * The Flat Parquet reader is among the few Drill columnar operators
>  * It is imperative that we agree on the most optimal processing pattern so that the decisions we take within this Jira are applied not only to Parquet but to all Drill columnar operators
> _*Methodology*_
>  # Assess the performance impact of using intermediary byte arrays (as described above)
>  # Prototype a solution using Direct Memory with DrillBuf checks off, access checks on, and all checks on
>  # Make an educated decision on which processing pattern should be adopted
>  # Decide whether it is ok to use Java's unsafe API (and through what mechanism) on byte arrays (when the use of byte arrays is a necessity)
>  
>  
>  
>  





[jira] [Created] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6386:
-

 Summary: Disallow Unused Imports In Checkstyle
 Key: DRILL-6386
 URL: https://issues.apache.org/jira/browse/DRILL-6386
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas








[jira] [Updated] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md

2018-05-07 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6249:
--
Reviewer: Arina Ielchiieva

> Add Markdown Docs for Unit Testing and Link to it in README.md
> --
>
> Key: DRILL-6249
> URL: https://issues.apache.org/jira/browse/DRILL-6249
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> I am working on a presentation about how to use the unit testing utilities in Drill. Instead of writing the doc and having it be lost in Google Drive somewhere, I am going to add a Markdown doc to the drill repo and link to it in the README.md. This is appropriate since these docs will only be used by developers, and the way we unit test will change as the code changes. The unit testing docs should therefore be kept in the same repo as the code so they can be updated and kept in sync with the rest of Drill.





[jira] [Commented] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466421#comment-16466421
 ] 

ASF GitHub Bot commented on DRILL-6249:
---

ilooner commented on issue #1251: DRILL-6249: Adding more unit testing 
documentation.
URL: https://github.com/apache/drill/pull/1251#issuecomment-387190754
 
 
   @vvysotskyi Please review GeneratedCode.md
   @paul-rogers @arina-ielchiieva please review




> Add Markdown Docs for Unit Testing and Link to it in README.md
> --
>
> Key: DRILL-6249
> URL: https://issues.apache.org/jira/browse/DRILL-6249
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> I am working on a presentation about how to use the unit testing utilities in Drill. Instead of writing the doc and having it be lost in Google Drive somewhere, I am going to add a Markdown doc to the drill repo and link to it in the README.md. This is appropriate since these docs will only be used by developers, and the way we unit test will change as the code changes. The unit testing docs should therefore be kept in the same repo as the code so they can be updated and kept in sync with the rest of Drill.





[jira] [Commented] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466414#comment-16466414
 ] 

ASF GitHub Bot commented on DRILL-6249:
---

ilooner opened a new pull request #1251: DRILL-6249: Adding more unit testing 
documentation.
URL: https://github.com/apache/drill/pull/1251
 
 
   




> Add Markdown Docs for Unit Testing and Link to it in README.md
> --
>
> Key: DRILL-6249
> URL: https://issues.apache.org/jira/browse/DRILL-6249
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> I am working on a presentation about how to use the unit testing utilities in Drill. Instead of writing the doc and having it be lost in Google Drive somewhere, I am going to add a Markdown doc to the drill repo and link to it in the README.md. This is appropriate since these docs will only be used by developers, and the way we unit test will change as the code changes. The unit testing docs should therefore be kept in the same repo as the code so they can be updated and kept in sync with the rest of Drill.





[jira] [Commented] (DRILL-6321) Lateral Join: Planning changes - enable submitting physical plan

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466282#comment-16466282
 ] 

ASF GitHub Bot commented on DRILL-6321:
---

vrozov commented on a change in pull request #1224: DRILL-6321: Customize 
Drill's conformance. Allow support to APPLY key…
URL: https://github.com/apache/drill/pull/1224#discussion_r186506523
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillConformance.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql;
+
+import org.apache.calcite.sql.validate.SqlConformanceEnum;
+import org.apache.calcite.sql.validate.SqlDelegatingConformance;
+
+/**
+ * Drill's SQL conformance is SqlConformanceEnum.DEFAULT except for the method isApplyAllowed(),
+ * since Drill is going to allow OUTER APPLY and CROSS APPLY so that each row from the left child
+ * of a Join can join with the output of the right side (a sub-query or table function that will
+ * be invoked for each row).
+ * Refer to DRILL-5999 for more information.
+ */
+public class DrillConformance extends SqlDelegatingConformance {
 
 Review comment:
   Personally, I don't see a need for the upper-level class in the future, so I 
implemented a different approach in 
https://github.com/apache/drill/compare/master...vrozov:DRILL-6321. A committer 
should decide which approach to follow; it is not that I am blocking the PR with a -1.
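
   For reference, a minimal sketch of what such a conformance class could look 
like, assuming Calcite's SqlDelegatingConformance constructor that takes the 
delegate conformance (the override below is illustrative, not the final PR code):

{code:java}
package org.apache.drill.exec.planner.sql;

import org.apache.calcite.sql.validate.SqlConformanceEnum;
import org.apache.calcite.sql.validate.SqlDelegatingConformance;

public class DrillConformance extends SqlDelegatingConformance {

  public DrillConformance() {
    // Delegate everything to the DEFAULT conformance...
    super(SqlConformanceEnum.DEFAULT);
  }

  @Override
  public boolean isApplyAllowed() {
    // ...except APPLY, which Drill enables for lateral joins (DRILL-5999).
    return true;
  }
}
{code}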


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Lateral Join: Planning changes - enable submitting physical plan
> 
>
> Key: DRILL-6321
> URL: https://issues.apache.org/jira/browse/DRILL-6321
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Parth Chandra
>Assignee: Chunhui Shi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Implement changes to enable submitting a physical plan containing lateral and 
> unnest.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4091) Support more functions in gis contrib module

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466260#comment-16466260
 ] 

ASF GitHub Bot commented on DRILL-4091:
---

ChrisSandison commented on issue #1201: DRILL-4091: Support for additional gis 
operations in gis contrib module
URL: https://github.com/apache/drill/pull/1201#issuecomment-387151953
 
 
   @cgivre updated and squashed


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support more functions in gis contrib module
> 
>
> Key: DRILL-4091
> URL: https://issues.apache.org/jira/browse/DRILL-4091
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Karol Potocki
>Assignee: Karol Potocki
>Priority: Major
>
> Support for commonly used gis functions in gis contrib module: relate, 
> contains, crosses, intersects, touches, difference, disjoint, buffer, union 
> etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466145#comment-16466145
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: 
Refactor dynamic UDFs and function initializer tests to g…
URL: https://github.com/apache/drill/pull/1225#discussion_r186472412
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java
 ##
 @@ -19,39 +19,53 @@
 
 import mockit.Mock;
 import mockit.MockUp;
+import org.apache.drill.exec.store.StorageStrategy;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.common.config.DrillConfig;
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.store.StoragePluginRegistry;
 import org.apache.drill.exec.util.StoragePluginTestUtils;
 import org.apache.drill.test.DirTestWatcher;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.permission.FsPermission;
 import org.junit.Before;
+import org.junit.BeforeClass;
 import org.junit.Test;
 
 import java.io.File;
-import java.util.Properties;
 import java.util.UUID;
 
 import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA;
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
 public class TemporaryTablesAutomaticDropTest extends BaseTestQuery {
 
   private static final String session_id = "sessionId";
 
+  private static FileSystem fs;
 
 Review comment:
   Nope, they are defined in `@BeforeClass` and the same for all tests.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466147#comment-16466147
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: 
Refactor dynamic UDFs and function initializer tests to g…
URL: https://github.com/apache/drill/pull/1225#discussion_r186471096
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/udf/dynamic/JarBuilder.java
 ##
 @@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.dynamic;
+
+import org.apache.maven.cli.MavenCli;
+import org.apache.maven.cli.logging.Slf4jLogger;
+import org.codehaus.plexus.DefaultPlexusContainer;
+import org.codehaus.plexus.PlexusContainer;
+import org.codehaus.plexus.logging.BaseLoggerManager;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+
+public class JarBuilder {
+
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(JarBuilder.class);
+  private static final String MAVEN_MULTI_MODULE_PROJECT_DIRECTORY = 
"maven.multiModuleProjectDirectory";
+
+  private final MavenCli cli;
+  private final String projectDirectory;
+
+  public JarBuilder(String projectDirectory) {
+    this.cli = new MavenCli() {
+      @Override
+      protected void customizeContainer(PlexusContainer container) {
+        ((DefaultPlexusContainer) container).setLoggerManager(new BaseLoggerManager() {
+          @Override
+          protected org.codehaus.plexus.logging.Logger createLogger(String s) {
+            return new Slf4jLogger(logger);
+          }
+        });
+      }
+    };
+    this.projectDirectory = projectDirectory;
+  }
+
+  /**
+   * Builds jars using embedded Maven in the provided build directory.
+   * Includes files / resources based on the given patterns, otherwise uses the defaults provided in pom.xml.
+   * Checks that the build exit code is 0, i.e. the build was successful.
+   *
+   * @param jarName jar name
+   * @param buildDirectory build directory
+   * @param includeFiles pattern indicating which files should be included
+   * @param includeResources pattern indicating which resources should be included
+   *
+   * @return binary jar name with jar extension (my-jar.jar)
+   */
+  public String build(String jarName, String buildDirectory, String includeFiles, String includeResources) {
+    String originalPropertyValue = null;
+    try {
+      originalPropertyValue = System.setProperty(MAVEN_MULTI_MODULE_PROJECT_DIRECTORY, projectDirectory);
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466146#comment-16466146
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: 
Refactor dynamic UDFs and function initializer tests to g…
URL: https://github.com/apache/drill/pull/1225#discussion_r186472868
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/udf/dynamic/JarBuilder.java
 ##
 @@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udf.dynamic;
+
+import org.apache.maven.cli.MavenCli;
+import org.apache.maven.cli.logging.Slf4jLogger;
+import org.codehaus.plexus.DefaultPlexusContainer;
+import org.codehaus.plexus.PlexusContainer;
+import org.codehaus.plexus.logging.BaseLoggerManager;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+
+public class JarBuilder {
+
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(JarBuilder.class);
+  private static final String MAVEN_MULTI_MODULE_PROJECT_DIRECTORY = 
"maven.multiModuleProjectDirectory";
+
+  private final MavenCli cli;
+  private final String projectDirectory;
+
+  public JarBuilder(String projectDirectory) {
+    this.cli = new MavenCli() {
+      @Override
+      protected void customizeContainer(PlexusContainer container) {
+        ((DefaultPlexusContainer) container).setLoggerManager(new BaseLoggerManager() {
+          @Override
+          protected org.codehaus.plexus.logging.Logger createLogger(String s) {
+            return new Slf4jLogger(logger);
+          }
+        });
+      }
+    };
+    this.projectDirectory = projectDirectory;
+  }
+
+  /**
+   * Builds jars using embedded Maven in the provided build directory.
+   * Includes files / resources based on the given patterns, otherwise uses the defaults provided in pom.xml.
+   * Checks that the build exit code is 0, i.e. the build was successful.
+   *
+   * @param jarName jar name
+   * @param buildDirectory build directory
+   * @param includeFiles pattern indicating which files should be included
+   * @param includeResources pattern indicating which resources should be included
+   *
+   * @return binary jar name with jar extension (my-jar.jar)
+   */
+  public String build(String jarName, String buildDirectory, String includeFiles, String includeResources) {
+    String originalPropertyValue = null;
+    try {
+      originalPropertyValue = System.setProperty(MAVEN_MULTI_MODULE_PROJECT_DIRECTORY, projectDirectory);
+      List<String> params = new LinkedList<>();
+      params.add("clean");
+      params.add("package");
+      params.add("-DskipTests");
+      // uncomment to build with current Drill version
+      // params.add("-Ddrill.version=" + DrillVersionInfo.getVersion());
+      params.add("-Djar.finalName=" + jarName);
+      params.add("-Dcustom.buildDirectory=" + buildDirectory);
+      if (includeFiles != null) {
+        params.add("-Dinclude.files=" + includeFiles);
+      }
+      if (includeResources != null) {
+        params.add("-Dinclude.resources=" + includeResources);
+      }
+      int result = cli.doMain(params.toArray(new String[params.size()]), projectDirectory, System.out, System.err);
+      assertEquals("Build should be successful.", 0, result);
+      return jarName + ".jar";
+    } finally {
+      if (originalPropertyValue != null) {
 
 Review comment:
   Done.
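
   For readers unfamiliar with the class, a hypothetical usage sketch of the 
builder above (the directory and jar names are illustrative, not taken from the 
actual Drill test suite):

{code:java}
// Hypothetical usage; paths and names are illustrative only.
JarBuilder builder = new JarBuilder("src/test/resources/drill-udf");
// Runs an embedded "mvn clean package -DskipTests" in the project directory
// and fails the enclosing test if the build exits non-zero.
String jar = builder.build("drill-custom-lower", "target/drill-udf-build", null, null);
{code}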


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466144#comment-16466144
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: 
Refactor dynamic UDFs and function initializer tests to g…
URL: https://github.com/apache/drill/pull/1225#discussion_r186471937
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java
 ##
 @@ -19,39 +19,53 @@
 
 import mockit.Mock;
 import mockit.MockUp;
+import org.apache.drill.exec.store.StorageStrategy;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.common.config.DrillConfig;
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.store.StoragePluginRegistry;
 import org.apache.drill.exec.util.StoragePluginTestUtils;
 import org.apache.drill.test.DirTestWatcher;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.permission.FsPermission;
 import org.junit.Before;
+import org.junit.BeforeClass;
 import org.junit.Test;
 
 import java.io.File;
-import java.util.Properties;
 import java.util.UUID;
 
 import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA;
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
 public class TemporaryTablesAutomaticDropTest extends BaseTestQuery {
 
   private static final String session_id = "sessionId";
 
 Review comment:
   Replaced it with `UUID.randomUUID()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466148#comment-16466148
 ] 

ASF GitHub Bot commented on DRILL-6272:
---

arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: 
Refactor dynamic UDFs and function initializer tests to g…
URL: https://github.com/apache/drill/pull/1225#discussion_r186472632
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java
 ##
 @@ -19,39 +19,53 @@
 
 import mockit.Mock;
 import mockit.MockUp;
+import org.apache.drill.exec.store.StorageStrategy;
 import org.apache.drill.test.BaseTestQuery;
 import org.apache.drill.common.config.DrillConfig;
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.store.StoragePluginRegistry;
 import org.apache.drill.exec.util.StoragePluginTestUtils;
 import org.apache.drill.test.DirTestWatcher;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+import org.apache.hadoop.fs.permission.FsPermission;
 import org.junit.Before;
+import org.junit.BeforeClass;
 import org.junit.Test;
 
 import java.io.File;
-import java.util.Properties;
 import java.util.UUID;
 
 import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA;
+import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
 public class TemporaryTablesAutomaticDropTest extends BaseTestQuery {
 
   private static final String session_id = "sessionId";
 
+  private static FileSystem fs;
+  private static FsPermission expectedFolderPermission;
+  private static FsPermission expectedFilePermission;
+
+  @BeforeClass
+  public static void init() throws Exception {
+fs = getLocalFileSystem();
+expectedFolderPermission = new 
FsPermission(StorageStrategy.TEMPORARY.getFolderPermission());
+expectedFilePermission = new 
FsPermission(StorageStrategy.TEMPORARY.getFilePermission());
+  }
+
   @Before
-  public void setup() throws Exception {
+  public void setup() {
 
 Review comment:
   Unfortunately, yes. It turned out that there is no good way to retrieve 
session information in tests. Sorry for the confusion.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove binary jars files from source distribution
> -
>
> Key: DRILL-6272
> URL: https://issues.apache.org/jira/browse/DRILL-6272
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Vlad Rozov
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.14.0
>
>
> Per [~vrozov] the source distribution contains binary jar files under 
> exec/java-exec/src/test/resources/jars



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466032#comment-16466032
 ] 

Aman Sinha commented on DRILL-6385:
---

[~weijie] thanks for working on this. It sounds like you are far along in the 
implementation. Just as a future reference, it would be good to create the 
Jira sooner or inform the dev list about the ongoing work so that others in 
the community are aware.

Regarding the proposal, a couple of thoughts: is a global bloom filter always 
needed, or will a local bloom filter suffice in certain cases? In the case 
where we are doing a broadcast hash join, the probe side is never distributed, 
so once the build is done on each minor fragment, the bloom filter can be 
passed to the Scan operator locally without contacting the foreman node. A 
second related thought: for a hash-distributed hash join where both probe and 
build sides are hash distributed, does it mean that a 'global bloom filter' is 
a synchronization point in your proposal? In other words, suppose there are 20 
minor fragments and one of them is slow in completing the build phase; will the 
other 19 probes continue at their own pace?
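
For intuition, the foreman-side aggregation being discussed can be as cheap as 
a bitwise OR, assuming all partial filters share the same size and hash 
functions (a toy sketch, not the proposed Drill code):

{code:java}
// Union of two bloom filters with identical configuration: OR the bit words.
// After merging, the result answers "might contain" for keys inserted into
// either partial filter.
static long[] mergeBloomFilters(long[] global, long[] partial) {
  for (int i = 0; i < global.length; i++) {
    global[i] |= partial[i];
  }
  return global;
}
{code}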

> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit 
> HashJoin and Broadcast HashJoin performance by reducing the number of rows 
> sent across the network and the memory consumed. This feature is already 
> supported by Impala, which calls it a RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will try to push down a bloom filter from the HashJoin node to 
> Parquet's scan node. The proposed basic procedure is described as follows:
>  # The HashJoin build side accumulates the equal-join condition rows to 
> construct a bloom filter. It then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the 
> fragments that have the HashJoin operator. It then aggregates the bloom 
> filters to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe-side 
> scan nodes, which may already have sent partial data to the hash join nodes 
> (currently the hash join node will prefetch one batch from both sides).
>       4.  The scan node accepts the global bloom filter from the foreman 
> node. It will use the bloom filter to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described 
> below:
>       1. RuntimeFilter
> It's a filter container which may contain a BloomFilter or a MinMaxFilter.
>       2. RuntimeFilterReporter
> It wraps the logic to send the hash join's bloom filter to the foreman. The 
> serialized bloom filter will be sent out through the data tunnel. This object 
> will be instantiated by the FragmentExecutor and passed to the 
> FragmentContext, so the HashJoin operator can obtain it through the 
> FragmentContext.
>      3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
> the actual BloomFilter from the network. It then passes this filter to the 
> WorkerBee's new interface registerRuntimeFilter.
> Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
> global bloom filter with the WorkerBee via the registerRuntimeFilter method 
> and then propagates it to the FragmentContext, through which the probe-side 
> scan node can fetch the aggregated bloom filter.
>       4. RuntimeFilterManager
> The foreman will instantiate a RuntimeFilterManager. It will indirectly 
> collect every RuntimeFilter via the WorkerBee. Once all the BloomFilters have 
> been accepted and aggregated, it will broadcast the aggregated bloom filter 
> to all the probe-side scan nodes through the data tunnel by a 
> BroadcastRuntimeFilterRequest RPC.
>      5. RuntimeFilterEnableOption 
>  A global option will be added to decide whether to enable this new feature.
>  
> Welcome suggestions and advice. The related PR will be presented as soon as 
> possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6385:
--

 Summary: Support JPPD (Join Predicate Push Down)
 Key: DRILL-6385
 URL: https://issues.apache.org/jira/browse/DRILL-6385
 Project: Apache Drill
  Issue Type: New Feature
  Components:  Server, Execution - Flow
Reporter: weijie.tong
Assignee: weijie.tong


This feature is to support JPPD (Join Predicate Push Down). It will benefit 
HashJoin and Broadcast HashJoin performance by reducing the number of rows sent 
across the network and the memory consumed. This feature is already supported 
by Impala, which calls it a RuntimeFilter 
([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
 The first PR will try to push down a bloom filter from the HashJoin node to 
Parquet's scan node. The proposed basic procedure is described as follows:
 # The HashJoin build side accumulates the equal-join condition rows to 
construct a bloom filter. It then sends the bloom filter to the foreman node.
 # The foreman node passively accepts the bloom filters from all the fragments 
that have the HashJoin operator. It then aggregates the bloom filters to form a 
global bloom filter.
 # The foreman node broadcasts the global bloom filter to all the probe-side 
scan nodes, which may already have sent partial data to the hash join nodes 
(currently the hash join node will prefetch one batch from both sides).

      4.  The scan node accepts the global bloom filter from the foreman node. 
It will use the bloom filter to filter the remaining rows.

 

To implement the above execution flow, the main new notions are described below:

      1. RuntimeFilter

It's a filter container which may contain a BloomFilter or a MinMaxFilter.

      2. RuntimeFilterReporter

It wraps the logic to send the hash join's bloom filter to the foreman. The 
serialized bloom filter will be sent out through the data tunnel. This object 
will be instantiated by the FragmentExecutor and passed to the FragmentContext, 
so the HashJoin operator can obtain it through the FragmentContext.

     3. RuntimeFilterRequestHandler

It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
the actual BloomFilter from the network. It then passes this filter to the 
WorkerBee's new interface registerRuntimeFilter.

Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
global bloom filter with the WorkerBee via the registerRuntimeFilter method and 
then propagates it to the FragmentContext, through which the probe-side scan 
node can fetch the aggregated bloom filter.

      4. RuntimeFilterManager

The foreman will instantiate a RuntimeFilterManager. It will indirectly collect 
every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been 
accepted and aggregated, it will broadcast the aggregated bloom filter to all 
the probe-side scan nodes through the data tunnel by a 
BroadcastRuntimeFilterRequest RPC.

     5. RuntimeFilterEnableOption 

 A global option will be added to decide whether to enable this new feature.

 

Welcome suggestions and advice. The related PR will be presented as soon as 
possible.
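
For illustration, a toy bloom filter showing the build/probe contract the 
procedure above relies on (a sketch only; the class name and hash functions are 
made up, not the proposed Drill implementation):

{code:java}
import java.util.BitSet;

// Build side calls add() for each join key; the probe-side scan calls
// mightContain() and can safely drop any row for which it returns false.
class SimpleBloomFilter {
  private static final int SIZE_BITS = 1 << 20;   // 1M bits
  private final BitSet bits = new BitSet(SIZE_BITS);

  void add(long key) {
    bits.set(hash1(key));
    bits.set(hash2(key));
  }

  boolean mightContain(long key) {
    // No false negatives: a "true" may be wrong, a "false" never is.
    return bits.get(hash1(key)) && bits.get(hash2(key));
  }

  private static int hash1(long key) {
    return (int) (key ^ (key >>> 32)) & (SIZE_BITS - 1);
  }

  private static int hash2(long key) {
    return (int) ((key * 0x9E3779B97F4A7C15L) >>> 44) & (SIZE_BITS - 1);
  }
}
{code}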



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6259) Support parquet filter push down for complex types

2018-05-07 Thread Anton Gozhiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy closed DRILL-6259.
---

Verified with Drill version 1.14.0-SNAPSHOT, commit id: 
24193b1b038a6315681a65c76a67034b64f71fc5

> Support parquet filter push down for complex types
> --
>
> Key: DRILL-6259
> URL: https://issues.apache.org/jira/browse/DRILL-6259
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Currently parquet filter push down is not working for complex types 
> (including arrays).
> This Jira aims to implement filter push down for complex types whose 
> underlying type is among the simple types supported for filter push down. For 
> instance, currently Drill does not support filter push down for varchars, 
> decimals, etc. Once Drill starts supporting them, this support will be 
> applied to complex types automatically.
> Complex fields will be pushed down the same way regular fields are, except 
> for one case with arrays.
> A query with the predicate {{where users.hobbies_ids[2] is null}} won't be 
> able to push down because we are not able to determine the exact number of 
> nulls in array fields. 
> Consider {{[1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. 
> Statistics for the second file won't show any nulls, but when querying the 
> two files, in terms of data the third value in the array is null.
>  
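
For intuition, a toy version of the pruning check implied above (illustrative 
only; the names are made up, not Drill's planner code):

{code:java}
// A row group may be skipped for "col IS NULL" only when statistics prove it
// contains no nulls. Parquet tracks null counts per column, not per array
// element, so for array-element predicates the answer must stay "keep".
static boolean canPruneIsNull(long statsNullCount, boolean isArrayElement) {
  return !isArrayElement && statsNullCount == 0;
}
{code}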



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465537#comment-16465537
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386979265
 
 
   Closing this PR since it was fixed in the scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be cast to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.
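
For illustration, the fixed-precision bucketing the description criticizes 
works roughly like this (a toy sketch, not Drill's actual code; the vector 
names are the ones the report mentions):

{code:java}
// Each decimal is routed to one of four fixed-precision vector families by
// its declared precision; an int cast without a usable precision can land in
// a bucket too small for its actual value, with no clean overflow path.
static int decimalBucket(int precision) {
  if (precision <= 9)  { return 9;  }  // Decimal9, backed by int
  if (precision <= 18) { return 18; }  // Decimal18, backed by long
  if (precision <= 28) { return 28; }  // Decimal28Sparse
  return 38;                           // Decimal38Sparse
}
{code}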



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465539#comment-16465539
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386975060
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java 
> template to handle the situation where a precision is not supplied (i.e., the 
> supplied precision is zero) for an integer value that is to be cast to a 
> decimal.  The Drill decimal implementation uses a limited selection of fixed 
> decimal precision data types (the total number of decimal digits, i.e., 
> Decimal9, 18, 28, 38) to represent decimal values.  If the destination 
> precision is too small to represent the input integer that is being casted, 
> there is no clean way to deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to 
> more efficient use of memory, it often will actually lead to less efficient 
> use of memory (when the fixed precision is specified significantly larger 
> than is actually needed to represent the numbers), and it results in a 
> tremendous mushrooming of the complexity of the code.  For each fixed 
> precision (and there are only a limited set of selections, 9, 18, 28, 38, 
> which itself leads to memory inefficiency), there is a separate set of code 
> generated from templates.  For each pairwise combination of decimal or 
> non-decimal numeric types, there are multiple places in the code where 
> conversions must be handled, or conditions must be included to handle the 
> difference in precision between the two types.  A one-size-fits-all approach 
> (using a variable width vector to represent any decimal precision) would 
> usually be more memory-efficient (since precisions are often over-specified), 
> and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465534#comment-16465534
 ] 

ASF GitHub Bot commented on DRILL-4184:
---

vvysotskyi closed pull request #372: DRILL-4184: support variable length 
decimal fields in parquet
URL: https://github.com/apache/drill/pull/372
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java
index b18a81c606..bcfc812f0b 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java
@@ -20,10 +20,14 @@
 import io.netty.buffer.DrillBuf;
 
 import java.io.IOException;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
 
 import org.apache.drill.common.exceptions.ExecutionSetupException;
 import org.apache.drill.exec.vector.ValueVector;
-
+import org.apache.drill.exec.vector.VariableWidthVector;
+import org.apache.drill.exec.util.DecimalUtility;
+import org.apache.drill.exec.vector.FixedWidthVector;
 import org.apache.parquet.column.ColumnDescriptor;
 import org.apache.parquet.format.SchemaElement;
 import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
@@ -69,11 +73,16 @@ protected boolean readAndStoreValueSizeInformation() throws IOException {
     if ( currDefLevel == -1 ) {
       currDefLevel = pageReader.definitionLevels.readInteger();
     }
-    if ( columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
+
+    if (columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
       nullsRead++;
-      // set length of zero, each index in the vector defaults to null so no need to set the nullability
-      variableWidthVector.getMutator().setValueLengthSafe(
-          valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0);
+      // set length of zero, each index in the vector defaults to null so no
+      // need to set the nullability
+      if (variableWidthVector == null) {
+        addDecimalLength(null); // store null length in BYTES for null value
+      } else {
+        variableWidthVector.getMutator().setValueLengthSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0);
+      }
       currentValNull = true;
       return false;// field is null, no length to add to data vector
     }
@@ -83,18 +92,26 @@ protected boolean readAndStoreValueSizeInformation() throws IOException {
         currLengthDeterminingDictVal = pageReader.dictionaryLengthDeterminingReader.readBytes();
       }
       currDictValToWrite = currLengthDeterminingDictVal;
-      // re-purposing  this field here for length in BYTES to prevent repetitive multiplication/division
+
+      // re-purposing this field here for length in BYTES to prevent
+      // repetitive multiplication/division
       dataTypeLengthInBits = currLengthDeterminingDictVal.length();
     }
     else {
       // re-purposing  this field here for length in BYTES to prevent repetitive multiplication/division
       dataTypeLengthInBits = pageReader.pageData.getInt((int) pageReader.readyToReadPosInBytes);
     }
-    // I think this also needs to happen if it is null for the random access
-    boolean success = setSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, pageReader.pageData,
-        (int) pageReader.readyToReadPosInBytes + 4, dataTypeLengthInBits);
-    if ( ! success ) {
-      return true;
+
+    if (variableWidthVector == null) {
+      addDecimalLength(dataTypeLengthInBits); // store decimal length variable length decimal field
+    }
+    else {
+      // I think this also needs to happen if it is null for the random access
+      boolean success = setSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, pageReader.pageData,
+          (int) pageReader.readyToReadPosInBytes + 4, dataTypeLengthInBits);
+      if ( ! success ) {
+        return true;
+      }
     }
     return false;
   }
@@ -122,19 +139,34 @@ public void updatePosition() {
   protected void readField(long recordsToRead) {
     // TODO - unlike most implementations of this method, the recordsReadInThisIteration field is not set here
     // should verify that this is not breaking anything
-    currentValNull = variableWidthVector.getAccessor().getObject(valuesReadInCurrentPass) == null;
+    if (variableWidthVector == null) {
+      currentValNull = getDecimalLength(valuesReadInCurrentPass) == null;
+    }
+    else {
+
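
For context, the value such a reader must ultimately produce follows the 
Parquet convention for BINARY decimals (the unscaled value stored as big-endian 
two's-complement bytes); a minimal decoding sketch, independent of the patch 
above:

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

// Reconstruct a decimal from a variable-length BINARY field: the bytes hold
// the unscaled value, and the scale comes from the column's logical type.
static BigDecimal binaryToDecimal(byte[] unscaledBytes, int scale) {
  return new BigDecimal(new BigInteger(unscaledBytes), scale);
}
{code}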

[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465533#comment-16465533
 ] 

ASF GitHub Bot commented on DRILL-4184:
---

vvysotskyi commented on issue #372: DRILL-4184: support variable length decimal 
fields in parquet
URL: https://github.com/apache/drill/pull/372#issuecomment-386978734
 
 
   Closing this PR since it was fixed in the scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> --
>
> Key: DRILL-4184
> URL: https://issues.apache.org/jira/browse/DRILL-4184
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
> Environment: Windows 7 Professional, Java 1.8.0_66
>Reporter: Dave Oshinsky
>Priority: Major
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY 
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.  The 
> problem first surfaces with the ClassCastException shown below, but fixing 
> the immediate cause of the exception is not sufficient to support this 
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or 
> FIXED_LEN_BINARY_ARRAY.  Are there any plans to support DECIMAL with variable 
> length BINARY?  Avro definitely supports encoding DECIMAL in variable length 
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this 
> support in Parquet is less clear.
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as 
> shown below (java.lang.ClassCastException: 
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector).  The successful query at 
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet 
> recor
> d reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: 
> {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace"
> :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal
> ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co
> lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec
> tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":
> 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s
> ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_
> NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru
> e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal
> se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc
> hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA
> R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au
> to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv
> _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r
> ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript
> ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_
> NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale
> ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.
> 

[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465532#comment-16465532
 ] 

ASF GitHub Bot commented on DRILL-4184:
---

vvysotskyi commented on issue #372: DRILL-4184: support variable length decimal 
fields in parquet
URL: https://github.com/apache/drill/pull/372#issuecomment-386975850
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> --
>
> Key: DRILL-4184
> URL: https://issues.apache.org/jira/browse/DRILL-4184
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
> Environment: Windows 7 Professional, Java 1.8.0_66
>Reporter: Dave Oshinsky
>Priority: Major
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY 
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.  The 
> problem first surfaces with the ClassCastException shown below, but fixing 
> the immediate cause of the exception is not sufficient to support this 
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or 
> FIXED_LEN_BINARY_ARRAY.  Are there any plans to support DECIMAL with variable 
> length BINARY?  Avro definitely supports encoding DECIMAL in variable length 
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this 
> support in Parquet is less clear.
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as 
> shown below (java.lang.ClassCastException: 
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector).  The successful query at 
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet 
> recor
> d reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: 
> {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace"
> :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal
> ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co
> lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec
> tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":
> 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s
> ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_
> NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru
> e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal
> se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc
> hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA
> R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au
> to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv
> _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r
> ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript
> ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_
> NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale
> 

[jira] [Commented] (DRILL-3950) CAST(...) * (Interval Constant) gives Internal Exception

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465531#comment-16465531
 ] 

ASF GitHub Bot commented on DRILL-3950:
---

vvysotskyi closed pull request #218: DRILL-3950: Add test case and bump calcite 
version to 1.4.0-drill-r7
URL: https://github.com/apache/drill/pull/218
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
 
b/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
index 23fc54e5dc..1a3b7511a6 100644
--- 
a/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
+++ 
b/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
@@ -18,9 +18,9 @@
 package org.apache.drill.exec.fn.impl;
 
 import org.apache.drill.BaseTestQuery;
-import org.apache.drill.common.types.TypeProtos;
 import org.apache.drill.common.util.FileUtils;
 import org.joda.time.DateTime;
+import org.joda.time.Period;
 import org.junit.Test;
 
 public class TestCastFunctions extends BaseTestQuery {
@@ -79,4 +79,22 @@ public void testToDateForTimeStamp() throws Exception {
         .build()
         .run();
   }
-}
\ No newline at end of file
+
+  @Test // DRILL-3950
+  public void testCastTimesInterval() throws Exception {
+    final String query = "select cast(r_regionkey as Integer) * (INTERVAL '1' DAY) as col \n" +
+        "from cp.`tpch/region.parquet`";
+
+    testBuilder()
+        .sqlQuery(query)
+        .ordered()
+        .baselineColumns("col")
+        .baselineValues(Period.days(0))
+        .baselineValues(Period.days(1))
+        .baselineValues(Period.days(2))
+        .baselineValues(Period.days(3))
+        .baselineValues(Period.days(4))
+        .build()
+        .run();
+  }
+}
diff --git a/pom.xml b/pom.xml
index 882f8d8af2..d94e21195b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1238,7 +1238,7 @@
       <dependency>
         <groupId>org.apache.calcite</groupId>
         <artifactId>calcite-core</artifactId>
-        <version>1.4.0-drill-r6</version>
+        <version>1.4.0-drill-r7</version>
         <exclusions>
           <exclusion>
             <groupId>org.jgrapht</groupId>


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> CAST(...) * (Interval Constant) gives Internal Exception
> 
>
> Key: DRILL-3950
> URL: https://issues.apache.org/jira/browse/DRILL-3950
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Roman Kulyk
>Priority: Major
>  Labels: interval
>
> For example,
> {code}
> select cast(empno as Integer) * (INTERVAL '1' DAY)
> from emp
> {code}
> results into
> {code}
> java.lang.AssertionError: Internal error: invalid literal: INTERVAL '1' DAY
> {code}
> The reason is that the INTERVAL constant is not extracted properly in cases 
> where this constant multiplies a CAST() function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6221) Decimal aggregations for NULL values result in 0.0 value

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-6221.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> Decimal aggregations for NULL values result in 0.0 value
> 
>
> Key: DRILL-6221
> URL: https://issues.apache.org/jira/browse/DRILL-6221
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.12.0
>Reporter: Andries Engelbrecht
>Assignee: Volodymyr Vysotskyi
>Priority: Minor
>
> If you sum a packed decimal field containing a null value, you get 0.0 
> instead of null.
>  
> select id, amt from hive.`default`.`packtest`
> 1 2.3
> 2 null
> 3 4.5
>  
> select sum(amt) from hive.`default`.`packtest` group by id
> 1 2.3
> 2 0.0
> 3 4.5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-920) var_samp(decimal38) cause internal assertion error

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-920.
---
Resolution: Fixed

Fixed in the scope of DRILL-6094

> var_samp(decimal38) cause internal assertion error
> --
>
> Key: DRILL-920
> URL: https://issues.apache.org/jira/browse/DRILL-920
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
>
> #Mon Jun 02 10:18:35 PDT 2014
> git.commit.id.abbrev=8490d74
> The following query caused an internal assertion error while applying the 
> ReduceAggregatesRule. Note that it complains about a type mismatch, with 
> inferred type decimal(19,19)???
> 0: jdbc:drill:schema=dfs> select var_samp(cast(c_decimal38 as 
> decimal(38,18))) from data where c_row < 15;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "beb1c5ab-6132-416c-a45d-49a20af8d416"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while setting up Foreman. < AssertionError:[ Internal 
> error: Error while applying rule ReduceAggregatesRule, args 
> [rel#28051:AggregateRel.NONE.ANY([]).[](child=rel#28050:Subset#2.NONE.ANY([]).[],group={},EXPR$0=VAR_SAMP($0))]
>  ] < AssertionError:[ type mismatch:
> aggCall type:
> DECIMAL(38, 18)
> inferred type:
> DECIMAL(19, 19) ]"
> ]
> Error: exception while executing query (state=,code=0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-4184.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> --
>
> Key: DRILL-4184
> URL: https://issues.apache.org/jira/browse/DRILL-4184
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
> Environment: Windows 7 Professional, Java 1.8.0_66
>Reporter: Dave Oshinsky
>Priority: Major
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY 
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.  The 
> problem first surfaces with the ClassCastException shown below, but fixing 
> the immediate cause of the exception is not sufficient to support this 
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or 
> FIXED_LEN_BYTE_ARRAY.  Are there any plans to support DECIMAL with variable 
> length BINARY?  Avro definitely supports encoding DECIMAL in variable length 
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this 
> support in Parquet is less clear.
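> For reference, a sketch of how such a value decodes, assuming the Parquet 
> spec's representation (two's-complement big-endian unscaled bytes, with the 
> scale taken from the schema); names are illustrative:
> {code:java}
> import java.math.BigDecimal;
> import java.math.BigInteger;
> 
> public class BinaryDecimalDecode {
>     // A BINARY-backed DECIMAL holds the two's-complement, big-endian bytes
>     // of the unscaled value; the scale comes from the column's logical type.
>     static BigDecimal decode(byte[] bytes, int scale) {
>         return new BigDecimal(new BigInteger(bytes), scale);
>     }
> 
>     public static void main(String[] args) {
>         // 7020 with scale 0, as in ACCT_NO (DECIMAL(20,0)) below
>         System.out.println(decode(BigInteger.valueOf(7020).toByteArray(), 0));
>     }
> }
> {code}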
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as 
> shown below (java.lang.ClassCastException: 
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector).  The successful query at 
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet 
> record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: 
> {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace"
> :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal
> ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co
> lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec
> tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":
> 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s
> ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_
> NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru
> e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal
> se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc
> hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA
> R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au
> to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv
> _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r
> ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript
> ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_
> NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale
> ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.
> BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_preci
> sion":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true
> ,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},{"nam
> e":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","preci
> sion":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cla
> ss":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullab
> 

[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465517#comment-16465517
 ] 

ASF GitHub Bot commented on DRILL-4184:
---

vvysotskyi commented on issue #372: DRILL-4184: support variable length decimal 
fields in parquet
URL: https://github.com/apache/drill/pull/372#issuecomment-386975850
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> --
>
> Key: DRILL-4184
> URL: https://issues.apache.org/jira/browse/DRILL-4184
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
> Environment: Windows 7 Professional, Java 1.8.0_66
>Reporter: Dave Oshinsky
>Priority: Major
>

[jira] [Resolved] (DRILL-1005) stddev_pop(decimal) causes internal error

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-1005.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> stddev_pop(decimal) causes internal error
> 
>
> Key: DRILL-1005
> URL: https://issues.apache.org/jira/browse/DRILL-1005
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
>
> Split from DRILL-920 to cover each function separately; this one covers 
> stddev_pop().  For details, see DRILL-920.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-1003) var_pop(decimal) causes internal error

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-1003.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> var_pop(decimal) causes internal error
> -
>
> Key: DRILL-1003
> URL: https://issues.apache.org/jira/browse/DRILL-1003
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
>
> Split from DRILL-920 to cover each function separately; this JIRA covers 
> var_pop().  For details, see DRILL-920.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-1004) stddev_samp(decimal) causes internal error

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-1004.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> stddev_samp(decimal) causes internal error
> -
>
> Key: DRILL-1004
> URL: https://issues.apache.org/jira/browse/DRILL-1004
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Chun Chang
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
>
> Split from DRILL-920 to cover each function separately; this JIRA covers 
> stddev_samp().  For details, see DRILL-920.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex

2018-05-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465511#comment-16465511
 ] 

ASF GitHub Bot commented on DRILL-4834:
---

vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is 
vulnerable to overflow errors, and extremely complex
URL: https://github.com/apache/drill/pull/570#issuecomment-386975060
 
 
   @daveoshinsky, could you please close this PR, since it was fixed in the 
scope of DRILL-6094?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> decimal implementation is vulnerable to overflow errors, and extremely complex
> --
>
> Key: DRILL-4834
> URL: https://issues.apache.org/jira/browse/DRILL-4834
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
> Environment: Drill 1.7 on any platform
>Reporter: Dave Oshinsky
>Assignee: Dave Oshinsky
>Priority: Major
> Fix For: 1.14.0
>
>
> While working on a fix for DRILL-4704, logic was added to the 
> CastIntDecimal.java template to handle the situation where a precision is not 
> supplied (i.e., the supplied precision is zero) for an integer value that is 
> to be cast to a decimal.  The Drill decimal implementation uses a limited 
> selection of fixed-precision decimal data types (precision being the total 
> number of decimal digits, i.e., Decimal9, 18, 28, 38) to represent decimal 
> values.  If the destination precision is too small to represent the input 
> integer being cast, there is no clean way to handle the resulting overflow, 
> as shown in the sketch after this description.
> While fixed decimal precisions can in principle use memory more efficiently, 
> in practice they often waste it (the fixed precision is frequently specified 
> much larger than the numbers actually need), and they cause a tremendous 
> mushrooming of code complexity.  For each fixed precision (and there is only 
> a limited set of choices, 9, 18, 28, 38, which itself leads to memory 
> inefficiency), a separate set of code is generated from templates.  For each 
> pairwise combination of decimal or non-decimal numeric types, there are 
> multiple places in the code where conversions must be handled, or conditions 
> must be included to handle the difference in precision between the two types. 
> A one-size-fits-all approach (using a variable-width vector to represent any 
> decimal precision) would usually be more memory-efficient, since precisions 
> are often over-specified, and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.
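> A sketch of the overflow check implied above (illustrative names, not Drill 
> code): a cast overflows whenever the input's digit count exceeds the 
> destination precision.
> {code:java}
> import java.math.BigInteger;
> 
> public class PrecisionCheck {
>     // CAST(value AS DECIMAL(p, 0)) overflows when value has more than p digits.
>     static boolean fitsPrecision(long value, int precision) {
>         return BigInteger.valueOf(value).abs().toString().length() <= precision;
>     }
> 
>     public static void main(String[] args) {
>         System.out.println(fitsPrecision(123_456_789L, 9));    // true: fits Decimal9
>         System.out.println(fitsPrecision(1_234_567_890L, 9));  // false: 10 digits overflow
>     }
> }
> {code}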



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5858) case expression using decimal expression causes Assignment conversion not possible

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-5858.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> case expression using decimal expression causes Assignment conversion not 
> possible
> --
>
> Key: DRILL-5858
> URL: https://issues.apache.org/jira/browse/DRILL-5858
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.11.0
> Environment: Drill 1.11 decimal type support enabled
>Reporter: N Campbell
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Attachments: decimal_drill_exception.txt, parquet.tar.gz
>
>
> The error appears to be specific to an expression involving a decimal type 
> within a case expression.  If the math expressions are projected on their 
> own, the error is not thrown.
> Assignment conversion not possible from type 
> "org.apache.drill.exec.expr.holders.NullableDecimal28SparseHolder" to type 
> "org.apache.drill.exec.expr.holders.NullableDecimal38SparseHolder"
> select  
> CASE when 'A' = 'A' THEN FIN_FINANCE_FACT.AMOUNT_MONTH * - 1 ELSE 
> FIN_FINANCE_FACT.AMOUNT_MONTH  * 1 END  AS STMT_MONTH, 
> CASE WHEN 'A' = 'A'  THEN FIN_FINANCE_FACT.AMOUNT_YEAR_TO_DATE * - 1 ELSE 
> FIN_FINANCE_FACT.AMOUNT_YEAR_TO_DATE  * 1 END AS STMT_YEAR
> FROM dfs.gosalesdw1021p.FIN_FINANCE_FACT 
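> A sketch of the unification the planner needs here (illustrative, not Drill 
> code): both CASE branches must be widened to one least-restrictive decimal 
> type before code generation, which the holder mismatch suggests did not 
> happen.
> {code:java}
> public class DecimalTypeUnification {
>     // The unified scale is the larger scale, and the unified precision must
>     // still cover the widest integer part of either branch.
>     static int[] leastRestrictive(int p1, int s1, int p2, int s2) {
>         int scale = Math.max(s1, s2);
>         int intDigits = Math.max(p1 - s1, p2 - s2);
>         return new int[]{intDigits + scale, scale};
>     }
> 
>     public static void main(String[] args) {
>         // A DECIMAL(28,2) branch and a DECIMAL(38,2) branch unify to DECIMAL(38,2):
>         int[] t = leastRestrictive(28, 2, 38, 2);
>         System.out.println("DECIMAL(" + t[0] + "," + t[1] + ")");
>     }
> }
> {code}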



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-5390) Casting as decimal does not make drill use the decimal value vector

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-5390.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> Casting as decimal does not make drill use the decimal value vector
> ---
>
> Key: DRILL-5390
> URL: https://issues.apache.org/jira/browse/DRILL-5390
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>
> The query below should use the decimal value vector; however, it looks like 
> it is using the float vector. If we feed the output of this query to a CTAS 
> statement, the created Parquet file has a double type instead of a decimal 
> type.
> {code}
> alter session set `planner.enable_decimal_data_type` = true;
> +-------+--------------------------------------------+
> |  ok   |                  summary                   |
> +-------+--------------------------------------------+
> | true  | planner.enable_decimal_data_type updated.  |
> +-------+--------------------------------------------+
> 1 row selected (0.39 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select typeof(col2) from (select 1 as 
> col1, cast(2.0 as decimal(9,2)) as col2, cast(3.0 as decimal(9,2)) as col3 
> from cp.`tpch/lineitem.parquet` limit 1) d;
> +---------+
> | EXPR$0  |
> +---------+
> | FLOAT8  |
> +---------+
> {code}
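> A small Java illustration of why the silent fallback to FLOAT8 matters: 
> binary doubles cannot represent most base-10 fractions exactly, while a true 
> decimal representation can.
> {code:java}
> import java.math.BigDecimal;
> 
> public class DoubleVsDecimal {
>     public static void main(String[] args) {
>         // Stored as FLOAT8, decimal values silently become binary doubles:
>         System.out.println(0.1 + 0.2);  // 0.30000000000000004
>         // A decimal vector keeps exact base-10 semantics:
>         System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2")));  // 0.3
>     }
> }
> {code}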



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-3909) Decimal round functions corrupt input data

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-3909.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> Decimal round functions corrupt input data
> ---
>
> Key: DRILL-3909
> URL: https://issues.apache.org/jira/browse/DRILL-3909
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
>
> The Decimal 28 and 38 round functions, instead of creating a new buffer and 
> copying data from the incoming buffer, set the output buffer equal to the 
> input buffer, and then subsequently mutate the data in that buffer. This 
> causes the data in the input buffer to be corrupted.
> A simple example to reproduce:
> {code}
> $ cat a.json
> { a : "9.95678" }
> 0: jdbc:drill:drillbit=localhost> create table a as select cast(a as 
> decimal(38,18)) a from `a.json`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> 1 row selected (0.206 seconds)
> 0: jdbc:drill:drillbit=localhost> select round(a, 9) from a;
> +---------+
> | EXPR$0  |
> +---------+
> | 10.0    |
> +---------+
> 1 row selected (0.121 seconds)
> 0: jdbc:drill:drillbit=localhost> select round(a, 11) from a;
> +---------+
> | EXPR$0  |
> +---------+
> | 9.957   |
> +---------+
> 1 row selected (0.115 seconds)
> 0: jdbc:drill:drillbit=localhost> select round(a, 9), round(a, 11) from a;
> +---------+---------+
> | EXPR$0  | EXPR$1  |
> +---------+---------+
> | 10.0    | 1.000   |
> +---------+---------+
> {code}
> In the third example, two round expressions operate on the same incoming 
> decimal vector, and you can see that the result of the second expression is 
> incorrect.
> Not critical, because the Decimal type is considered alpha right now.
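> A small Java illustration of the aliasing bug described above (illustrative, 
> not the actual vector code): mutating a shared buffer corrupts every later 
> reader, while copying first does not.
> {code:java}
> import java.util.Arrays;
> 
> public class BufferAliasing {
>     public static void main(String[] args) {
>         int[] input = {9, 9, 5, 6, 7, 8};  // stand-in for the incoming decimal buffer
> 
>         // Buggy pattern: "output" is the same buffer, so rounding mutates the input.
>         int[] aliased = input;
>         aliased[2] = 0;                               // first round() writes in place
>         System.out.println(Arrays.toString(input));   // input is now corrupted
> 
>         // Correct pattern: copy first, then mutate only the copy.
>         int[] copy = Arrays.copyOf(input, input.length);
>         copy[3] = 0;
>         System.out.println(Arrays.toString(input));   // input unchanged this time
>     }
> }
> {code}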



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-2101) Decimal literals are treated as double

2018-05-07 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-2101.

Resolution: Fixed

Fixed in the scope of DRILL-6094

> Decimal literals are treated as double
> --
>
> Key: DRILL-2101
> URL: https://issues.apache.org/jira/browse/DRILL-2101
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 0.8.0
>Reporter: Victoria Markman
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: decimal
> Fix For: Future
>
> Attachments: DRILL-2101-PARTIAL-PATCH-enable-decimal-literals.patch, 
> DRILL-2101.patch
>
>
> {code}
> create table t1(c1) as
> select
> cast(null as decimal(28,4))
> from `t1.csv`;
> message root {
>   optional double c1; <-- Wrong, should be decimal
> }
> {code}
> This is a very commonly used construct for converting CSV files to Parquet 
> files, which is why I'm marking this bug as critical.
> {code}
> create table t2 as 
> select
> case when columns[3] = '' then cast(null as decimal(28,4)) else 
> cast(columns[3] as decimal(28, 4)) end
> from `t1.csv`;
> {code}
> Correct - cast string literal to decimal
> {code}
> create table t3(c1) as
> select
> cast('12345678901234567890.1234' as decimal(28,4))
> from `t1.csv`;
> message root {
>   required fixed_len_byte_array(12) c1 (DECIMAL(28,4));
> }
> {code}
> Correct - cast literal from csv file as decimal
> {code}
> create table t4(c1) as
> select
> cast(columns[3] as decimal(28,4))
> from `t1.csv`;
> message root {
>   optional fixed_len_byte_array(12) c1 (DECIMAL(28,4));
> }
> {code}
> Correct - case statement (no null involved)
> {code}
> create table t5(c1) as
> select
> case when columns[3] = '' then cast('' as decimal(28,4)) else 
> cast(columns[3] as decimal(28,4)) end
> from `t1.csv`;
> message root {
>   optional fixed_len_byte_array(12) c1 (DECIMAL(28,4));
> }
> {code}
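> A small Java illustration of why degrading the cast to double loses data: a 
> double carries at most about 17 significant decimal digits, while the 
> DECIMAL(28,4) literal above needs 24.
> {code:java}
> import java.math.BigDecimal;
> 
> public class DecimalLiteralWidth {
>     public static void main(String[] args) {
>         String literal = "12345678901234567890.1234";
>         // A double silently rounds the 24 significant digits away:
>         System.out.println(Double.parseDouble(literal));  // 1.2345678901234567E19
>         // DECIMAL(28,4) keeps the value exactly:
>         System.out.println(new BigDecimal(literal));      // 12345678901234567890.1234
>     }
> }
> {code}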



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)