[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466861#comment-16466861 ] ASF GitHub Bot commented on DRILL-5913: --- vvysotskyi commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di… URL: https://github.com/apache/drill/pull/1016#issuecomment-387284020 @weijietong, could you please check whether this bug still reproduces on current master? I tried the query from the Jira description and it finished successfully. I suppose it was fixed in the scope of the Calcite upgrade. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> DrillReduceAggregatesRule mixed the same functions of the same inputRef which
> have different dataTypes
> ---
>
> Key: DRILL-5913
> URL: https://issues.apache.org/jira/browse/DRILL-5913
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.9.0, 1.11.0
> Reporter: weijie.tong
> Priority: Major
>
> Sample query:
> {code:java}
> select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as int)) as col2 from cp.`employee.json`
> {code}
> Error info:
> {code:java}
> org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Type mismatch:
> rel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
> equivRel rowtype:
> RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
> [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceAggregatesRule, args
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.drill.exec.work.foreman.Foreman.run():294
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillReduceAggregatesRule, args
> [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
> org.apache.calcite.util.Util.newInternal():792
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
> {code}
> The reason is that stddev_samp(cast(employee_id as int)) is reduced to sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to sum0($0), by DrillReduceAggregatesRule's first matching pass. The second pass then reduces stddev_samp's sum($0) to sum0($0) as well. But this sum0($0)'s data type differs from the first pass's sum0($0): one is integer, the other bigint. Calcite's addAggCall method treats them as the same because it ignores their data types, so the bigint sum0($0) is replaced by the integer sum0($0).
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
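The dedup-by-ignoring-type behavior described above can be illustrated with a small standalone sketch. The `AggCall`/`AggRegistry` classes below are invented for illustration only; they model the equivalence check, not the actual Calcite or Drill code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model (not Calcite/Drill code): an aggregate call identified
// by function name, input ordinal, and result type.
final class AggCall {
    final String fn; final int inputRef; final String type;
    AggCall(String fn, int inputRef, String type) {
        this.fn = fn; this.inputRef = inputRef; this.type = type;
    }
}

final class AggRegistry {
    private final List<AggCall> calls = new ArrayList<>();
    private final boolean compareTypes;
    AggRegistry(boolean compareTypes) { this.compareTypes = compareTypes; }

    // Returns the ordinal of an equivalent existing call, or adds a new one.
    int addAggCall(AggCall call) {
        for (int i = 0; i < calls.size(); i++) {
            AggCall c = calls.get(i);
            boolean same = c.fn.equals(call.fn) && c.inputRef == call.inputRef
                && (!compareTypes || c.type.equals(call.type));
            if (same) { return i; }
        }
        calls.add(call);
        return calls.size() - 1;
    }
    int size() { return calls.size(); }
}

public class Drill5913Sketch {
    public static void main(String[] args) {
        // Buggy behavior: type ignored, so the BIGINT $SUM0($0) produced by
        // reducing stddev_samp is conflated with the INTEGER $SUM0($0)
        // produced by reducing SUM -> a rowtype mismatch surfaces later.
        AggRegistry ignoreType = new AggRegistry(false);
        int a = ignoreType.addAggCall(new AggCall("$SUM0", 0, "INTEGER"));
        int b = ignoreType.addAggCall(new AggCall("$SUM0", 0, "BIGINT"));
        System.out.println(a == b);   // true: the two calls collapsed into one

        // Fixed behavior: the result type participates in equivalence,
        // so the two $SUM0($0) calls stay distinct.
        AggRegistry checkType = new AggRegistry(true);
        int c = checkType.addAggCall(new AggCall("$SUM0", 0, "INTEGER"));
        int d = checkType.addAggCall(new AggCall("$SUM0", 0, "BIGINT"));
        System.out.println(c != d);   // true: two distinct aggregate calls
    }
}
```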
[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)
[ https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466845#comment-16466845 ] weijie.tong commented on DRILL-6385: --- [~amansinha100] sorry for having missed your earlier proposal. My work is just getting started. I originally had the same plan, to send the bloom filter across the exchange boundary, but it is difficult to handle the heavy RPC traffic between senders and receivers. After studying what the Impala code does, I was inspired to simplify the RPC exchange pattern by going through the foreman node. In any case, thank you for sharing your design and advice.
> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
> Issue Type: New Feature
> Components: Server, Execution - Flow
> Reporter: weijie.tong
> Assignee: weijie.tong
> Priority: Major
>
> This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]). The first PR will try to push down a bloom filter from the HashJoin node to Parquet's scan node. The proposed basic procedure is as follows:
> # The HashJoin build side accumulates the rows of the equal-join condition to construct a bloom filter, then sends the bloom filter to the foreman node.
> # The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator, then aggregates the bloom filters to form a global bloom filter.
> # The foreman node broadcasts the global bloom filter to all the probe-side scan nodes, which may already have sent partial data to the hash join nodes (currently the hash join node prefetches one batch from both sides).
> # The scan node accepts the global bloom filter from the foreman node and uses it to filter the remaining rows.
>
> To implement the above execution flow, the main new notions are described below:
> 1. RuntimeFilter: a filter container which may contain a BloomFilter or a MinMaxFilter.
> 2. RuntimeFilterReporter: wraps the logic to send the hash join's bloom filter to the foreman. The serialized bloom filter is sent through the data tunnel. This object is instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext.
> 3. RuntimeFilterRequestHandler: responsible for accepting a SendRuntimeFilterRequest RPC and stripping the actual BloomFilter off the network. It then passes the filter to the WorkerBee's new registerRuntimeFilter interface. Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted global bloom filter with the WorkerBee through the registerRuntimeFilter method and then propagates it to the FragmentContext, through which the probe-side scan node can fetch the aggregated bloom filter.
> 4. RuntimeFilterManager: the foreman instantiates a RuntimeFilterManager. It indirectly obtains every RuntimeFilter through the WorkerBee. Once all the BloomFilters have been accepted and aggregated, it broadcasts the aggregated bloom filter to all the probe-side scan nodes through the data tunnel via a BroadcastRuntimeFilterRequest RPC.
> 5. RuntimeFilterEnableOption: a global option will be added to decide whether to enable this new feature.
>
> Suggestions and advice are welcome. The related PR will be presented as soon as possible.
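As a rough sketch of the aggregation step, merging bloom filters that share the same bit width and hash functions reduces to a bitwise OR, which is why the order in which partial filters arrive at the foreman does not matter. The class below is a simplified stand-in for illustration, not the proposed Drill BloomFilter:

```java
import java.util.BitSet;

// Simplified, illustrative bloom filter (not the proposed Drill class):
// two bit positions are derived from a single 64-bit hash value.
final class BloomFilter {
    private final BitSet bits;
    private final int numBits;

    BloomFilter(int numBits) { this.bits = new BitSet(numBits); this.numBits = numBits; }

    void put(long hash) {
        bits.set((int) (hash & 0x7fffffff) % numBits);
        bits.set((int) ((hash >>> 32) & 0x7fffffff) % numBits);
    }

    boolean mightContain(long hash) {
        return bits.get((int) (hash & 0x7fffffff) % numBits)
            && bits.get((int) ((hash >>> 32) & 0x7fffffff) % numBits);
    }

    // Aggregation at the foreman: union of bloom filters is a bitwise OR,
    // so partial filters can be merged in any order as they arrive.
    void merge(BloomFilter other) { bits.or(other.bits); }
}

public class GlobalBloomSketch {
    public static void main(String[] args) {
        // Two fragments build partial filters over their build-side keys.
        BloomFilter partial1 = new BloomFilter(1024);
        BloomFilter partial2 = new BloomFilter(1024);
        partial1.put(42L);
        partial2.put(7L);

        // The foreman ORs them into a global filter and broadcasts it.
        BloomFilter global = new BloomFilter(1024);
        global.merge(partial1);
        global.merge(partial2);

        // Keys inserted on any fragment are visible in the global filter.
        System.out.println(global.mightContain(42L) && global.mightContain(7L));
    }
}
```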
[jira] [Commented] (DRILL-5270) Improve loading of profiles listing in the WebUI
[ https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466821#comment-16466821 ] ASF GitHub Bot commented on DRILL-5270: --- ilooner commented on issue #1250: DRILL-5270: Improve loading of profiles listing in the WebUI URL: https://github.com/apache/drill/pull/1250#issuecomment-387275507 @kkhatua Why not use the Guava Cache? http://www.baeldung.com/guava-cache . I think it would simplify the implementation.
> Improve loading of profiles listing in the WebUI
> ---
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.9.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.14.0
>
> Currently, as the number of profiles increases, we reload the same list of profiles from the FS. An ideal improvement would be to detect whether there are any new profiles and only reload from disk then; otherwise, a cached list is sufficient. For a directory of 280K profiles, the load time is close to 6 seconds on a 32-core server. With caching, we can get it down to as little as a few milliseconds. To invalidate the cache, we inspect the last-modified time of the directory to determine whether a reload is needed.
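The last-modified-time check described in the issue can be sketched as below. The class and method names are invented for illustration and do not match Drill's actual web-server code; a Guava `LoadingCache`, as suggested in the comment, would layer eviction and expiry on top of the same idea:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reload the profile listing from disk only when the
// profile directory's last-modified time has changed; otherwise serve the
// cached list without touching the filesystem.
final class ProfileListingCache {
    private final File profileDir;
    private long cachedMtime = -1L;
    private List<String> cachedNames = new ArrayList<>();

    ProfileListingCache(File profileDir) { this.profileDir = profileDir; }

    synchronized List<String> getProfiles() {
        long mtime = profileDir.lastModified();
        if (mtime != cachedMtime) {            // directory changed: reload
            List<String> names = new ArrayList<>();
            File[] files = profileDir.listFiles();
            if (files != null) {
                for (File f : files) { names.add(f.getName()); }
            }
            cachedNames = names;
            cachedMtime = mtime;
        }
        return cachedNames;                    // cache hit: no disk walk
    }
}

public class ProfileCacheDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("profiles");
        Files.createFile(dir.resolve("q1.sys.drill"));
        ProfileListingCache cache = new ProfileListingCache(dir.toFile());
        System.out.println(cache.getProfiles().size());
        // A second call with an unchanged directory returns the cached list.
        System.out.println(cache.getProfiles() == cache.getProfiles());
    }
}
```

One caveat of this approach is filesystem mtime granularity: two updates within the same timestamp tick would not invalidate the cache, which is an acceptable trade-off for a monitoring UI.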
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466813#comment-16466813 ] ASF GitHub Bot commented on DRILL-5913: --- weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di… URL: https://github.com/apache/drill/pull/1016#issuecomment-387274177 @KulykRoman it seems you are familiar with this part of the code. Could you also take a look at this?
[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)
[ https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466808#comment-16466808 ] Aman Sinha commented on DRILL-6385: --- Sending a link to a short design overview doc [2] that I proposed during the Drill design hackathon [1] in September 2017. The proposal was to send the bloom filter past the exchange boundary rather than sending it to the foreman. However, this is not implemented, so your contribution would be welcome. I think doing the hash-partitioned hash join first seems fine, since that's the one that would benefit the most. Looking forward to your pull request!
[1] https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E
[2] https://docs.google.com/document/d/1cNznfv60wwuFJlbKwkVbCBNGSBlY5QbjYNgglPw8JQ0/edit?usp=sharing
[jira] [Commented] (DRILL-5913) DrillReduceAggregatesRule mixed the same functions of the same inputRef which have different dataTypes
[ https://issues.apache.org/jira/browse/DRILL-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466802#comment-16466802 ] ASF GitHub Bot commented on DRILL-5913: --- weijietong commented on issue #1016: DRILL-5913:solve the mixed processing of same functions with same inputRefs but di… URL: https://github.com/apache/drill/pull/1016#issuecomment-387271505 @vvysotskyi @amansinha100 could you take a look at this PR? I have been in contact with @julianhyde. Since Calcite treats the stddev/stddev_samp input parameter data type as the original data type, no cast happens in its `AggregateReduceFunctionsRule` implementation, so this error does not occur in Calcite. This PR therefore changes Drill's own `DrillReduceAggregatesRule` implementation.
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466725#comment-16466725 ] ASF GitHub Bot commented on DRILL-6348: --- vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver reports its memory … URL: https://github.com/apache/drill/pull/1237#discussion_r186595478
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/MergingReceiverCreator.java ##
@@ -44,6 +44,11 @@ public MergingRecordBatch getBatch(ExecutorFragmentContext context,
 assert bufHolder != null : "IncomingBuffers must be defined for any place a receiver is declared.";
 RawBatchBuffer[] buffers = bufHolder.getBuffers(receiver.getOppositeMajorFragmentId());
-return new MergingRecordBatch(context, receiver, buffers);
+MergingRecordBatch mergeReceiver = new MergingRecordBatch(context, receiver, buffers);
+
+// Register this operator's buffer allocator so that incoming buffers are owned by this allocator
+bufHolder.setOprAllocator(receiver.getOppositeMajorFragmentId(), mergeReceiver.getOprAllocator());
Review comment: Consider moving registration of the buffer allocator inside the `MergingRecordBatch` constructor (change the constructor to accept `ExchangeFragmentContext` and `MergingReceiverPOP` only).
> Unordered Receiver does not report its memory usage
> ---
>
> Key: DRILL-6348
> URL: https://issues.apache.org/jira/browse/DRILL-6348
> Project: Apache Drill
> Issue Type: Task
> Components: Execution - Flow
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Fix For: 1.14.0
>
> The Drill Profile functionality doesn't show any memory usage for the Unordered Receiver operator. This is problematic when analyzing OOM conditions since we cannot account for all of a query's memory usage. This Jira is to fix memory reporting for the Unordered Receiver operator.
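The review suggestion above, registering the allocator inside the receiver's own constructor instead of in the creator, can be sketched with simplified stand-in types. None of the classes below are the real Drill classes; they only illustrate the design choice that construction and registration cannot drift apart when both happen in one place:

```java
// Illustrative stand-ins for Drill's allocator and collector types.
interface BufferAllocator {}

final class SimpleAllocator implements BufferAllocator {}

final class Collector {
    BufferAllocator oprAllocator;
    void setOprAllocator(BufferAllocator a) { this.oprAllocator = a; }
}

final class ReceiverBatch {
    private final BufferAllocator allocator = new SimpleAllocator();

    // Registration happens in the constructor, not in the creator class,
    // so a receiver can never be constructed without being registered.
    ReceiverBatch(Collector collector) {
        collector.setOprAllocator(allocator);
    }

    BufferAllocator getOprAllocator() { return allocator; }
}

public class AllocatorRegistrationSketch {
    public static void main(String[] args) {
        Collector collector = new Collector();
        ReceiverBatch batch = new ReceiverBatch(collector);
        // The collector now charges incoming batches to the operator allocator.
        System.out.println(collector.oprAllocator == batch.getOprAllocator());
    }
}
```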
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466726#comment-16466726 ] ASF GitHub Bot commented on DRILL-6348: --- vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver reports its memory … URL: https://github.com/apache/drill/pull/1237#discussion_r186595821
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverCreator.java ##
@@ -40,6 +40,11 @@ public UnorderedReceiverBatch getBatch(ExecutorFragmentContext context, Unordere
 RawBatchBuffer[] buffers = bufHolder.getBuffers(receiver.getOppositeMajorFragmentId());
 assert buffers.length == 1;
 RawBatchBuffer buffer = buffers[0];
-return new UnorderedReceiverBatch(context, buffer, receiver);
+UnorderedReceiverBatch receiverBatch = new UnorderedReceiverBatch(context, buffer, receiver);
Review comment: The same as for `MergingRecordBatch`.
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466724#comment-16466724 ] ASF GitHub Bot commented on DRILL-6348: --- vrozov commented on a change in pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver reports its memory … URL: https://github.com/apache/drill/pull/1237#discussion_r186597070
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/IncomingBuffers.java ##
@@ -129,6 +134,10 @@ public int getRemainingRequired() {
 return collectorMap.get(senderMajorFragmentId).getBuffers();
 }
+public void setOprAllocator(int senderMajorFragmentId, BufferAllocator oprAllocator) {
Review comment: Consider introducing `getCollector(int senderMajorFragmentId)` instead of `setOprAllocator` and `getBuffers`.
[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)
[ https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466675#comment-16466675 ] weijie.tong commented on DRILL-6385: --- [~amansinha100] thanks for your advice. The message above was just to inform the devs about the implementation. I have been working on our own storage layer for a long time, which delayed this feature, though I had noticed the proposal on the dev list. Discussion is still encouraged and welcome.
To your question: I don't think I described it well in the message above. Since partitioned hash join is the main operator in practice, my first PR will support partitioned hash join, and the description above is about a partitioned hash join. For the broadcast hash join case, my plan is that the build side still sends its bloom filter to the foreman. The difference is that the foreman broadcasts the bloom filter as soon as the first one arrives, with no need to wait for the bloom filters from the other nodes (since the distributed table acts as the build-side table). This way we follow the same workflow, even though skipping the foreman would perform better.
"Is a global bloom filter always needed, or will a local bloom filter suffice in certain cases?" There is no evidence to definitively choose one strategy. For a partitioned hash join, an aggregated global bloom filter will filter more rows on the probe-side scan; this is also what Impala does. We still need some heuristic statistics in the planner to decide at runtime whether the runtime filter is worthwhile, since the filter works best when the build side's joined rows are a low percentage of its total row count.
"Does it mean that a 'global bloom filter' is a synchronization point in your proposal?" There is no synchronization at the hash join nodes. For a partitioned hash join, only the foreman waits for the bloom filters from all the partitioned nodes in order to aggregate them into a global one. The hash join nodes have no relationship to each other; they continue to work in parallel.
[jira] [Created] (DRILL-6389) Fix Javadoc Warnings In drill-rpc, drill-memory-base, drill-logical, and drill-common
Timothy Farkas created DRILL-6389: --- Summary: Fix Javadoc Warnings In drill-rpc, drill-memory-base, drill-logical, and drill-common Key: DRILL-6389 URL: https://issues.apache.org/jira/browse/DRILL-6389 Project: Apache Drill Issue Type: Improvement Reporter: Timothy Farkas Assignee: Timothy Farkas There are many warnings when running {code} mvn javadoc:javadoc {code} The goal is to eventually fix all the warnings and then fail the build if any javadoc warnings or errors are introduced.
[jira] [Commented] (DRILL-6348) Unordered Receiver does not report its memory usage
[ https://issues.apache.org/jira/browse/DRILL-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466568#comment-16466568 ] ASF GitHub Bot commented on DRILL-6348: --- sachouche commented on issue #1237: DRILL-6348: Fixed code so that Unordered Receiver reports its memory … URL: https://github.com/apache/drill/pull/1237#issuecomment-387229629 Met with @parthchandra and @vrozov to discuss a more comprehensive fix: **Agreement** It was agreed that received batches should be owned by the associated receiver (not the fragment). This association is done at the framework level (Data Collector) so that the receiver doesn't have to perform any extra processing (such as explicit draining); this is to ensure that no side effect will occur (e.g., acknowledgment logic since it is sensitive to operator record consumption) **Fix** - Modified the Unordered & Merge receivers to register their buffer allocators with the associated Data Collector - The IncomingBuffers class now uses the operator's buffer allocator instead of the fragment allocator This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unordered Receiver does not report its memory usage > --- > > Key: DRILL-6348 > URL: https://issues.apache.org/jira/browse/DRILL-6348 > Project: Apache Drill > Issue Type: Task > Components: Execution - Flow >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > The Drill Profile functionality doesn't show any memory usage for the > Unordered Receiver operator. This is problematic when analyzing OOM > conditions since we cannot account for all of a query memory usage. This Jira > is to fix memory reporting for the Unordered Receiver operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
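The ownership agreement above (received batches accounted to the receiver, not the fragment) can be illustrated with a toy ledger; `MemoryLedger` and its owner strings are hypothetical and are not Drill's BufferAllocator API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy memory accounting to illustrate the ownership fix; not Drill's allocator API.
public class MemoryLedger {
  private final Map<String, Long> usedBytes = new HashMap<>();

  // Charge a received batch to an owner (e.g., "unordered-receiver" vs "fragment").
  public void charge(String owner, long bytes) {
    usedBytes.merge(owner, bytes, Long::sum);
  }

  public void release(String owner, long bytes) {
    usedBytes.merge(owner, -bytes, Long::sum);
  }

  public long used(String owner) {
    return usedBytes.getOrDefault(owner, 0L);
  }
}
```

Before the fix, received batches were effectively charged to the fragment-level owner, so the receiver's row in the query profile stayed at zero; charging the receiver's own allocator makes that usage visible.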
[jira] [Updated] (DRILL-6386) Disallow Unused Imports In Checkstyle
[ https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas updated DRILL-6386: -- Reviewer: Kunal Khatua > Disallow Unused Imports In Checkstyle > - > > Key: DRILL-6386 > URL: https://issues.apache.org/jira/browse/DRILL-6386 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
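A minimal sketch of the rule this issue proposes, using checkstyle's built-in `UnusedImports` check; the surrounding module layout shown here is an assumption, not copied from Drill's actual checkstyle configuration.

```xml
<!-- Checkstyle's built-in UnusedImports check; placement inside Drill's
     actual checkstyle config file is assumed, not copied from the PR. -->
<module name="Checker">
  <module name="TreeWalker">
    <module name="UnusedImports"/>
  </module>
</module>
```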
[jira] [Created] (DRILL-6388) Disallow indenting with more than 2 spaces.
Timothy Farkas created DRILL-6388: - Summary: Disallow indenting with more than 2 spaces. Key: DRILL-6388 URL: https://issues.apache.org/jira/browse/DRILL-6388 Project: Apache Drill Issue Type: Improvement Reporter: Timothy Farkas Assignee: Timothy Farkas Enforce the two space indenting style guideline as specified here: http://drill.apache.org/docs/apache-drill-contribution-guidelines/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
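Checkstyle's `Indentation` check could enforce the guideline above; the property values shown are assumptions based on the linked style guide, not Drill's actual configuration.

```xml
<!-- Checkstyle's Indentation check configured for the 2-space guideline;
     the property values are assumptions based on the contribution guidelines. -->
<module name="Checker">
  <module name="TreeWalker">
    <module name="Indentation">
      <property name="basicOffset" value="2"/>
      <property name="caseIndent" value="2"/>
      <property name="throwsIndent" value="4"/>
      <property name="lineWrappingIndentation" value="4"/>
    </module>
  </module>
</module>
```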
[jira] [Commented] (DRILL-6386) Disallow Unused Imports In Checkstyle
[ https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466543#comment-16466543 ] ASF GitHub Bot commented on DRILL-6386: --- ilooner commented on issue #1252: DRILL-6386: Disallowed unused imports and removed them. URL: https://github.com/apache/drill/pull/1252#issuecomment-387224347 @vrozov @kkhatua Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6386) Disallow Unused Imports In Checkstyle
[ https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466542#comment-16466542 ] ASF GitHub Bot commented on DRILL-6386: --- ilooner opened a new pull request #1252: DRILL-6386: Disallowed unused imports and removed them. URL: https://github.com/apache/drill/pull/1252 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6242) Output format for nested date, time, timestamp values in an object hierarchy
[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466507#comment-16466507 ] ASF GitHub Bot commented on DRILL-6242: --- jiang-wu commented on issue #1247: DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, and Timestamp types URL: https://github.com/apache/drill/pull/1247#issuecomment-387215803 @vvysotskyi rebased and updated the formatting to use 2 spaces. Please take a look and see if things look right. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Output format for nested date, time, timestamp values in an object hierarchy > > > Key: DRILL-6242 > URL: https://issues.apache.org/jira/browse/DRILL-6242 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.12.0 >Reporter: Jiang Wu >Assignee: Jiang Wu >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Some storages (mapr db, mongo db, etc.) have hierarchical objects that > contain nested fields of date, time, and timestamp types. When a query returns > these objects, the output for the nested date, time, and timestamp fields > shows the internal object (org.joda.time.DateTime) rather than the logical > data value. > For example. 
Suppose in MongoDB, we have a single object that looks like > this: > {code:java} > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { > "date" : ISODate("2018-03-13T21:52:54.940Z"), > "user" : "jack" > } > } > {code} > Then connect Drill to the above MongoDB storage, and run the following query > within Drill: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | > {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} > | > {code} > We can see that from the above output, when the date field is retrieved as a > top level column, Drill outputs a logical date value. But when the same > field is within an object hierarchy, Drill outputs the internal object used > to hold the date value. > The expected output is the same display for whether the date field is shown > as a top level column or when it is within an object hierarchy: > {code:java} > > select t.context.`date`, t.context from test t; > ++-+ > | EXPR$0 | context | > ++-+ > | 2018-03-13 | {"date":"2018-03-13","user":"jack"} | > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
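The pull request title says the fix switches to `java.time.Local{Date|Time|DateTime}`. A small sketch of why that yields the expected output: a `java.time.LocalDateTime` renders as the logical value, while serializing a Joda `DateTime` exposes its internal fields. The method name is illustrative; the UTC-millis storage convention is assumed from the Jira output (`"millis":1520977974940`).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch: convert an epoch-millis timestamp (as stored for the Jira example)
// into a java.time value that prints as the logical date/time.
public class NestedDateRendering {
  public static LocalDateTime fromEpochMillis(long millis) {
    // The Jira example stores the ISODate as UTC millis.
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    LocalDateTime ts = fromEpochMillis(1520977974940L);  // ISODate("2018-03-13T21:52:54.940Z")
    LocalDate d = ts.toLocalDate();
    System.out.println(d);   // prints the logical date, not an object graph
    System.out.println(ts);  // prints the logical timestamp
  }
}
```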
[jira] [Created] (DRILL-6387) TestTpchDistributedConcurrent tests are ignored, they should be enabled.
Timothy Farkas created DRILL-6387: - Summary: TestTpchDistributedConcurrent tests are ignored, they should be enabled. Key: DRILL-6387 URL: https://issues.apache.org/jira/browse/DRILL-6387 Project: Apache Drill Issue Type: Bug Reporter: Timothy Farkas Assignee: Arina Ielchiieva [~arina] I noticed that you disabled TestTpchDistributedConcurrent with your change for DRILL-5771 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types
[ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466463#comment-16466463 ] salim achouche commented on DRILL-5846: --- [~parthc], Can you please review this Jira's PR now that I have provided a detailed performance analysis (DRILL-6301)? > Improve Parquet Reader Performance for Flat Data types > --- > > Key: DRILL-5846 > URL: https://issues.apache.org/jira/browse/DRILL-5846 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.11.0 >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Labels: performance > Fix For: 1.14.0 > > Attachments: 2542d447-9837-3924-dd12-f759108461e5.sys.drill, > 2542d49b-88ef-38e3-a02b-b441c1295817.sys.drill > > > The Parquet Reader is a key use case for Drill. This JIRA is an attempt to > further improve the Parquet Reader performance, as several users reported that > Parquet parsing represents the lion's share of the overall query execution. It > tracks Flat Data types only, as Nested data types might involve functional and > processing enhancements (e.g., a nested column can be seen as a Document; a > user might want to perform operations scoped at the document level, with no > need to span all rows). Another JIRA will be created to handle the nested > columns use case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-6301) Parquet Performance Analysis
[ https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] salim achouche resolved DRILL-6301. --- Resolution: Fixed Reviewer: Pritesh Maker This is an analytical task. > Parquet Performance Analysis > > > Key: DRILL-6301 > URL: https://issues.apache.org/jira/browse/DRILL-6301 > Project: Apache Drill > Issue Type: Task > Components: Storage - Parquet >Reporter: salim achouche >Assignee: salim achouche >Priority: Major > Fix For: 1.14.0 > > > _*Description -*_ > * DRILL-5846 is meant to improve the Flat Parquet reader performance > * The associated implementation resulted in a 2x - 4x performance improvement > * Though during the review process ([pull > request|https://github.com/apache/drill/pull/1060]) a few key questions arose > > *_Intermediary Processing via Direct Memory vs Byte Arrays_* > * The main reasons for using byte arrays for intermediary processing are to > a) avoid the high cost of the DrillBuf checks (especially the reference > counting) and b) benefit from some observed Java optimizations when accessing > byte arrays > * Starting with version 1.12.0, the DrillBuf enablement checks have been > refined so that memory access and reference counting checks can be enabled > independently > * Benchmarking of Java's Direct Memory unsafe methods using JMH indicates the > performance gap between heap vs direct memory is very narrow except for a few > use cases > * There are also concerns that the extra copy step (from direct memory into > byte arrays) will have a negative effect on performance; note that this > overhead was not observed using Intel's VTune, as the intermediary buffers were > a) pinned to a single CPU, b) reused, and c) small enough to remain in the L1 > cache during columnar processing. 
> _*Goal*_ > * The Flat Parquet reader is amongst the few Drill columnar operators > * It is imperative that we agree on the most optimal processing pattern so > that the decisions that we take within this Jira are not only applied to > Parquet but to all Drill columnar operators > _*Methodology*_ > # Assess the performance impact of using intermediary byte arrays (as > described above) > # Prototype a solution using Direct Memory and DrillBuf checks off, access > checks on, all checks on > # Make an educated decision on which processing pattern should be adopted > # Decide whether it is ok to use Java's unsafe API (and through what > mechanism) on byte arrays (when the use of byte arrays is a necessity) > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
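The intermediary-byte-array pattern under discussion can be sketched as follows; the chunk size, class, and method names are illustrative assumptions, not the reader's actual code.

```java
import java.nio.ByteBuffer;

// Sketch of the bulk-processing pattern discussed above: copy a chunk from
// direct memory into a small reusable heap array (sized to stay L1-resident),
// then process it with plain array accesses instead of per-byte checked reads.
public class BulkColumnReader {
  private static final int CHUNK = 4096;                // assumed chunk size
  private final byte[] scratch = new byte[CHUNK];       // reused across chunks

  // Sum all bytes of a column backed by a direct buffer, one bulk copy per
  // chunk instead of one bounds-checked access per byte.
  public long sum(ByteBuffer direct) {
    long total = 0;
    direct.rewind();
    while (direct.hasRemaining()) {
      int n = Math.min(CHUNK, direct.remaining());
      direct.get(scratch, 0, n);                        // single bulk copy from direct memory
      for (int i = 0; i < n; i++) {
        total += scratch[i] & 0xFF;                     // tight loop over the heap array
      }
    }
    return total;
  }
}
```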
[jira] [Commented] (DRILL-6301) Parquet Performance Analysis
[ https://issues.apache.org/jira/browse/DRILL-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466455#comment-16466455 ] salim achouche commented on DRILL-6301: --- *Benchmark Results* * Updated the Drill JMH benchmark [here|https://github.com/sachouche/drill-jmh] * The benchmark results and conclusions have been published to this [document|https://docs.google.com/document/d/1BSNem_ItP-Vxlr6auSP_iwwOLM9rwWZYxGwCsXi-IE8/edit#heading=h.57coyirqkop6] *In summary, it was concluded that* * The current Parquet flat reader performance was negatively impacted by the DrillBuf APIs when accessing a few bytes at a time * Using intermediary buffers addresses such performance issues, as the data access pattern becomes bulk * Using bulk processing (within the reader) also had the advantage of minimizing processing overhead -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6386) Disallow Unused Imports In Checkstyle
Timothy Farkas created DRILL-6386: - Summary: Disallow Unused Imports In Checkstyle Key: DRILL-6386 URL: https://issues.apache.org/jira/browse/DRILL-6386 Project: Apache Drill Issue Type: Improvement Reporter: Timothy Farkas Assignee: Timothy Farkas -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md
[ https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas updated DRILL-6249: -- Reviewer: Arina Ielchiieva > Add Markdown Docs for Unit Testing and Link to it in README.md > -- > > Key: DRILL-6249 > URL: https://issues.apache.org/jira/browse/DRILL-6249 > Project: Apache Drill > Issue Type: Improvement >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > Fix For: 1.14.0 > > > I am working on a presentation about how to use the unit testing utilities in > Drill. Instead of writing the doc and having it be lost in Google Drive > somewhere I am going to add a Markdown doc to the drill repo and link to it > in the README.md. This is appropriate since these docs will only be used by > developers, and the way we unit test will change as the code changes. So the > unit testing docs should be kept in the same repo as the code so it can be > updated and kept in sync with the rest of Drill. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md
[ https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466421#comment-16466421 ] ASF GitHub Bot commented on DRILL-6249: --- ilooner commented on issue #1251: DRILL-6249: Adding more unit testing documentation. URL: https://github.com/apache/drill/pull/1251#issuecomment-387190754 @vvysotskyi Please review GeneratedCode.md @paul-rogers @arina-ielchiieva please review This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6249) Add Markdown Docs for Unit Testing and Link to it in README.md
[ https://issues.apache.org/jira/browse/DRILL-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466414#comment-16466414 ] ASF GitHub Bot commented on DRILL-6249: --- ilooner opened a new pull request #1251: DRILL-6249: Adding more unit testing documentation. URL: https://github.com/apache/drill/pull/1251 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6321) Lateral Join: Planning changes - enable submitting physical plan
[ https://issues.apache.org/jira/browse/DRILL-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466282#comment-16466282 ] ASF GitHub Bot commented on DRILL-6321: --- vrozov commented on a change in pull request #1224: DRILL-6321: Customize Drill's conformance. Allow support to APPLY key… URL: https://github.com/apache/drill/pull/1224#discussion_r186506523 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillConformance.java ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.planner.sql; + +import org.apache.calcite.sql.validate.SqlConformanceEnum; +import org.apache.calcite.sql.validate.SqlDelegatingConformance; + +/** + * Drill's SQL conformance is SqlConformanceEnum.DEFAULT except for method isApplyAllowed(). + * Since Drill is going to allow OUTER APPLY and CROSS APPLY to allow each row from left child of Join + * to join with output of right side (sub-query or table function that will be invoked for each row). + * Refer to DRILL-5999 for more information. 
+ */ +public class DrillConformance extends SqlDelegatingConformance { Review comment: Personally, I don't see a need for the upper-level class in the future, so I implemented a different approach in https://github.com/apache/drill/compare/master...vrozov:DRILL-6321. A committer should decide what approach to follow, it is not that I block the PR with -1. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Lateral Join: Planning changes - enable submitting physical plan > > > Key: DRILL-6321 > URL: https://issues.apache.org/jira/browse/DRILL-6321 > Project: Apache Drill > Issue Type: Task >Reporter: Parth Chandra >Assignee: Chunhui Shi >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Implement changes to enable submitting a physical plan containing lateral and > unnest. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-4091) Support more functions in gis contrib module
[ https://issues.apache.org/jira/browse/DRILL-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466260#comment-16466260 ] ASF GitHub Bot commented on DRILL-4091: --- ChrisSandison commented on issue #1201: DRILL-4091: Support for additional gis operations in gis contrib module URL: https://github.com/apache/drill/pull/1201#issuecomment-387151953 @cgivre updated and squashed This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support more functions in gis contrib module > > > Key: DRILL-4091 > URL: https://issues.apache.org/jira/browse/DRILL-4091 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Karol Potocki >Assignee: Karol Potocki >Priority: Major > > Support for commonly used gis functions in gis contrib module: relate, > contains, crosses, intersects, touches, difference, disjoint, buffer, union > etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466145#comment-16466145 ] ASF GitHub Bot commented on DRILL-6272: --- arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: Refactor dynamic UDFs and function initializer tests to g… URL: https://github.com/apache/drill/pull/1225#discussion_r186472412 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java ## @@ -19,39 +19,53 @@ import mockit.Mock; import mockit.MockUp; +import org.apache.drill.exec.store.StorageStrategy; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.common.config.DrillConfig; -import org.apache.drill.exec.ExecConstants; import org.apache.drill.exec.store.StoragePluginRegistry; import org.apache.drill.exec.util.StoragePluginTestUtils; import org.apache.drill.test.DirTestWatcher; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocatedFileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.RemoteIterator; +import org.apache.hadoop.fs.permission.FsPermission; import org.junit.Before; +import org.junit.BeforeClass; import org.junit.Test; import java.io.File; -import java.util.Properties; import java.util.UUID; import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; public class TemporaryTablesAutomaticDropTest extends BaseTestQuery { private static final String session_id = "sessionId"; + private static FileSystem fs; Review comment: Nope, they are defined in `@BeforeClass` and the same for all tests. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Remove binary jars files from source distribution > - > > Key: DRILL-6272 > URL: https://issues.apache.org/jira/browse/DRILL-6272 > Project: Apache Drill > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.14.0 > > > Per [~vrozov] the source distribution contains binary jar files under > exec/java-exec/src/test/resources/jars -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466147#comment-16466147 ] ASF GitHub Bot commented on DRILL-6272: --- arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: Refactor dynamic UDFs and function initializer tests to g… URL: https://github.com/apache/drill/pull/1225#discussion_r186471096 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/udf/dynamic/JarBuilder.java ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.udf.dynamic; + +import org.apache.maven.cli.MavenCli; +import org.apache.maven.cli.logging.Slf4jLogger; +import org.codehaus.plexus.DefaultPlexusContainer; +import org.codehaus.plexus.PlexusContainer; +import org.codehaus.plexus.logging.BaseLoggerManager; + +import java.util.LinkedList; +import java.util.List; + +import static org.junit.Assert.assertEquals; + +public class JarBuilder { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JarBuilder.class); + private static final String MAVEN_MULTI_MODULE_PROJECT_DIRECTORY = "maven.multiModuleProjectDirectory"; + + private final MavenCli cli; + private final String projectDirectory; + + public JarBuilder(String projectDirectory) { +this.cli = new MavenCli() { + @Override + protected void customizeContainer(PlexusContainer container) { +((DefaultPlexusContainer) container).setLoggerManager(new BaseLoggerManager() { + @Override + protected org.codehaus.plexus.logging.Logger createLogger(String s) { +return new Slf4jLogger(logger); + } +}); + } +}; +this.projectDirectory = projectDirectory; + } + + /** + * Builds jars using embedded maven in provided build directory. + * Includes files / resources based given pattern, otherwise using defaults provided in pom.xml. + * Checks if build exit code is 0, i.e. build was successful. + * + * @param jarName jar name + * @param buildDirectory build directory + * @param includeFiles pattern indicating which files should be included + * @param includeResources pattern indicating which resources should be included + * + * @return binary jar name with jar extension (my-jar.jar) + */ + public String build(String jarName, String buildDirectory, String includeFiles, String includeResources) { +String originalPropertyValue = null; +try { + originalPropertyValue = System.setProperty(MAVEN_MULTI_MODULE_PROJECT_DIRECTORY, projectDirectory); Review comment: Done. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466146#comment-16466146 ] ASF GitHub Bot commented on DRILL-6272: --- arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: Refactor dynamic UDFs and function initializer tests to g… URL: https://github.com/apache/drill/pull/1225#discussion_r186472868 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/udf/dynamic/JarBuilder.java ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.drill.exec.udf.dynamic;
+
+import org.apache.maven.cli.MavenCli;
+import org.apache.maven.cli.logging.Slf4jLogger;
+import org.codehaus.plexus.DefaultPlexusContainer;
+import org.codehaus.plexus.PlexusContainer;
+import org.codehaus.plexus.logging.BaseLoggerManager;
+
+import java.util.LinkedList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+
+public class JarBuilder {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(JarBuilder.class);
+  private static final String MAVEN_MULTI_MODULE_PROJECT_DIRECTORY = "maven.multiModuleProjectDirectory";
+
+  private final MavenCli cli;
+  private final String projectDirectory;
+
+  public JarBuilder(String projectDirectory) {
+    this.cli = new MavenCli() {
+      @Override
+      protected void customizeContainer(PlexusContainer container) {
+        ((DefaultPlexusContainer) container).setLoggerManager(new BaseLoggerManager() {
+          @Override
+          protected org.codehaus.plexus.logging.Logger createLogger(String s) {
+            return new Slf4jLogger(logger);
+          }
+        });
+      }
+    };
+    this.projectDirectory = projectDirectory;
+  }
+
+  /**
+   * Builds jars using embedded maven in provided build directory.
+   * Includes files / resources based on given pattern, otherwise using defaults provided in pom.xml.
+   * Checks if build exit code is 0, i.e. build was successful.
+   *
+   * @param jarName jar name
+   * @param buildDirectory build directory
+   * @param includeFiles pattern indicating which files should be included
+   * @param includeResources pattern indicating which resources should be included
+   *
+   * @return binary jar name with jar extension (my-jar.jar)
+   */
+  public String build(String jarName, String buildDirectory, String includeFiles, String includeResources) {
+    String originalPropertyValue = null;
+    try {
+      originalPropertyValue = System.setProperty(MAVEN_MULTI_MODULE_PROJECT_DIRECTORY, projectDirectory);
+      List<String> params = new LinkedList<>();
+      params.add("clean");
+      params.add("package");
+      params.add("-DskipTests");
+      // uncomment to build with current Drill version
+      // params.add("-Ddrill.version=" + DrillVersionInfo.getVersion());
+      params.add("-Djar.finalName=" + jarName);
+      params.add("-Dcustom.buildDirectory=" + buildDirectory);
+      if (includeFiles != null) {
+        params.add("-Dinclude.files=" + includeFiles);
+      }
+      if (includeResources != null) {
+        params.add("-Dinclude.resources=" + includeResources);
+      }
+      int result = cli.doMain(params.toArray(new String[params.size()]), projectDirectory, System.out, System.err);
+      assertEquals("Build should be successful.", 0, result);
+      return jarName + ".jar";
+    } finally {
+      if (originalPropertyValue != null) {

Review comment: Done.
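The `finally` block in the reviewed `build()` method restores a system property after the Maven invocation. The general save-and-restore pattern (including the case where the property was previously unset, which must be cleared rather than restored) can be sketched with a small, self-contained helper. The class and method names below are hypothetical illustrations, not Drill's actual code:

```java
public class SystemPropertySandbox {

  /**
   * Runs the given action with the system property temporarily set,
   * restoring the previous value afterwards. If the property was unset
   * before, it is cleared again rather than left behind.
   */
  public static void withProperty(String key, String value, Runnable action) {
    String original = System.setProperty(key, value); // returns the previous value, or null
    try {
      action.run();
    } finally {
      if (original != null) {
        System.setProperty(key, original);
      } else {
        System.clearProperty(key);
      }
    }
  }
}
```

Handling the unset case matters in tests: leaving `maven.multiModuleProjectDirectory` behind would leak state into subsequent tests in the same JVM.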
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466144#comment-16466144 ] ASF GitHub Bot commented on DRILL-6272: --- arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: Refactor dynamic UDFs and function initializer tests to g… URL: https://github.com/apache/drill/pull/1225#discussion_r186471937 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java ## @@ -19,39 +19,53 @@ import mockit.Mock; import mockit.MockUp; +import org.apache.drill.exec.store.StorageStrategy; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.common.config.DrillConfig; -import org.apache.drill.exec.ExecConstants; import org.apache.drill.exec.store.StoragePluginRegistry; import org.apache.drill.exec.util.StoragePluginTestUtils; import org.apache.drill.test.DirTestWatcher; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocatedFileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.RemoteIterator; +import org.apache.hadoop.fs.permission.FsPermission; import org.junit.Before; +import org.junit.BeforeClass; import org.junit.Test; import java.io.File; -import java.util.Properties; import java.util.UUID; import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; public class TemporaryTablesAutomaticDropTest extends BaseTestQuery { private static final String session_id = "sessionId"; Review comment: Replaced it to `UUID.randomUUID()`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
[jira] [Commented] (DRILL-6272) Remove binary jars files from source distribution
[ https://issues.apache.org/jira/browse/DRILL-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466148#comment-16466148 ] ASF GitHub Bot commented on DRILL-6272: --- arina-ielchiieva commented on a change in pull request #1225: DRILL-6272: Refactor dynamic UDFs and function initializer tests to g… URL: https://github.com/apache/drill/pull/1225#discussion_r186472632 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/rpc/user/TemporaryTablesAutomaticDropTest.java ## @@ -19,39 +19,53 @@ import mockit.Mock; import mockit.MockUp; +import org.apache.drill.exec.store.StorageStrategy; import org.apache.drill.test.BaseTestQuery; import org.apache.drill.common.config.DrillConfig; -import org.apache.drill.exec.ExecConstants; import org.apache.drill.exec.store.StoragePluginRegistry; import org.apache.drill.exec.util.StoragePluginTestUtils; import org.apache.drill.test.DirTestWatcher; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocatedFileStatus; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.RemoteIterator; +import org.apache.hadoop.fs.permission.FsPermission; import org.junit.Before; +import org.junit.BeforeClass; import org.junit.Test; import java.io.File; -import java.util.Properties; import java.util.UUID; import static org.apache.drill.exec.util.StoragePluginTestUtils.DFS_TMP_SCHEMA; +import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; public class TemporaryTablesAutomaticDropTest extends BaseTestQuery { private static final String session_id = "sessionId"; + private static FileSystem fs; + private static FsPermission expectedFolderPermission; + private static FsPermission expectedFilePermission; + + @BeforeClass + public static void init() throws Exception { +fs = getLocalFileSystem(); +expectedFolderPermission = new FsPermission(StorageStrategy.TEMPORARY.getFolderPermission()); +expectedFilePermission = new 
FsPermission(StorageStrategy.TEMPORARY.getFilePermission()); + } + @Before - public void setup() throws Exception { + public void setup() { Review comment: Unfortunately, yes. It turned out that there is no good way to retrieve session information in tests. Sorry for confusion.
[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)
[ https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466032#comment-16466032 ] Aman Sinha commented on DRILL-6385: --- [~weijie] thanks for working on this. It sounds like you are far along in the implementation. Just as a future reference, it would be good to create the Jira sooner or inform the dev list about the ongoing work so that others in the community are aware. Regarding the proposal, a couple of thoughts: is a global bloom filter always needed, or will a local bloom filter suffice in certain cases? In the case where we are doing a broadcast hash join, the probe side is never distributed, so once the build is done on each minor fragment, the bloom filter can be passed to the Scan operator locally without contacting the foreman node. A second related thought: for a hash distributed hash join where both probe and build sides are hash distributed, does it mean that a 'global bloom filter' is a synchronization point in your proposal? In other words, suppose there are 20 minor fragments and one of them is slow in completing the build phase; will the other 19 probes continue at their own pace? > Support JPPD (Join Predicate Push Down) > --- > > Key: DRILL-6385 > URL: https://issues.apache.org/jira/browse/DRILL-6385 > Project: Apache Drill > Issue Type: New Feature > Components: Server, Execution - Flow >Reporter: weijie.tong >Assignee: weijie.tong >Priority: Major > > This feature is to support JPPD (Join Predicate Push Down). It will > benefit HashJoin and Broadcast HashJoin performance by reducing the number > of rows sent across the network and the memory consumed. This feature is > already supported by Impala, which calls it RuntimeFilter > ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]). > The first PR will try to push down a bloom filter of the HashJoin node to > Parquet’s scan node.
The proposed basic procedure is as follows: > # The HashJoin build side accumulates the equal join condition rows to > construct a bloom filter. Then it sends out the bloom filter to the foreman > node. > # The foreman node passively accepts the bloom filters from all the fragments > that have the HashJoin operator. It then aggregates the bloom filters to form > a global bloom filter. > # The foreman node broadcasts the global bloom filter to all the probe side > scan nodes, which may already have sent out partial data to the hash join > nodes (currently the hash join node will prefetch one batch from both sides). > 4. The scan node accepts a global bloom filter from the foreman node. > It will use it to filter the remaining rows. > > To implement the above execution flow, the main new notions are described below: > 1. RuntimeFilter > It’s a filter container which may contain a BloomFilter or a MinMaxFilter. > 2. RuntimeFilterReporter > It wraps the logic to send the hash join’s bloom filter to the foreman. The > serialized bloom filter will be sent out through the data tunnel. This object > will be instantiated by the FragmentExecutor and passed to the > FragmentContext, so the HashJoin operator can obtain it through the > FragmentContext. > 3. RuntimeFilterRequestHandler > It is responsible for accepting a SendRuntimeFilterRequest RPC to strip the > actual BloomFilter from the network. It then passes this filter to the > WorkerBee’s new interface registerRuntimeFilter. > Another RPC type is BroadcastRuntimeFilterRequest. It will register the > accepted global bloom filter with the WorkerBee via the registerRuntimeFilter > method and then propagate it to the FragmentContext, through which the probe side > scan node can fetch the aggregated bloom filter. > 4. RuntimeFilterManager > The foreman will instantiate a RuntimeFilterManager. It will indirectly get > every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been > accepted and aggregated, it will broadcast the aggregated bloom filter to > all the probe side scan nodes through the data tunnel by a > BroadcastRuntimeFilterRequest RPC. > 5. RuntimeFilterEnableOption > A global option will be added to decide whether to enable this new feature. > > Suggestions and advice are welcome. The related PR will be presented as > soon as possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
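The build-side accumulation, foreman-side aggregation, and probe-side filtering described in the proposal rest on two bloom filter operations: inserting hashed join keys and OR-merging bit sets from different minor fragments. A minimal, self-contained sketch of those operations (a hypothetical class, not the proposed Drill implementation; a real RuntimeFilter would hash arbitrary join-key values, not just longs):

```java
import java.util.BitSet;

public class SimpleBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int hashes;

  public SimpleBloomFilter(int size, int hashes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashes = hashes;
  }

  /** Build side: record an equal-join-condition key. */
  public void put(long key) {
    for (int i = 0; i < hashes; i++) {
      bits.set(index(key, i));
    }
  }

  /** Probe side: false means the key is definitely absent, so the scan can drop the row. */
  public boolean mightContain(long key) {
    for (int i = 0; i < hashes; i++) {
      if (!bits.get(index(key, i))) {
        return false;
      }
    }
    return true; // possibly present (false positives are allowed, false negatives are not)
  }

  /** Foreman side: aggregate another fragment's filter of identical size by OR-ing bit sets. */
  public void merge(SimpleBloomFilter other) {
    bits.or(other.bits);
  }

  private int index(long key, int i) {
    // Derive the i-th probe position from a simple mixed hash of the key.
    long h = key * 0x9E3779B97F4A7C15L + i * 0xC2B2AE3D27D4EB4FL;
    return (int) Math.floorMod(h ^ (h >>> 32), (long) size);
  }
}
```

Because merging is just a bitwise OR, the "global" filter never loses keys contributed by any fragment, which is what makes the foreman-side aggregation step safe.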
[jira] [Created] (DRILL-6385) Support JPPD (Join Predicate Push Down)
weijie.tong created DRILL-6385: -- Summary: Support JPPD (Join Predicate Push Down) Key: DRILL-6385 URL: https://issues.apache.org/jira/browse/DRILL-6385 Project: Apache Drill Issue Type: New Feature Components: Server, Execution - Flow Reporter: weijie.tong Assignee: weijie.tong This feature is to support JPPD (Join Predicate Push Down). It will benefit HashJoin and Broadcast HashJoin performance by reducing the number of rows sent across the network and the memory consumed. This feature is already supported by Impala, which calls it RuntimeFilter ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]). The first PR will try to push down a bloom filter of the HashJoin node to Parquet’s scan node. The proposed basic procedure is as follows: # The HashJoin build side accumulates the equal join condition rows to construct a bloom filter. Then it sends out the bloom filter to the foreman node. # The foreman node passively accepts the bloom filters from all the fragments that have the HashJoin operator. It then aggregates the bloom filters to form a global bloom filter. # The foreman node broadcasts the global bloom filter to all the probe side scan nodes, which may already have sent out partial data to the hash join nodes (currently the hash join node will prefetch one batch from both sides). 4. The scan node accepts a global bloom filter from the foreman node. It will use it to filter the remaining rows. To implement the above execution flow, the main new notions are described below: 1. RuntimeFilter It’s a filter container which may contain a BloomFilter or a MinMaxFilter. 2. RuntimeFilterReporter It wraps the logic to send the hash join’s bloom filter to the foreman. The serialized bloom filter will be sent out through the data tunnel. This object will be instantiated by the FragmentExecutor and passed to the FragmentContext, so the HashJoin operator can obtain it through the FragmentContext. 3. RuntimeFilterRequestHandler It is responsible for accepting a SendRuntimeFilterRequest RPC to strip the actual BloomFilter from the network. It then passes this filter to the WorkerBee’s new interface registerRuntimeFilter. Another RPC type is BroadcastRuntimeFilterRequest. It will register the accepted global bloom filter with the WorkerBee via the registerRuntimeFilter method and then propagate it to the FragmentContext, through which the probe side scan node can fetch the aggregated bloom filter. 4. RuntimeFilterManager The foreman will instantiate a RuntimeFilterManager. It will indirectly get every RuntimeFilter via the WorkerBee. Once all the BloomFilters have been accepted and aggregated, it will broadcast the aggregated bloom filter to all the probe side scan nodes through the data tunnel by a BroadcastRuntimeFilterRequest RPC. 5. RuntimeFilterEnableOption A global option will be added to decide whether to enable this new feature. Suggestions and advice are welcome. The related PR will be presented as soon as possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6259) Support parquet filter push down for complex types
[ https://issues.apache.org/jira/browse/DRILL-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Gozhiy closed DRILL-6259. --- Verified with Drill version 1.14.0-SNAPSHOT, commit id: 24193b1b038a6315681a65c76a67034b64f71fc5 > Support parquet filter push down for complex types > -- > > Key: DRILL-6259 > URL: https://issues.apache.org/jira/browse/DRILL-6259 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Currently parquet filter push down does not work for complex types > (including arrays). > This Jira aims to implement filter push down for complex types whose > underlying type is among the simple types supported for filter push down. For > instance, currently Drill does not support filter push down for varchars, > decimals etc. Once Drill starts to support them, that support will apply to > complex types automatically. > Complex fields will be pushed down the same way regular fields are, except > for one case with arrays. > A query with the predicate {{where users.hobbies_ids[2] is null}} cannot be > pushed down because we are not able to determine the exact number of nulls in > array fields. > Consider {{[1, 2, 3]}} vs {{[1, 2]}} if these arrays are in different files. > Statistics for the second case won't show any nulls, but when querying both > files, in terms of data the third value in the array is null. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
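The array caveat in the DRILL-6259 description can be stated as a simple rule: a file may be pruned for an `is null` predicate only when statistics give an exact null count, and array-element statistics cannot be exact, since `[1, 2]` carries an implicit null at index 2 relative to `[1, 2, 3]`. A hypothetical sketch of that pruning decision (illustrative names, not Drill's actual planner code):

```java
public class NullPruneRule {

  /** Per-column statistics relevant to an "is null" push-down decision (sketch). */
  public static final class ColumnStats {
    final long nullCount;
    final boolean exact; // false for array elements: implicit nulls from short arrays are not counted

    public ColumnStats(long nullCount, boolean exact) {
      this.nullCount = nullCount;
      this.exact = exact;
    }
  }

  /** A file can be skipped for "col is null" only if we know for certain it contains no nulls. */
  public static boolean canPruneFileForIsNull(ColumnStats stats) {
    return stats.exact && stats.nullCount == 0;
  }
}
```

With inexact statistics the rule always answers false, which is why the `users.hobbies_ids[2] is null` predicate must be evaluated row by row instead of being pushed down.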
[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
[ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465537#comment-16465537 ] ASF GitHub Bot commented on DRILL-4834: --- vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is vulnerable to overflow errors, and extremely complex URL: https://github.com/apache/drill/pull/570#issuecomment-386979265 Closing this PR since it was fixed in the scope of DRILL-6094 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > decimal implementation is vulnerable to overflow errors, and extremely complex > -- > > Key: DRILL-4834 > URL: https://issues.apache.org/jira/browse/DRILL-4834 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.6.0 > Environment: Drill 1.7 on any platform >Reporter: Dave Oshinsky >Assignee: Dave Oshinsky >Priority: Major > Fix For: 1.14.0 > > > While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java > template to handle the situation where a precision is not supplied (i.e., the > supplied precision is zero) for an integer value that is to be casted to a > decimal. The Drill decimal implementation uses a limited selection of fixed > decimal precision data types (the total number of decimal digits, i.e., > Decimal9, 18, 28, 38) to represent decimal values. If the destination > precision is too small to represent the input integer that is being casted, > there is no clean way to deal with the overflow error properly. 
> While using fixed decimal precisions as is being done currently can lead to > more efficient use of memory, it often will actually lead to less efficient > use of memory (when the fixed precision is specified significantly larger > than is actually needed to represent the numbers), and it results in a > tremendous mushrooming of the complexity of the code. For each fixed > precision (and there are only a limited set of selections, 9, 18, 28, 38, > which itself leads to memory inefficiency), there is a separate set of code > generated from templates. For each pairwise combination of decimal or > non-decimal numeric types, there are multiple places in the code where > conversions must be handled, or conditions must be included to handle the > difference in precision between the two types. A one-size-fits-all approach > (using a variable width vector to represent any decimal precision) would > usually be more memory-efficient (since precisions are often over-specified), > and would greatly simplify the code. > Also see the DRILL-4184 issue, which is related. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
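The overflow problem described above (an integer cast to a decimal whose fixed precision is too small) comes down to comparing the digit count of the value, at the target scale, against the declared precision. A hypothetical helper illustrating the check (not Drill's template code):

```java
import java.math.BigDecimal;

public class DecimalCastCheck {

  /**
   * Returns true if the integer value can be represented as DECIMAL(precision, scale)
   * without overflow, i.e. its total digit count at the target scale fits the precision.
   */
  public static boolean fits(long value, int precision, int scale) {
    // setScale pads with trailing zero digits, which also count against precision
    BigDecimal scaled = BigDecimal.valueOf(value).setScale(scale);
    return scaled.precision() <= precision;
  }
}
```

For example, 123456789 fits in DECIMAL(9, 0), but the same value does not fit in DECIMAL(9, 2) because the two fractional zero digits push the total to eleven digits. With only the fixed Decimal9/18/28/38 precisions available, a value that fails this check has no clean fallback, which is the overflow hazard the report describes.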
[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
[ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465539#comment-16465539 ] ASF GitHub Bot commented on DRILL-4834: --- vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is vulnerable to overflow errors, and extremely complex URL: https://github.com/apache/drill/pull/570#issuecomment-386975060 @daveoshinsky, could you please close this PR, since it was fixed in the scope of DRILL-6094 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > decimal implementation is vulnerable to overflow errors, and extremely complex > -- > > Key: DRILL-4834 > URL: https://issues.apache.org/jira/browse/DRILL-4834 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.6.0 > Environment: Drill 1.7 on any platform >Reporter: Dave Oshinsky >Assignee: Dave Oshinsky >Priority: Major > Fix For: 1.14.0 > > > While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java > template to handle the situation where a precision is not supplied (i.e., the > supplied precision is zero) for an integer value that is to be casted to a > decimal. The Drill decimal implementation uses a limited selection of fixed > decimal precision data types (the total number of decimal digits, i.e., > Decimal9, 18, 28, 38) to represent decimal values. If the destination > precision is too small to represent the input integer that is being casted, > there is no clean way to deal with the overflow error properly. 
> While using fixed decimal precisions as is being done currently can lead to > more efficient use of memory, it often will actually lead to less efficient > use of memory (when the fixed precision is specified significantly larger > than is actually needed to represent the numbers), and it results in a > tremendous mushrooming of the complexity of the code. For each fixed > precision (and there are only a limited set of selections, 9, 18, 28, 38, > which itself leads to memory inefficiency), there is a separate set of code > generated from templates. For each pairwise combination of decimal or > non-decimal numeric types, there are multiple places in the code where > conversions must be handled, or conditions must be included to handle the > difference in precision between the two types. A one-size-fits-all approach > (using a variable width vector to represent any decimal precision) would > usually be more memory-efficient (since precisions are often over-specified), > and would greatly simplify the code. > Also see the DRILL-4184 issue, which is related. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields
[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465534#comment-16465534 ] ASF GitHub Bot commented on DRILL-4184: --- vvysotskyi closed pull request #372: DRILL-4184: support variable length decimal fields in parquet URL: https://github.com/apache/drill/pull/372 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java index b18a81c606..bcfc812f0b 100644 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java +++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java @@ -20,10 +20,14 @@ import io.netty.buffer.DrillBuf; import java.io.IOException; +import java.math.BigDecimal; +import java.nio.ByteBuffer; import org.apache.drill.common.exceptions.ExecutionSetupException; import org.apache.drill.exec.vector.ValueVector; - +import org.apache.drill.exec.vector.VariableWidthVector; +import org.apache.drill.exec.util.DecimalUtility; +import org.apache.drill.exec.vector.FixedWidthVector; import org.apache.parquet.column.ColumnDescriptor; import org.apache.parquet.format.SchemaElement; import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData; @@ -69,11 +73,16 @@ protected boolean readAndStoreValueSizeInformation() throws IOException { if ( currDefLevel == -1 ) { currDefLevel = pageReader.definitionLevels.readInteger(); } -if ( columnDescriptor.getMaxDefinitionLevel() > currDefLevel) { + +if 
(columnDescriptor.getMaxDefinitionLevel() > currDefLevel) { nullsRead++; - // set length of zero, each index in the vector defaults to null so no need to set the nullability - variableWidthVector.getMutator().setValueLengthSafe( - valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0); + // set length of zero, each index in the vector defaults to null so no + // need to set the nullability + if (variableWidthVector == null) { +addDecimalLength(null); // store null length in BYTES for null value + } else { + variableWidthVector.getMutator().setValueLengthSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0); + } currentValNull = true; return false;// field is null, no length to add to data vector } @@ -83,18 +92,26 @@ protected boolean readAndStoreValueSizeInformation() throws IOException { currLengthDeterminingDictVal = pageReader.dictionaryLengthDeterminingReader.readBytes(); } currDictValToWrite = currLengthDeterminingDictVal; - // re-purposing this field here for length in BYTES to prevent repetitive multiplication/division + + // re-purposing this field here for length in BYTES to prevent + // repetitive multiplication/division dataTypeLengthInBits = currLengthDeterminingDictVal.length(); } else { // re-purposing this field here for length in BYTES to prevent repetitive multiplication/division dataTypeLengthInBits = pageReader.pageData.getInt((int) pageReader.readyToReadPosInBytes); } -// I think this also needs to happen if it is null for the random access -boolean success = setSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, pageReader.pageData, -(int) pageReader.readyToReadPosInBytes + 4, dataTypeLengthInBits); -if ( ! 
success ) { - return true; + +if (variableWidthVector == null) { + addDecimalLength(dataTypeLengthInBits); // store decimal length variable length decimal field +} +else { + // I think this also needs to happen if it is null for the random access + boolean success = setSafe(valuesReadInCurrentPass + pageReader.valuesReadyToRead, pageReader.pageData, + (int) pageReader.readyToReadPosInBytes + 4, dataTypeLengthInBits); + if ( ! success ) { +return true; + } } return false; } @@ -122,19 +139,34 @@ public void updatePosition() { protected void readField(long recordsToRead) { // TODO - unlike most implementations of this method, the recordsReadInThisIteration field is not set here // should verify that this is not breaking anything -currentValNull = variableWidthVector.getAccessor().getObject(valuesReadInCurrentPass) == null; +if (variableWidthVector == null) { + currentValNull = getDecimalLength(valuesReadInCurrentPass) == null; +} +else { +
[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields
[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465533#comment-16465533 ] ASF GitHub Bot commented on DRILL-4184: --- vvysotskyi commented on issue #372: DRILL-4184: support variable length decimal fields in parquet URL: https://github.com/apache/drill/pull/372#issuecomment-386978734 Closing this PR since it was fixed in the scope of DRILL-6094 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Drill does not support Parquet DECIMAL values in variable length BINARY fields > -- > > Key: DRILL-4184 > URL: https://issues.apache.org/jira/browse/DRILL-4184 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.4.0 > Environment: Windows 7 Professional, Java 1.8.0_66 >Reporter: Dave Oshinsky >Priority: Major > > Encoding a DECIMAL logical type in Parquet using the variable length BINARY > primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The > problem first surfaces with the ClassCastException shown below, but fixing > the immediate cause of the exception is not sufficient to support this > combination (DECIMAL, BINARY) in a Parquet file. > In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or > FIXED_LEN_BINARY_ARRAY. Are there any plans to support DECIMAL with variable > length BINARY? Avro definitely supports encoding DECIMAL in variable length > bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this > support in Parquet is less clear. > Selecting on a BINARY DECIMAL field in a parquet file throws an exception as > shown below (java.lang.ClassCastException: > org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector). 
The successful query at > bottom selected on a string field in the same file. > 0: jdbc:drill:zk=local> select count(*) from > dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020; > org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet > recor > d reader. > Message: Failure in setting up reader > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr { > required binary ACCT_NO (DECIMAL(20,0)); > optional binary SF_NO (UTF8); > optional binary LF_NO (UTF8); > optional binary BRANCH_NO (DECIMAL(20,0)); > optional binary INTRO_CUST_NO (DECIMAL(20,0)); > optional binary INTRO_ACCT_NO (DECIMAL(20,0)); > optional binary INTRO_SIGN (UTF8); > optional binary TYPE (UTF8); > optional binary OPR_MODE (UTF8); > optional binary CUR_ACCT_TYPE (UTF8); > optional binary TITLE (UTF8); > optional binary CORP_CUST_NO (DECIMAL(20,0)); > optional binary APLNDT (UTF8); > optional binary OPNDT (UTF8); > optional binary VERI_EMP_NO (DECIMAL(20,0)); > optional binary VERI_SIGN (UTF8); > optional binary MANAGER_SIGN (UTF8); > optional binary CURBAL (DECIMAL(8,2)); > optional binary STATUS (UTF8); > } > , metadata: > {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace" > :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal > ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co > lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec > tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision": > 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s > ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_ > NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru > e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal > se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc > 
hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA > R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au > to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv > _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r > ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript > ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_ > NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale > ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math. >
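[Editor's note] The schema above annotates variable-length BINARY fields such as ACCT_NO with DECIMAL(20,0). Per the Parquet logical-type specification, those bytes are the big-endian two's-complement unscaled value, with the scale taken from the schema annotation. A minimal decoding sketch with plain JDK classes (this is not Drill's code, just an illustration of the format):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BinaryDecimalDecode {
  // Per the Parquet spec: bytes are the big-endian two's-complement
  // unscaled value; the scale comes from the DECIMAL(p,s) annotation.
  public static BigDecimal decode(byte[] unscaledBytes, int scale) {
    return new BigDecimal(new BigInteger(unscaledBytes), scale);
  }

  public static void main(String[] args) {
    // ACCT_NO = 7020 with scale 0, encoded in its minimal byte form.
    byte[] bytes = BigInteger.valueOf(7020).toByteArray();
    System.out.println(decode(bytes, 0)); // prints 7020
    // A CURBAL-style DECIMAL(8,2) value: unscaled 12345 with scale 2.
    System.out.println(decode(BigInteger.valueOf(12345).toByteArray(), 2)); // prints 123.45
  }
}
```

Unlike the fixed-width Decimal9/18/28/38 paths Drill supported at the time, the byte array here can be any length, which is why the reader needed a variable-width vector.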
[jira] [Commented] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields
[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465532#comment-16465532 ] ASF GitHub Bot commented on DRILL-4184: --- vvysotskyi commented on issue #372: DRILL-4184: support variable length decimal fields in parquet URL: https://github.com/apache/drill/pull/372#issuecomment-386975850 @daveoshinsky, could you please close this PR, since it was fixed in the scope of DRILL-6094 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Drill does not support Parquet DECIMAL values in variable length BINARY fields > -- > > Key: DRILL-4184 > URL: https://issues.apache.org/jira/browse/DRILL-4184 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.4.0 > Environment: Windows 7 Professional, Java 1.8.0_66 >Reporter: Dave Oshinsky >Priority: Major > > Encoding a DECIMAL logical type in Parquet using the variable length BINARY > primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The > problem first surfaces with the ClassCastException shown below, but fixing > the immediate cause of the exception is not sufficient to support this > combination (DECIMAL, BINARY) in a Parquet file. > In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or > FIXED_LEN_BINARY_ARRAY. Are there any plans to support DECIMAL with variable > length BINARY? Avro definitely supports encoding DECIMAL in variable length > bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this > support in Parquet is less clear. 
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as > shown below (java.lang.ClassCastException: > org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector). The successful query at > bottom selected on a string field in the same file. > 0: jdbc:drill:zk=local> select count(*) from > dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020; > org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet > recor > d reader. > Message: Failure in setting up reader > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr { > required binary ACCT_NO (DECIMAL(20,0)); > optional binary SF_NO (UTF8); > optional binary LF_NO (UTF8); > optional binary BRANCH_NO (DECIMAL(20,0)); > optional binary INTRO_CUST_NO (DECIMAL(20,0)); > optional binary INTRO_ACCT_NO (DECIMAL(20,0)); > optional binary INTRO_SIGN (UTF8); > optional binary TYPE (UTF8); > optional binary OPR_MODE (UTF8); > optional binary CUR_ACCT_TYPE (UTF8); > optional binary TITLE (UTF8); > optional binary CORP_CUST_NO (DECIMAL(20,0)); > optional binary APLNDT (UTF8); > optional binary OPNDT (UTF8); > optional binary VERI_EMP_NO (DECIMAL(20,0)); > optional binary VERI_SIGN (UTF8); > optional binary MANAGER_SIGN (UTF8); > optional binary CURBAL (DECIMAL(8,2)); > optional binary STATUS (UTF8); > } > , metadata: > {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace" > :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal > ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co > lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec > tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision": > 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s > ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_ > 
NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru > e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal > se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc > hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA > R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au > to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv > _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r > ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript > ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_ > NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale >
[jira] [Commented] (DRILL-3950) CAST(...) * (Interval Constant) gives Internal Exception
[ https://issues.apache.org/jira/browse/DRILL-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465531#comment-16465531 ]

ASF GitHub Bot commented on DRILL-3950:
---
vvysotskyi closed pull request #218: DRILL-3950: Add test case and bump calcite version to 1.4.0-drill-r7
URL: https://github.com/apache/drill/pull/218

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

{code}
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java b/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
index 23fc54e5dc..1a3b7511a6 100644
--- a/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestCastFunctions.java
@@ -18,9 +18,9 @@ package org.apache.drill.exec.fn.impl;
 import org.apache.drill.BaseTestQuery;
-import org.apache.drill.common.types.TypeProtos;
 import org.apache.drill.common.util.FileUtils;
 import org.joda.time.DateTime;
+import org.joda.time.Period;
 import org.junit.Test;

 public class TestCastFunctions extends BaseTestQuery {
@@ -79,4 +79,22 @@ public void testToDateForTimeStamp() throws Exception {
     .build()
     .run();
   }
-}
\ No newline at end of file
+
+  @Test // DRILL-3950
+  public void testCastTimesInterval() throws Exception {
+    final String query = "select cast(r_regionkey as Integer) * (INTERVAL '1' DAY) as col \n" +
+        "from cp.`tpch/region.parquet`";
+
+    testBuilder()
+        .sqlQuery(query)
+        .ordered()
+        .baselineColumns("col")
+        .baselineValues(Period.days(0))
+        .baselineValues(Period.days(1))
+        .baselineValues(Period.days(2))
+        .baselineValues(Period.days(3))
+        .baselineValues(Period.days(4))
+        .build()
+        .run();
+  }
+}
diff --git a/pom.xml b/pom.xml
index 882f8d8af2..d94e21195b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1238,7 +1238,7 @@
     org.apache.calcite
     calcite-core
-    1.4.0-drill-r6
+    1.4.0-drill-r7
     org.jgrapht
{code}

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> CAST(...) * (Interval Constant) gives Internal Exception
>
> Key: DRILL-3950
> URL: https://issues.apache.org/jira/browse/DRILL-3950
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning  Optimization
> Reporter: Sean Hsuan-Yi Chu
> Assignee: Roman Kulyk
> Priority: Major
> Labels: interval
>
> For example,
> {code}
> select cast(empno as Integer) * (INTERVAL '1' DAY)
> from emp
> {code}
> results into
> {code}
> java.lang.AssertionError: Internal error: invalid literal: INTERVAL '1' DAY
> {code}
> The reason is that INTERVAL constant is not extracted properly in the cases
> where this constant times a CAST() function
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
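[Editor's note] The test above asserts that multiplying an integer column by INTERVAL '1' DAY produces one day-interval per row (Period.days(0) through Period.days(4) for r_regionkey 0..4). A self-contained sketch of that semantics using java.time.Period in place of the Joda Period the actual test uses:

```java
import java.time.Period;

public class IntervalTimesInt {
  // cast(k as Integer) * INTERVAL '1' DAY  ==>  an interval of k days.
  public static Period times(int k) {
    return Period.ofDays(1).multipliedBy(k);
  }

  public static void main(String[] args) {
    // Mirrors the five baseline rows of tpch/region.parquet (keys 0..4).
    for (int regionKey = 0; regionKey < 5; regionKey++) {
      System.out.println(times(regionKey)); // P0D, P1D, P2D, P3D, P4D
    }
  }
}
```

The planner bug was that the INTERVAL constant was not extracted properly when multiplied by a CAST(), so this expression never reached execution.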
[jira] [Resolved] (DRILL-6221) Decimal aggregations for NULL values result in 0.0 value
[ https://issues.apache.org/jira/browse/DRILL-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-6221. Resolution: Fixed Fixed in the scope of DRILL-6094 > Decimal aggregations for NULL values result in 0.0 value > > > Key: DRILL-6221 > URL: https://issues.apache.org/jira/browse/DRILL-6221 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.12.0 >Reporter: Andries Engelbrecht >Assignee: Volodymyr Vysotskyi >Priority: Minor > > If you sum a packed decimal field with a null value instead of null you get > 0.0. > > select id, amt from hive.`default`.`packtest` > 1 2.3 > 2 null > 3 4.5 > > select sum(amt) from hive.`default`.`packtest` group by id > 1 2.3 > 2 0.0 > 3 4.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
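[Editor's note] The expected SQL behavior in the report above is that SUM skips NULL inputs and yields NULL (not 0.0) for a group containing only NULLs. A minimal sketch of that aggregate semantics with BigDecimal, where Java null stands in for SQL NULL (this is an illustration, not Drill's generated aggregate code):

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

public class DecimalNullSum {
  public static BigDecimal sum(List<BigDecimal> values) {
    BigDecimal acc = null;       // stays null if every input is null
    for (BigDecimal v : values) {
      if (v == null) {
        continue;                // SQL SUM ignores NULL inputs
      }
      acc = (acc == null) ? v : acc.add(v);
    }
    return acc;
  }

  public static void main(String[] args) {
    // id 1 and 3 from the example: 2.3 and 4.5, plus the NULL row.
    System.out.println(sum(Arrays.asList(new BigDecimal("2.3"), null, new BigDecimal("4.5")))); // 6.8
    // A group with only a NULL must yield NULL, not 0.0.
    System.out.println(sum(Arrays.asList((BigDecimal) null))); // null
  }
}
```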
[jira] [Resolved] (DRILL-920) var_samp(decimal38) cause internal assertion error
[ https://issues.apache.org/jira/browse/DRILL-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-920. --- Resolution: Fixed Fixed in the scope of DRILL-6094 > var_samp(decimal38) cause internal assertion error > -- > > Key: DRILL-920 > URL: https://issues.apache.org/jira/browse/DRILL-920 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Reporter: Chun Chang >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: Future > > > #Mon Jun 02 10:18:35 PDT 2014 > git.commit.id.abbrev=8490d74 > The following query caused the internal assertion while applying rule reduce > aggregate rule. Note, it complains type mismatch, inferred type > decimal(19,19)??? > 0: jdbc:drill:schema=dfs> select var_samp(cast(c_decimal38 as > decimal(38,18))) from data where c_row < 15; > Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while > running query.[error_id: "beb1c5ab-6132-416c-a45d-49a20af8d416" > endpoint { > address: "qa-node117.qa.lab" > user_port: 31010 > control_port: 31011 > data_port: 31012 > } > error_type: 0 > message: "Failure while setting up Foreman. < AssertionError:[ Internal > error: Error while applying rule ReduceAggregatesRule, args > [rel#28051:AggregateRel.NONE.ANY([]).[](child=rel#28050:Subset#2.NONE.ANY([]).[],group={},EXPR$0=VAR_SAMP($0))] > ] < AssertionError:[ type mismatch: > aggCall type: > DECIMAL(38, 18) > inferred type: > DECIMAL(19, 19) ]" > ] > Error: exception while executing query (state=,code=0) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
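[Editor's note] DRILL-920 failed in the planner's type inference (DECIMAL(19,19)) before the aggregate ever ran. For reference, this is the computation VAR_SAMP performs, sketched with BigDecimal and an explicit MathContext rather than Drill's generated code:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalVarSamp {
  // Sample variance: sum((x - mean)^2) / (n - 1).
  public static BigDecimal varSamp(BigDecimal[] xs, MathContext mc) {
    int n = xs.length;
    BigDecimal sum = BigDecimal.ZERO;
    for (BigDecimal x : xs) {
      sum = sum.add(x);
    }
    BigDecimal mean = sum.divide(BigDecimal.valueOf(n), mc);
    BigDecimal ss = BigDecimal.ZERO;
    for (BigDecimal x : xs) {
      BigDecimal d = x.subtract(mean);
      ss = ss.add(d.multiply(d));
    }
    return ss.divide(BigDecimal.valueOf(n - 1), mc); // n - 1: sample, not population
  }

  public static void main(String[] args) {
    BigDecimal[] xs = { new BigDecimal("1"), new BigDecimal("2"), new BigDecimal("3") };
    System.out.println(varSamp(xs, MathContext.DECIMAL64)); // 1
  }
}
```

The result type of such an aggregate needs integer digits as well as fractional ones, which is why an inferred DECIMAL(19,19) (zero integer digits) is nonsensical.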
[jira] [Resolved] (DRILL-4184) Drill does not support Parquet DECIMAL values in variable length BINARY fields
[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-4184. Resolution: Fixed Fixed in the scope of DRILL-6094 > Drill does not support Parquet DECIMAL values in variable length BINARY fields > -- > > Key: DRILL-4184 > URL: https://issues.apache.org/jira/browse/DRILL-4184 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.4.0 > Environment: Windows 7 Professional, Java 1.8.0_66 >Reporter: Dave Oshinsky >Priority: Major > > Encoding a DECIMAL logical type in Parquet using the variable length BINARY > primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The > problem first surfaces with the ClassCastException shown below, but fixing > the immediate cause of the exception is not sufficient to support this > combination (DECIMAL, BINARY) in a Parquet file. > In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or > FIXED_LEN_BINARY_ARRAY. Are there any plans to support DECIMAL with variable > length BINARY? Avro definitely supports encoding DECIMAL in variable length > bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this > support in Parquet is less clear. > Selecting on a BINARY DECIMAL field in a parquet file throws an exception as > shown below (java.lang.ClassCastException: > org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to > org.apache.drill.exec.vector.VariableWidthVector). The successful query at > bottom selected on a string field in the same file. > 0: jdbc:drill:zk=local> select count(*) from > dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=7020; > org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet > recor > d reader. 
> Message: Failure in setting up reader > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr { > required binary ACCT_NO (DECIMAL(20,0)); > optional binary SF_NO (UTF8); > optional binary LF_NO (UTF8); > optional binary BRANCH_NO (DECIMAL(20,0)); > optional binary INTRO_CUST_NO (DECIMAL(20,0)); > optional binary INTRO_ACCT_NO (DECIMAL(20,0)); > optional binary INTRO_SIGN (UTF8); > optional binary TYPE (UTF8); > optional binary OPR_MODE (UTF8); > optional binary CUR_ACCT_TYPE (UTF8); > optional binary TITLE (UTF8); > optional binary CORP_CUST_NO (DECIMAL(20,0)); > optional binary APLNDT (UTF8); > optional binary OPNDT (UTF8); > optional binary VERI_EMP_NO (DECIMAL(20,0)); > optional binary VERI_SIGN (UTF8); > optional binary MANAGER_SIGN (UTF8); > optional binary CURBAL (DECIMAL(8,2)); > optional binary STATUS (UTF8); > } > , metadata: > {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace" > :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal > ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co > lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec > tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision": > 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s > ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_ > NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru > e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal > se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc > hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA > R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au > to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv > 
_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r > ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript > ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_ > NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale > ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math. > BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_preci > sion":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true > ,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},{"nam > e":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","preci > sion":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cla > ss":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullab >
[jira] [Resolved] (DRILL-1005) stddev_pop(decimal) cause internal error
[ https://issues.apache.org/jira/browse/DRILL-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-1005. Resolution: Fixed Fixed in the scope of DRILL-6094 > stddev_pop(decimal) cause internal error > > > Key: DRILL-1005 > URL: https://issues.apache.org/jira/browse/DRILL-1005 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Reporter: Chun Chang >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: Future > > > split JIRA920 to cover each function. this one covers stddev_pop(). for > detail, please see JIRA920. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-1003) var_pop(decimal) cause internal error
[ https://issues.apache.org/jira/browse/DRILL-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-1003. Resolution: Fixed Fixed in the scope of DRILL-6094 > var_pop(decimal) cause internal error > - > > Key: DRILL-1003 > URL: https://issues.apache.org/jira/browse/DRILL-1003 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Reporter: Chun Chang >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: Future > > > want to split JIRA920 to cover each function. this JIRA covers function > var_pop(). For detail, please see JIRA920 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-1004) stddev_samp(decimal) cause internal error
[ https://issues.apache.org/jira/browse/DRILL-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-1004. Resolution: Fixed Fixed in the scope of DRILL-6094 > stddev_samp(decimal) cause internal error > - > > Key: DRILL-1004 > URL: https://issues.apache.org/jira/browse/DRILL-1004 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Reporter: Chun Chang >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: Future > > > split JIRA920 to cover each function. this JIRA covers stddev_samp(). for > detail, please see JIRA920. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
[ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465511#comment-16465511 ] ASF GitHub Bot commented on DRILL-4834: --- vvysotskyi commented on issue #570: DRILL-4834 decimal implementation is vulnerable to overflow errors, and extremely complex URL: https://github.com/apache/drill/pull/570#issuecomment-386975060 @daveoshinsky, could you please close this PR, since it was fixed in the scope of DRILL-6094 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > decimal implementation is vulnerable to overflow errors, and extremely complex > -- > > Key: DRILL-4834 > URL: https://issues.apache.org/jira/browse/DRILL-4834 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.6.0 > Environment: Drill 1.7 on any platform >Reporter: Dave Oshinsky >Assignee: Dave Oshinsky >Priority: Major > Fix For: 1.14.0 > > > While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java > template to handle the situation where a precision is not supplied (i.e., the > supplied precision is zero) for an integer value that is to be casted to a > decimal. The Drill decimal implementation uses a limited selection of fixed > decimal precision data types (the total number of decimal digits, i.e., > Decimal9, 18, 28, 38) to represent decimal values. If the destination > precision is too small to represent the input integer that is being casted, > there is no clean way to deal with the overflow error properly. 
> While using fixed decimal precisions as is being done currently can lead to > more efficient use of memory, it often will actually lead to less efficient > use of memory (when the fixed precision is specified significantly larger > than is actually needed to represent the numbers), and it results in a > tremendous mushrooming of the complexity of the code. For each fixed > precision (and there are only a limited set of selections, 9, 18, 28, 38, > which itself leads to memory inefficiency), there is a separate set of code > generated from templates. For each pairwise combination of decimal or > non-decimal numeric types, there are multiple places in the code where > conversions must be handled, or conditions must be included to handle the > difference in precision between the two types. A one-size-fits-all approach > (using a variable width vector to represent any decimal precision) would > usually be more memory-efficient (since precisions are often over-specified), > and would greatly simplify the code. > Also see the DRILL-4184 issue, which is related. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
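[Editor's note] The overflow hazard described above can be stated concretely: a value fits in DECIMAL(p,s) only if, after scaling to s fractional digits, its total digit count does not exceed p. A plain-BigDecimal sketch of that check (illustrative only; Drill's fixed-width Decimal9/18/28/38 vectors perform the equivalent test in generated code):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalOverflowCheck {
  // True iff v can be represented as DECIMAL(precision, scale).
  public static boolean fits(BigDecimal v, int precision, int scale) {
    BigDecimal scaled = v.setScale(scale, RoundingMode.HALF_UP);
    return scaled.precision() <= precision;
  }

  public static void main(String[] args) {
    // 12345.00 has 7 digits: fits in DECIMAL(9,2).
    System.out.println(fits(new BigDecimal("12345"), 9, 2));    // true
    // 12345678.00 has 10 digits: overflows DECIMAL(9,2).
    System.out.println(fits(new BigDecimal("12345678"), 9, 2)); // false
  }
}
```

When the destination precision is unknown (the zero-precision case from DRILL-4704 mentioned above), there is no p to check against, which is why the overflow cannot be handled cleanly.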
[jira] [Resolved] (DRILL-5858) case expression using decimal expression causes Assignment conversion not possible
[ https://issues.apache.org/jira/browse/DRILL-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-5858. Resolution: Fixed Fixed in the scope of DRILL-6094 > case expression using decimal expression causes Assignment conversion not > possible > -- > > Key: DRILL-5858 > URL: https://issues.apache.org/jira/browse/DRILL-5858 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.11.0 > Environment: Drill 1.11 decimal type support enabled >Reporter: N Campbell >Assignee: Volodymyr Vysotskyi >Priority: Major > Attachments: decimal_drill_exception.txt, parquet.tar.gz > > > The error appears to be specific to an expression involving a decimal type > within a case expression. If the math expressions are projected on their own > the error is not thrown. > Assignment conversion not possible from type > "org.apache.drill.exec.expr.holders.NullableDecimal28SparseHolder" to type > "org.apache.drill.exec.expr.holders.NullableDecimal38SparseHolder" > select > CASE when 'A' = 'A' THEN FIN_FINANCE_FACT.AMOUNT_MONTH * - 1 ELSE > FIN_FINANCE_FACT.AMOUNT_MONTH * 1 END AS STMT_MONTH, > CASE WHEN 'A' = 'A' THEN FIN_FINANCE_FACT.AMOUNT_YEAR_TO_DATE * - 1 ELSE > FIN_FINANCE_FACT.AMOUNT_YEAR_TO_DATE * 1 END AS STMT_YEAR > FROM dfs.gosalesdw1021p.FIN_FINANCE_FACT -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-5390) Casting as decimal does not make drill use the decimal value vector
[ https://issues.apache.org/jira/browse/DRILL-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-5390. Resolution: Fixed Fixed in the scope of DRILL-6094 > Casting as decimal does not make drill use the decimal value vector > --- > > Key: DRILL-5390 > URL: https://issues.apache.org/jira/browse/DRILL-5390 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Volodymyr Vysotskyi >Priority: Major > > The below query should be using the decimal value vector. However it looks > like it is using the float vector. If we feed the output of the below query > to a CTAS statement then the parquet file created has a double type instead > of a decimal type > {code} > alter session set `planner.enable_decimal_data_type` = true; > +---++ > | ok | summary | > +---++ > | true | planner.enable_decimal_data_type updated. | > +---++ > 1 row selected (0.39 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select typeof(col2) from (select 1 as > col1, cast(2.0 as decimal(9,2)) as col2, cast(3.0 as decimal(9,2)) as col3 > from cp.`tpch/lineitem.parquet` limit 1) d; > +-+ > | EXPR$0 | > +-+ > | FLOAT8 | > +-+ > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
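[Editor's note] Why silently planning a DECIMAL cast as FLOAT8 (as typeof() reveals above) matters: binary doubles cannot represent most decimal fractions exactly, while a true decimal type can. A minimal JDK illustration, unrelated to Drill internals:

```java
import java.math.BigDecimal;

public class DoubleVsDecimal {
  public static boolean doubleExact() {
    return 0.1 + 0.2 == 0.3;           // false: binary rounding error
  }

  public static boolean decimalExact() {
    return new BigDecimal("0.1").add(new BigDecimal("0.2"))
        .compareTo(new BigDecimal("0.3")) == 0;  // true: exact decimal arithmetic
  }

  public static void main(String[] args) {
    System.out.println(0.1 + 0.2); // prints 0.30000000000000004
    System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2"))); // prints 0.3
  }
}
```

This is also why the CTAS mentioned in the report writes a parquet double column: the decimal type was lost before the writer ever saw the data.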
[jira] [Resolved] (DRILL-3909) Decimal round functions corrupts input data
[ https://issues.apache.org/jira/browse/DRILL-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-3909. Resolution: Fixed Fixed in the scope of DRILL-6094 > Decimal round functions corrupts input data > --- > > Key: DRILL-3909 > URL: https://issues.apache.org/jira/browse/DRILL-3909 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: Future > > > The Decimal 28 and 38 round functions, instead of creating a new buffer and > copying data from the incoming buffer, set the output buffer equal to the > input buffer, and then subsequently mutate the data in that buffer. This > causes the data in the input buffer to be corrupted. > A simple example to reproduce: > {code} > $ cat a.json > { a : "9.95678" } > 0: jdbc:drill:drillbit=localhost> create table a as select cast(a as > decimal(38,18)) a from `a.json`; > +---++ > | Fragment | Number of records written | > +---++ > | 0_0 | 1 | > +---++ > 1 row selected (0.206 seconds) > 0: jdbc:drill:drillbit=localhost> select round(a, 9) from a; > +---+ > |EXPR$0 | > +---+ > | 10.0 | > +---+ > 1 row selected (0.121 seconds) > 0: jdbc:drill:drillbit=localhost> select round(a, 11) from a; > ++ > | EXPR$0 | > ++ > | 9.957 | > ++ > 1 row selected (0.115 seconds) > 0: jdbc:drill:drillbit=localhost> select round(a, 9), round(a, 11) from a; > +---++ > |EXPR$0 | EXPR$1 | > +---++ > | 10.0 | 1.000 | > +---++ > {code} > In the third example, there are two round expressions operating on the same > incoming decimal vector, and you can see that the result for the second > expression is incorrect. > Not critical because Decimal type is considered alpha right now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
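[Editor's note] The bug pattern described above is buffer aliasing: the round function returned the input buffer and mutated it, so a second expression over the same vector saw corrupted data. A sketch of the buggy versus the copying variant, with a plain array standing in for Drill's value-vector buffer (hypothetical names, not Drill code):

```java
import java.util.function.UnaryOperator;

public class AliasedRound {
  // Buggy pattern: output aliases the input buffer and mutates it.
  public static int[] roundInPlace(int[] buf) {
    buf[0] += 1;
    return buf;
  }

  // Correct pattern: allocate a new buffer, copy, then mutate the copy.
  public static int[] roundCopying(int[] buf) {
    int[] out = buf.clone();
    out[0] += 1;
    return out;
  }

  // Does applying `round` leave the original input intact?
  public static boolean preservesInput(UnaryOperator<int[]> round) {
    int[] in = {9, 9};
    round.apply(in);
    return in[0] == 9;
  }

  public static void main(String[] args) {
    System.out.println(preservesInput(AliasedRound::roundCopying)); // true
    System.out.println(preservesInput(AliasedRound::roundInPlace)); // false
  }
}
```

In the ticket's third query, round(a, 9) plays the role of roundInPlace: it rewrote the shared buffer, so round(a, 11) read corrupted data.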
[jira] [Resolved] (DRILL-2101) Decimal literals are treated as double
[ https://issues.apache.org/jira/browse/DRILL-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi resolved DRILL-2101. Resolution: Fixed Fixed in the scope of DRILL-6094 > Decimal literals are treated as double > -- > > Key: DRILL-2101 > URL: https://issues.apache.org/jira/browse/DRILL-2101 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 0.8.0 >Reporter: Victoria Markman >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: decimal > Fix For: Future > > Attachments: DRILL-2101-PARTIAL-PATCH-enable-decimal-literals.patch, > DRILL-2101.patch > > > {code} > create table t1(c1) as > select > cast(null as decimal(28,4)) > from `t1.csv`; > message root { > optional double c1; <-- Wrong, should be decimal > } > {code} > This is very commonly used construct to convert csv files to parquet files, > that's why I'm marking this bug as critical. > {code} > create table t2 as > select > case when columns[3] = '' then cast(null as decimal(28,4)) else > cast(columns[3] as decimal(28, 4)) end > from `t1.csv`; > {code} > Correct - cast string literal to decimal > {code} > create table t3(c1) as > select > cast('12345678901234567890.1234' as decimal(28,4)) > from `t1.csv`; > message root { > required fixed_len_byte_array(12) c1 (DECIMAL(28,4)); > } > {code} > Correct - cast literal from csv file as decimal > {code} > create table t4(c1) as > select > cast(columns[3] as decimal(28,4)) > from `t1.csv`; > message root { > optional fixed_len_byte_array(12) c1 (DECIMAL(28,4)); > } > {code} > Correct - case statement (no null involved) > {code} > create table t5(c1) as > select > case when columns[3] = '' then cast('' as decimal(28,4)) else > cast(columns[3] as decimal(28,4)) end > from `t1.csv`; > message root { > optional fixed_len_byte_array(12) c1 (DECIMAL(28,4)); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
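[Editor's note] The practical cost of treating a DECIMAL(28,4) literal as double, using the 24-digit value from the t3 example above: a double carries only about 15-17 significant decimal digits, so the value cannot round-trip, while BigDecimal keeps it exact. A minimal JDK sketch:

```java
import java.math.BigDecimal;

public class DecimalLiteralWidth {
  // Round-trip a decimal literal through a double, the way a plan that
  // types the literal as FLOAT8 effectively would.
  public static String viaDouble(String literal) {
    return new BigDecimal(Double.parseDouble(literal))
        .stripTrailingZeros().toPlainString();
  }

  public static void main(String[] args) {
    String literal = "12345678901234567890.1234";
    System.out.println(new BigDecimal(literal)); // exact: all 24 digits kept
    System.out.println(viaDouble(literal));      // precision lost past ~17 digits
  }
}
```

This is why the common CSV-to-parquet CTAS pattern in the report needed the literal (and the NULL cast) to stay decimal.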