rf972 commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-721853674
First, thank you for this pull request! We have found it very useful, and
we are very excited to use this support to help enable aggregate pushdown in
our own V2 datasource.
We have looked into evaluating this code with TPC-H, since we believe Spark
will see great gains from aggregate pushdown.
One of the TPC-H queries, Q06, includes:
select sum(l_extendedprice*l_discount) as revenue
In our testing we saw issues when aggregates are combined with a product. To
help illustrate this issue, we added a similar case, a product of two
aggregates, to JDBCV2Suite's test("scan with aggregate push-down"):
val df6 = sql("select MIN(SALARY) * MIN(BONUS) FROM h2.test.employee")
df6.explain(true)
Below is the error that we get.
[info] java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[info] at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$apply$1$$anonfun$1.applyOrElse(V2ScanRelationPushDown.scala:116)
[info] at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$apply$1$$anonfun$1.applyOrElse(V2ScanRelationPushDown.scala:109)
[info] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
...
val aggFunction: aggregate.AggregateFunction = {
  if (agg.aggregateFunction.isInstanceOf[aggregate.Max]) {
    aggregate.Max(aggOutput(i - 1))
  } else if (agg.aggregateFunction.isInstanceOf[aggregate.Min]) {
    aggregate.Min(aggOutput(i - 1))  // <<< exception here
Note that in this case aggOutput has just one item,
aggOutput(0) = (min(salary) * min(bonus)), so the lookup at index 1 is out of
bounds.
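To make the mismatch concrete, here is a minimal standalone sketch in plain Scala. It does not use Spark: the Expr, Col, Min, Max, and Multiply types and the collectAggs, outputName, and rewrite helpers are all hypothetical stand-ins, and the cause it models (a per-aggregate counter indexing a list that has one entry per result expression) is only our guess from the trace above. It also shows one possible direction, looking up each nested aggregate by identity rather than by position; this is not a proposed patch.

```scala
import scala.util.Try

object AggIndexSketch {
  // Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's classes).
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Min(child: Expr) extends Expr
  final case class Max(child: Expr) extends Expr
  final case class Multiply(left: Expr, right: Expr) extends Expr

  // Collect every aggregate call nested in an expression, left to right.
  def collectAggs(e: Expr): Vector[Expr] = e match {
    case _: Min | _: Max => Vector(e)
    case Multiply(l, r)  => collectAggs(l) ++ collectAggs(r)
    case _: Col          => Vector.empty
  }

  // Assumed naming scheme for the column a pushed-down aggregate would produce.
  def outputName(agg: Expr): String = agg match {
    case Min(Col(n)) => s"min($n)"
    case Max(Col(n)) => s"max($n)"
    case other       => other.toString
  }

  // Re-aggregate over the pushed-down column, found by lookup instead of by counter.
  def rewrite(e: Expr, pushed: Map[Expr, Col]): Expr = e match {
    case Min(_)         => Min(pushed(e))
    case Max(_)         => Max(pushed(e))
    case Multiply(l, r) => Multiply(rewrite(l, pushed), rewrite(r, pushed))
    case c: Col         => c
  }

  // MIN(SALARY) * MIN(BONUS): one result expression containing two aggregates.
  val resultExprs: Vector[Expr] =
    Vector(Multiply(Min(Col("SALARY")), Min(Col("BONUS"))))
  val aggs: Vector[Expr] = resultExprs.flatMap(collectAggs)

  // Indexing the one-element list with the second aggregate's position fails.
  val byCounter: Try[Expr] = Try(resultExprs(aggs.length - 1))

  // One pushed-down column per aggregate, keyed by the aggregate itself.
  val pushed: Map[Expr, Col] = aggs.map(a => a -> Col(outputName(a))).toMap
  val rewritten: Vector[Expr] = resultExprs.map(rewrite(_, pushed))
}
```

In this sketch, byCounter is a Failure because there are two aggregates but only one result expression, while rewritten keeps the product and points each Min at its own pushed-down column.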
We are really interested in getting this case working and can help out with
evaluation of a fix or even putting a fix together. Any thoughts on a
potential solution would be appreciated. Thanks!