rf972 commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-721853674


   First, thank you for this pull request!  We have found it very useful, and 
we are very excited to use this support to help enable aggregate pushdown in 
our own V2 datasource.
   
   We have looked into evaluating this code with TPC-H, since we believe Spark
will see large gains from aggregate pushdown.
   For example, the TPC-H query Q6 selects:
   select sum(l_extendedprice*l_discount) as revenue
   
   In our testing we ran into problems when aggregates are combined with a
product. To illustrate the issue, we added a similar case (a product of two
aggregates) to test("scan with aggregate push-down") in JDBCV2Suite:
   val df6 = sql("select MIN(SALARY) * MIN(BONUS) FROM h2.test.employee")
   df6.explain(true)
   
   Below is the error that we get:
   [info]   java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
   [info]   at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$apply$1$$anonfun$1.applyOrElse(V2ScanRelationPushDown.scala:116)
   [info]   at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$apply$1$$anonfun$1.applyOrElse(V2ScanRelationPushDown.scala:109)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:318)
   ...
   
   The exception is thrown in V2ScanRelationPushDown, in the Min branch below:

   val aggFunction: aggregate.AggregateFunction = {
     if (agg.aggregateFunction.isInstanceOf[aggregate.Max]) {
       aggregate.Max(aggOutput(i - 1))
     } else if (agg.aggregateFunction.isInstanceOf[aggregate.Min]) {
       aggregate.Min(aggOutput(i - 1))  // <-- exception thrown here
     ...
   
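   Note that in this case aggOutput has only one element,
aggOutput(0) = (min(salary) * min(bonus)), so the lookup aggOutput(i - 1) with
i - 1 = 1 is out of bounds, which matches the exception above.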
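   To make the mismatch concrete, here is a minimal, self-contained Scala
sketch. The Expr, Col, Multiply, Min, Max and AggIndexDemo types are
hypothetical simplifications, not Spark's Catalyst classes. The sketch only
assumes, as the code excerpt above suggests, that the i-th aggregate function
reads output column i - 1.

```scala
// Hypothetical, simplified model of an aggregate query's select list.
// These types are NOT Spark's Catalyst classes; they only illustrate why
// indexing output columns by aggregate position can go out of bounds.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Multiply(left: Expr, right: Expr) extends Expr
sealed trait AggFunc extends Expr { def child: Expr }
final case class Min(child: Expr) extends AggFunc
final case class Max(child: Expr) extends AggFunc

object AggIndexDemo {
  // Select list of: SELECT MIN(SALARY) * MIN(BONUS) FROM h2.test.employee
  val selectList: Seq[Expr] =
    Seq(Multiply(Min(Col("SALARY")), Min(Col("BONUS"))))

  // Collect every aggregate function nested anywhere inside an expression.
  def collectAggs(e: Expr): Seq[AggFunc] = e match {
    case a: AggFunc     => Seq(a)
    case Multiply(l, r) => collectAggs(l) ++ collectAggs(r)
    case _: Col         => Seq.empty
  }

  val aggFunctions: Seq[AggFunc] = selectList.flatMap(collectAggs)

  // One output column, but two aggregate functions to push down.
  val outputCount: Int = selectList.size  // 1
  val aggCount: Int = aggFunctions.size   // 2

  // Positional lookup: aggregate i (1-based) reads output column i - 1.
  // For the second MIN this asks for index 1 of a length-1 sequence and throws.
  def positionalLookup(i: Int): Expr = selectList(i - 1)
}
```

   The count of aggregate functions (2) differs from the count of output columns
(1). Any mapping that assumes one output column per aggregate therefore breaks
as soon as a single output column combines several aggregates.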
   
   We are very interested in getting this case working, and we can help
evaluate a fix or even put one together. Any thoughts on a potential solution
would be appreciated. Thanks!
   

