cloud-fan commented on code in PR #40914:
URL: https://github.com/apache/spark/pull/40914#discussion_r1174799882


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala:
##########
@@ -288,7 +288,7 @@ object StatFunctions extends Logging {
     }
 
     // If there is no selected columns, we don't need to run this aggregate, so make it a lazy val.
-    lazy val aggResult = ds.select(aggExprs: _*).queryExecution.toRdd.collect().head
+    lazy val aggResult = ds.select(aggExprs: _*).queryExecution.toRdd.map(_.copy()).collect().head

Review Comment:
   According to the doc of `QueryExecution.toRdd`, I think adding the copy is the right thing to do:
   ```
     /**
      * Internal version of the RDD. Avoids copies and has no schema.
      * Note for callers: Spark may apply various optimization including reusing object: this means
      * the row is valid only for the iteration it is retrieved. You should avoid storing row and
      * accessing after iteration. (Calling `collect()` is one of known bad usage.)
      * If you want to store these rows into collection, please apply some converter or copy row
      * which produces new object per iteration.
      * Given QueryExecution is not a public class, end users are discouraged to use this: please
      * use `Dataset.rdd` instead where conversion will be applied.
      */
     lazy val toRdd: RDD[InternalRow] = new SQLExecutionRDD(
       executedPlan.execute(), sparkSession.sessionState.conf)
   ```
   The old code happens to work here only because the result has a single row (it's a global aggregate). I'm fine without a dedicated test, as the change simply follows the guidance in the `QueryExecution.toRdd` doc.
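   To make the reuse hazard concrete, here is a minimal, Spark-free sketch. `RowReuseDemo` and `MutableRow` are hypothetical stand-ins (not Spark's `InternalRow`/`UnsafeRow`) that mimic an operator writing every element into one shared buffer, which is the behavior the `toRdd` doc warns about:
   ```scala
   object RowReuseDemo {
     // Hypothetical mutable row, analogous to an UnsafeRow that an operator reuses.
     final class MutableRow(var value: Int) {
       def copy(): MutableRow = new MutableRow(value)
     }

     // Simulates an operator that reuses a single object for every element.
     def rows(n: Int): Iterator[MutableRow] = {
       val shared = new MutableRow(0) // one buffer, mutated per iteration
       Iterator.tabulate(n) { i => shared.value = i; shared }
     }

     // Storing the reused object: every slot ends up with the last-written value.
     val bad = rows(3).toArray.map(_.value)                // Array(2, 2, 2)

     // Copying per iteration, as the PR does with .map(_.copy()):
     val good = rows(3).map(_.copy()).toArray.map(_.value) // Array(0, 1, 2)

     def main(args: Array[String]): Unit = {
       println(bad.mkString(","))  // prints 2,2,2
       println(good.mkString(","))pr
       // prints 0,1,2
     }
   }
   ```
   The single-row global aggregate masks the bug in the old code for the same reason `bad` would look fine with `rows(1)`: there is no second iteration to clobber the first row.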



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

