Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/11190#discussion_r53555179
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -112,16 +115,58 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
final def execute(): RDD[InternalRow] = {
RDDOperationScope.withScope(sparkContext, nodeName, false, true) {
prepare()
+ waitForSubqueries()
doExecute()
}
}
+ // All the subquries and their Future of results.
+ @transient private val queryResults = ArrayBuffer[(ScalarSubquery, Future[Array[InternalRow]])]()
+
+ /**
+ * Collects all the subqueries and create a Future to take the first two rows of them.
+ */
+ protected def prepareSubqueries(): Unit = {
+ val allSubqueries = expressions.flatMap(_.collect {case e: ScalarSubquery => e})
+ allSubqueries.foreach { e =>
+ val futureResult = Future {
+ // We only need the first row, try to take two rows so we can throw an exception if there
+ // are more than one rows returned.
+ e.executedPlan.executeTake(2)
+ }(SparkPlan.subqueryExecutionContext)
+ queryResults += e -> futureResult
+ }
+ }
+
+ /**
+ * Waits for all the subquires to finish and updates the results.
+ */
+ protected def waitForSubqueries(): Unit = {
+ // fill in the result of subqueries
+ queryResults.foreach {
+ case (e, futureResult) =>
+ val rows = Await.result(futureResult, Duration.Inf)
+ if (rows.length > 1) {
+ sys.error(s"more than one row returned by a subquery used as an expression:\n${e.plan}")
+ }
+ if (rows.length == 1) {
+ assert(rows(0).numFields == 1, "Analyzer should make sure this only returns one column")
+ e.updateResult(rows(0).get(0, e.dataType))
--- End diff --
Why don't we replace the `ScalarSubquery` expressions with `Literal`s in the
expression tree? That way we don't need state in `ScalarSubquery`, and code
generation (CG) becomes easier...
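
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It does not use Spark's real classes: `Expr`, `Literal`, `Attribute`, `Add`, `ScalarSubquery`, and `replaceSubqueries` below are hypothetical toy stand-ins, and the `run` function is an assumed placeholder for executing the subquery plan and taking up to two rows. Mapping an empty result to a NULL literal is also an assumption of this sketch.

```scala
object SubqueryLiteralSketch {
  // Toy stand-ins for Catalyst expressions; these are NOT Spark's real classes.
  sealed trait Expr {
    // Rebuild the tree bottom-up, applying `rule` wherever it is defined.
    def transform(rule: PartialFunction[Expr, Expr]): Expr = {
      val withNewChildren = this match {
        case Add(l, r) => Add(l.transform(rule), r.transform(rule))
        case other     => other
      }
      rule.applyOrElse(withNewChildren, (e: Expr) => e)
    }
  }
  final case class Literal(value: Any) extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  // `run` is a hypothetical placeholder for "execute the subquery, take up to two rows".
  final case class ScalarSubquery(run: () => Seq[Any]) extends Expr

  // Evaluate each subquery once and substitute an immutable Literal,
  // so the subquery node itself never has to hold a result.
  def replaceSubqueries(expr: Expr): Expr = expr.transform {
    case ScalarSubquery(run) =>
      run() match {
        case Seq()      => Literal(null) // assumption: no rows means a NULL value
        case Seq(value) => Literal(value)
        case _ =>
          sys.error("more than one row returned by a subquery used as an expression")
      }
  }
}
```

With a rewrite like this, the tree seen by later stages would contain only constants in place of subqueries, which is presumably what would make code generation simpler: a literal can be treated like any other constant.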