Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/7774#discussion_r36128883
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -110,11 +121,30 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
          "Operator will receive unsafe rows as input but cannot process unsafe rows")
      }
}
RDDOperationScope.withScope(sparkContext, nodeName, false, true) {
+ prepare()
doExecute()
}
}
/**
+ * Do the preparation for SparkPlan.
+ */
+ final def prepare(): Unit = {
+ doPrepare
+ }
+
+ /**
+ * Overridden by concrete implementations of SparkPlan. It is guaranteed to run
+ * before any `execute` of SparkPlan. This is helpful when we want to launch some
+ * background work, e.g., `BroadcastHashJoin` uses it to broadcast asynchronously.
+ *
+ * This is a lazy val to make sure doPrepare runs only once for each SparkPlan.
+ */
+ protected lazy val doPrepare: Unit = {
--- End diff ---
It's really strange for subclasses to override a lazy val that is really a
method call. Instead, I think this should be a normal `protected def
doPrepare(): Unit`, and we should have a separate `private val prepareCalled`
somewhere to make sure we only call it once.
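A minimal sketch of the shape being suggested (class and member names here are illustrative, not the actual SparkPlan code; a mutable guard flag is used rather than the literal `val` mentioned above, since the flag must flip after the first call):

```scala
// Sketch of the suggested pattern: a plain overridable method plus a guard
// flag on the base class, instead of subclasses overriding a lazy val.
abstract class SparkPlanSketch {
  // Flipped after the first prepare(), so doPrepare() runs at most once.
  private var prepareCalled = false

  // Callers invoke prepare(); the flag makes repeated calls no-ops.
  final def prepare(): Unit = synchronized {
    if (!prepareCalled) {
      prepareCalled = true
      doPrepare()
    }
  }

  // Concrete plans override this; guaranteed to run before execution,
  // e.g. to kick off an asynchronous broadcast.
  protected def doPrepare(): Unit = {}
}
```

With this shape, a subclass overrides an ordinary method, and the once-only guarantee lives in one place on the base class instead of relying on lazy-val initialization semantics.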
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]