HeartSaVioR commented on code in PR #44884:
URL: https://github.com/apache/spark/pull/44884#discussion_r1467180436
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -171,3 +175,44 @@ case class TransformWithStateExec(
}
}
}
+
+
+object TransformWithStateExec {
+
+ // Plan logical transformWithState for batch queries
+ def generateSparkPlanForBatchQueries(
+ keyDeserializer: Expression,
+ valueDeserializer: Expression,
+ groupingAttributes: Seq[Attribute],
+ dataAttributes: Seq[Attribute],
+ statefulProcessor: StatefulProcessor[Any, Any, Any],
+ timeoutMode: TimeoutMode,
+ outputMode: OutputMode,
+ outputObjAttr: Attribute,
+ child: SparkPlan): SparkPlan = {
+ val shufflePartitions = child.session.sessionState.conf.numShufflePartitions
+ val statefulOperatorStateInfo = StatefulOperatorStateInfo(
+ Utils.createTempDir().getAbsolutePath,
Review Comment:
We can't expect this path to exist on both the driver and the executors, since
a temp dir created here is local to the driver. If we want to leverage a temp
dir, the full path should be resolved on the executor.
Also, for flatMapGroupsWithState we simply mapped the batch version to
flatMapGroups. I'd guess that is no longer as simple here, since we allow users
to initialize multiple states, but it would be great if we could fake the state
instance (or the state store implementation) rather than going through the full
state store lifecycle, including coordination.
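
To illustrate the "fake state" idea, here is a minimal sketch in plain Scala
with no Spark dependency. All names (`SimpleValueState`, `InMemoryValueState`,
`InMemoryStateHandle`, `FakeStateDemo`) are hypothetical and are not the actual
Spark APIs. The assumption is that in a batch query each grouping key is
processed exactly once, so state only needs to live in memory for the duration
of that one call.

```scala
import scala.collection.mutable

// Hypothetical minimal state interface; NOT Spark's ValueState API.
trait SimpleValueState[S] {
  def exists(): Boolean
  def get(): Option[S]
  def update(newState: S): Unit
  def clear(): Unit
}

// In-memory stand-in: no checkpoint path, no state store provider,
// and no coordinator. It lives only as long as the object is referenced.
final class InMemoryValueState[S] extends SimpleValueState[S] {
  private var value: Option[S] = None
  override def exists(): Boolean = value.isDefined
  override def get(): Option[S] = value
  override def update(newState: S): Unit = { value = Some(newState) }
  override def clear(): Unit = { value = None }
}

// Hypothetical handle that lets a processor register several named states.
final class InMemoryStateHandle {
  private val states = mutable.Map.empty[String, InMemoryValueState[_]]

  // Returns the state registered under `name`, creating it on first use.
  // The caller is responsible for using a consistent type per name.
  def getValueState[S](name: String): SimpleValueState[S] =
    states
      .getOrElseUpdate(name, new InMemoryValueState[S])
      .asInstanceOf[SimpleValueState[S]]
}

object FakeStateDemo {
  def main(args: Array[String]): Unit = {
    val rows = Seq("a" -> 1, "b" -> 5, "a" -> 2)

    // Batch semantics: each key's group is handled once with a fresh handle.
    val totals = rows.groupBy(_._1).map { case (key, group) =>
      val handle = new InMemoryStateHandle
      val sum = handle.getValueState[Int]("sum")
      val count = handle.getValueState[Int]("count")
      group.foreach { case (_, v) =>
        sum.update(sum.get().getOrElse(0) + v)
        count.update(count.get().getOrElse(0) + 1)
      }
      key -> (sum.get().getOrElse(0), count.get().getOrElse(0))
    }

    totals.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Something along these lines would avoid needing any path that is valid on both
the driver and the executors, because nothing is written to disk. Whether it
can cover every state type the operator exposes is an open question.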
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]