sahnib commented on code in PR #44884:
URL: https://github.com/apache/spark/pull/44884#discussion_r1471710039
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -171,3 +175,44 @@ case class TransformWithStateExec(
}
}
}
+
+
+object TransformWithStateExec {
+
+ // Plan logical transformWithState for batch queries
+ def generateSparkPlanForBatchQueries(
+ keyDeserializer: Expression,
+ valueDeserializer: Expression,
+ groupingAttributes: Seq[Attribute],
+ dataAttributes: Seq[Attribute],
+ statefulProcessor: StatefulProcessor[Any, Any, Any],
+ timeoutMode: TimeoutMode,
+ outputMode: OutputMode,
+ outputObjAttr: Attribute,
+ child: SparkPlan): SparkPlan = {
+ val shufflePartitions = child.session.sessionState.conf.numShufflePartitions
+ val statefulOperatorStateInfo = StatefulOperatorStateInfo(
+ Utils.createTempDir().getAbsolutePath,
Review Comment:
@HeartSaVioR @anishshri-db
1. Do we see a use case where the user would want to read the state files
(from DFS) after the query completes?
2. I think implementing a memory-based state store is likely a larger
effort. For the time being, we could instead create the temp directory on the
executor node (inside mapPartitionsWithState) and discard it after
evaluation (inside the completion iterator). I agree that this is
inefficient, though, and a memory-based store would be better in the long run.
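
A rough sketch of what I mean in point 2, written without Spark dependencies
so it stands alone. This is not the PR's implementation: `TempStateDirSketch`,
`CleanupIterator`, and `withTempStateDir` are hypothetical names, and
`CleanupIterator` is only a simplified stand-in for Spark's
`CompletionIterator`.

```scala
import java.io.File
import java.nio.file.Files

object TempStateDirSketch {

  // Hypothetical stand-in for a completion iterator: runs `onComplete`
  // exactly once, the first time the wrapped iterator reports exhaustion.
  final class CleanupIterator[A](underlying: Iterator[A], onComplete: () => Unit)
      extends Iterator[A] {
    private var completed = false

    override def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more && !completed) {
        completed = true
        onComplete()
      }
      more
    }

    override def next(): A = underlying.next()
  }

  // Delete a file or directory, removing children before the parent.
  def deleteRecursively(f: File): Unit = {
    val children = f.listFiles() // null when `f` is not a directory
    if (children != null) children.foreach(deleteRecursively)
    f.delete()
    ()
  }

  // Create a temp directory local to wherever this runs (the executor, if
  // called from inside the per-partition function), hand it to `process`,
  // and discard it once the output iterator has been fully consumed.
  def withTempStateDir[A, B](input: Iterator[A])(
      process: (File, Iterator[A]) => Iterator[B]): Iterator[B] = {
    val stateDir = Files.createTempDirectory("tws-batch-state-").toFile
    new CleanupIterator(process(stateDir, input), () => deleteRecursively(stateDir))
  }
}
```

Note that the cleanup here only fires when the output iterator is drained. If
the task fails or stops consuming early, the directory would be left behind,
so the real operator would probably also need a task completion listener (an
assumption on my part, not something this sketch covers).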
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]