Re: [PR] [SPARK-46865][SS] Add Batch Support for TransformWithState Operator [spark]

via GitHub Tue, 30 Jan 2024 10:01:17 -0800


sahnib commented on code in PR #44884:
URL: https://github.com/apache/spark/pull/44884#discussion_r1471710039



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -171,3 +175,44 @@ case class TransformWithStateExec(
     }
   }
 }
+
+
+object TransformWithStateExec {
+
+  // Plan logical transformWithState for batch queries
+  def generateSparkPlanForBatchQueries(
+      keyDeserializer: Expression,
+      valueDeserializer: Expression,
+      groupingAttributes: Seq[Attribute],
+      dataAttributes: Seq[Attribute],
+      statefulProcessor: StatefulProcessor[Any, Any, Any],
+      timeoutMode: TimeoutMode,
+      outputMode: OutputMode,
+      outputObjAttr: Attribute,
+      child: SparkPlan): SparkPlan = {
+    val shufflePartitions = child.session.sessionState.conf.numShufflePartitions
+    val statefulOperatorStateInfo = StatefulOperatorStateInfo(
+      Utils.createTempDir().getAbsolutePath,

Review Comment:
   @HeartSaVioR @anishshri-db 
   
   1. Do we see a use case where the user would want to read the state files
      (from DFS) after the query completes?
   2. I think implementing a memory-based state store is likely a larger
      effort. For the time being, we can create the temp directory on the
      executor node (inside mapPartitionsWithState) and discard it once
      evaluation finishes (inside the completion iterator). I agree, though,
      that this is inefficient, and a memory-based store would be better in
      the long run.
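
To make option 2 concrete, here is a rough, Spark-free sketch of the idea: create a temp directory when a partition starts and delete it once the output iterator is exhausted. All names here (`TempStateDirSketch`, `CleanupIterator`, `withTempStateDir`, the `tws-batch-state-` prefix) are hypothetical and are not part of the PR; `CleanupIterator` is only a stand-in for a completion iterator.

```scala
import java.io.File
import java.nio.file.Files

object TempStateDirSketch {

  // Hypothetical stand-in for a completion iterator: runs `onComplete`
  // exactly once, the first time the wrapped iterator reports exhaustion.
  final class CleanupIterator[A](underlying: Iterator[A], onComplete: () => Unit)
      extends Iterator[A] {
    private var completed = false

    override def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more && !completed) {
        completed = true
        onComplete()
      }
      more
    }

    override def next(): A = underlying.next()
  }

  // Delete a file or directory; children are removed before their parent.
  def deleteRecursively(f: File): Unit = {
    // listFiles() returns null for non-directories, hence the Option wrapper.
    Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Hypothetical per-partition wrapper: `process` receives a fresh local
  // temp directory (where a state store could write its files) plus the
  // partition's input rows, and returns the output rows.
  def withTempStateDir[A, B](input: Iterator[A])(
      process: (File, Iterator[A]) => Iterator[B]): Iterator[B] = {
    val stateDir = Files.createTempDirectory("tws-batch-state-").toFile
    new CleanupIterator(process(stateDir, input), () => deleteRecursively(stateDir))
  }
}
```

One caveat with this shape: the directory is only removed if the consumer drains the iterator. In the real operator we would probably also want a task completion listener (`TaskContext.addTaskCompletionListener`) so that failed or early-terminated tasks clean up as well.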



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

