anishshri-db commented on code in PR #44884:
URL: https://github.com/apache/spark/pull/44884#discussion_r1470068681


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -171,3 +175,44 @@ case class TransformWithStateExec(
     }
   }
 }
+
+
+object TransformWithStateExec {
+
+  // Plan logical transformWithState for batch queries
+  def generateSparkPlanForBatchQueries(
+      keyDeserializer: Expression,
+      valueDeserializer: Expression,
+      groupingAttributes: Seq[Attribute],
+      dataAttributes: Seq[Attribute],
+      statefulProcessor: StatefulProcessor[Any, Any, Any],
+      timeoutMode: TimeoutMode,
+      outputMode: OutputMode,
+      outputObjAttr: Attribute,
+      child: SparkPlan): SparkPlan = {
+    val shufflePartitions = child.session.sessionState.conf.numShufflePartitions
+    val statefulOperatorStateInfo = StatefulOperatorStateInfo(
+      Utils.createTempDir().getAbsolutePath,

Review Comment:
   @HeartSaVioR - the tmp dir creation is based on the `java.io.tmpdir` setting,
right, which should be OS dependent? Are you saying that it's not reliable if
we are running different OS types/versions across the driver and executors?
   
   Unlike FMGWS (`flatMapGroupsWithState`), we don't actually have a batch
equivalent like `MapGroupsExec` that we can reuse here anymore. Implementing a
fake state store would be a fair amount of work, I feel (we would also have to
ensure that the store supports composite types like ListState, MapState, etc.
in the future).
   
   IIUC, the only reason we really need the checkpoint location for the state
store is to keep track of the committed state across batches (and potentially
also within the stateStoreId). Do you think it's OK to do any of the
following?
   - use a local path on the executor (either way, this is a dummy path for
batch queries, so I guess it should be safe to use)
   - skip using the checkpoint location / pass a `None` option
   - avoid registering the instance with the state store coordinator
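
For the first two options, a rough sketch of how they could combine, assuming a hypothetical, simplified `StateInfoSketch` whose checkpoint location is an `Option[String]` (this is not the real `StatefulOperatorStateInfo` API, and `StateDirResolver` is likewise made up for illustration). Streaming plans keep a stable path under the checkpoint; batch plans pass `None`, and the directory is created lazily on whichever JVM runs the task, so no driver-side path has to exist on the executor. Coordinator registration (the third option) is not modeled here.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical stand-in for StatefulOperatorStateInfo with an optional
// checkpoint location, so batch plans can omit it.
final case class StateInfoSketch(
    checkpointLocation: Option[String],
    operatorId: Long,
    numPartitions: Int)

object StateDirResolver {
  // Some(root): stable per-operator/per-partition path under the checkpoint,
  //   reused across micro-batches so committed state can be found again.
  // None (batch): a throwaway directory created locally on the task's JVM.
  def resolveStateDir(info: StateInfoSketch, partitionId: Int): Path =
    info.checkpointLocation match {
      case Some(root) =>
        Paths.get(root, "state", info.operatorId.toString, partitionId.toString)
      case None =>
        Files.createTempDirectory(s"tws-batch-${info.operatorId}-$partitionId-")
    }
}
```

For example, `resolveStateDir(StateInfoSketch(Some("/chk"), 0L, 4), 2)` yields `/chk/state/0/2` on a Unix filesystem, while the same call with `None` creates a fresh local temp directory each time, which is acceptable only because batch queries never need to read the state back.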



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

