[GitHub] [spark] tdas commented on a change in pull request #33093: [SPARK-35897][SS][WIP] Support user defined initial state with flatMapGroupsWithState in Structured Streaming

GitBox Wed, 30 Jun 2021 08:11:57 -0700


tdas commented on a change in pull request #33093:
URL: https://github.com/apache/spark/pull/33093#discussion_r661569762




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
##########
@@ -178,6 +257,47 @@ case class FlatMapGroupsWithStateExec(
       }
     }
 
+    /**
+     * Process the new data iterator along with the initial state. The initial state is applied
+     * before processing the new data for every key. The user defined function is called only
+     * once on the data.
+     */
+    def processNewDataWithInitState(
+        childDataIter: Iterator[InternalRow],
+        initStateIter: Iterator[InternalRow]
+      ): Iterator[InternalRow] = {
+
+      if (!childDataIter.hasNext && !initStateIter.hasNext) return Iterator.empty
+
+      val groupedChildDataIter = GroupedIterator(childDataIter, groupingAttributes, child.output)
+      val groupedInitStateIter =
+        GroupedIterator(initStateIter, initStateGroupAttrs, initialState.output)
+
+      val keyOrderingComparator = GenerateOrdering.generate(
+        groupingAttributes.map(SortOrder(_, Ascending)), groupingAttributes)

Review comment:
The `groupingAttributes.map(SortOrder(_, Ascending))` object must be shared
between this comparator and `requiredChildOrdering`. The code has to ensure
that the child ordering enforced by the SparkPlan and the comparator used for
this merge are based on the same sorting strategy.
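
To illustrate the concern, here is a minimal stand-alone sketch in plain Scala (no Spark classes). It is not the PR's code: `SharedOrderingSketch`, `keyOrdering`, `sortInput`, and `mergeByKey` are hypothetical names, string keys stand in for the grouping attributes, and `sortInput` only plays the role that `requiredChildOrdering` plays in the planner. The point is that the ordering is defined once and both the sort and the merge comparator are derived from it.

```scala
// Stand-alone sketch: plain Scala, no Spark. All names here are hypothetical.
object SharedOrderingSketch {
  // Single definition of the key ordering, analogous to defining
  // groupingAttributes.map(SortOrder(_, Ascending)) exactly once.
  val keyOrdering: Ordering[String] = Ordering.String

  // Stand-in for the sort the planner applies to each input
  // (the role of requiredChildOrdering): it sorts by keyOrdering.
  def sortInput[V](rows: Seq[(String, V)]): Iterator[(String, V)] =
    rows.sortBy(_._1)(keyOrdering).iterator

  // Merge two key-sorted iterators, comparing keys with the same keyOrdering.
  // Assumes keys are unique within each input, as they would be after grouping.
  def mergeByKey[A, B](
      data: Iterator[(String, A)],
      init: Iterator[(String, B)]): List[(String, Option[A], Option[B])] = {
    val d = data.buffered
    val i = init.buffered
    val out = List.newBuilder[(String, Option[A], Option[B])]
    while (d.hasNext || i.hasNext) {
      if (!i.hasNext) { val (k, a) = d.next(); out += ((k, Some(a), None)) }
      else if (!d.hasNext) { val (k, b) = i.next(); out += ((k, None, Some(b))) }
      else {
        val cmp = keyOrdering.compare(d.head._1, i.head._1)
        if (cmp < 0) { val (k, a) = d.next(); out += ((k, Some(a), None)) }
        else if (cmp > 0) { val (k, b) = i.next(); out += ((k, None, Some(b))) }
        else {
          // Same key on both sides: emit the data together with its initial state.
          val (k, a) = d.next()
          val (_, b) = i.next()
          out += ((k, Some(a), Some(b)))
        }
      }
    }
    out.result()
  }
}
```

If the sort and the comparator were built from two separately written orderings that later drifted apart (for example, one switched to descending), a merge like this could fail to pair a key's data with its initial state, which is why a single shared definition is safer.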




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


