Re: [PR] [SPARK-49131][SS] TransformWithState should properly set implicit grouping keys even with lazy iterators [spark]

via GitHub Mon, 12 Aug 2024 11:52:47 -0700


anishshri-db commented on code in PR #47641:
URL: https://github.com/apache/spark/pull/47641#discussion_r1714234802



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala:
##########
@@ -198,7 +204,38 @@ case class TransformWithStateExec(
       getOutputRow(obj)
     }
     ImplicitGroupingKeyTracker.removeImplicitKey()
-    mappedIterator
+
+    // Wrapper to ensure that the implicit key is set when the methods on the iterator
+    // are called. Inside of processNewData, we use a GroupedIterator, so handleInputRows
+    // is only called once per key. As such, we only have to set the implicit key when
+    // the first call to hasNext is made, and we have to remove it when hasNext returns
+    // false.
+    //
+    // Note: if we ever start to interleave the processing of the iterators we get back
+    // from handleInputRows (i.e. we don't process each iterator all at once), then this
+    // iterator will need to set/unset the implicit key every time hasNext/next is called,
+    // not just at the first and last calls to hasNext.
+    new Iterator[InternalRow] {
+      var hasStarted = false
+
+      override def hasNext: Boolean = {
+        if (!hasStarted) {
+          hasStarted = true
+          ImplicitGroupingKeyTracker.setImplicitKey(keyObj)
+        }
+
+        val hasNext = mappedIterator.hasNext
+        if (!hasNext) {
+          ImplicitGroupingKeyTracker.removeImplicitKey()
+        }
+        hasNext
+      }
+
+      override def next(): InternalRow = {
+        assert(hasStarted, "next called before hasNext")

Review Comment:
   Could we improve the error message? Maybe add some info about the
   iterator/function call that triggered it?
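
   For illustration, here is a minimal sketch of what a more descriptive
   message could look like. It is not the PR's code: `KeyTracker` is a
   hypothetical stand-in for `ImplicitGroupingKeyTracker` so that the snippet
   runs without Spark, and the class name `KeyScopedIterator`, the message
   wording, and the body of `next()` after the assert are all assumptions.

```scala
// Hypothetical stand-in for ImplicitGroupingKeyTracker (assumption, not Spark's API),
// so that this sketch compiles and runs without Spark on the classpath.
object KeyTracker {
  private val key = new ThreadLocal[Option[Any]] {
    override def initialValue(): Option[Any] = None
  }
  def setImplicitKey(k: Any): Unit = key.set(Some(k))
  def removeImplicitKey(): Unit = key.set(None)
  def current: Option[Any] = key.get()
}

// Hypothetical wrapper: sets the key on the first hasNext call and removes it
// once the underlying (possibly lazy) iterator reports that it is exhausted.
final class KeyScopedIterator[T](keyObj: Any, underlying: Iterator[T])
    extends Iterator[T] {
  private var hasStarted = false

  override def hasNext: Boolean = {
    if (!hasStarted) {
      hasStarted = true
      KeyTracker.setImplicitKey(keyObj)
    }
    val more = underlying.hasNext
    if (!more) {
      KeyTracker.removeImplicitKey()
    }
    more
  }

  override def next(): T = {
    // The message names the offending call, the iterator, and the key involved.
    assert(hasStarted,
      "next() was called before hasNext on the iterator returned by " +
        s"handleInputRows for grouping key $keyObj; the implicit grouping key " +
        "has not been set yet")
    underlying.next() // assumed body; the quoted hunk ends at the assert
  }
}
```

   With a lazy source such as `Iterator(1, 2).map(f)`, `f` only runs inside
   `next()`, so it sees the key as long as the caller checks `hasNext` first.
   A caller that skips `hasNext` hits the assertion and gets a message that
   says which call was wrong and for which key.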




