[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37893: [SPARK-40434][SS][PYTHON] Implement applyInPandasWithState in PySpark

GitBox Tue, 20 Sep 2022 17:00:56 -0700


HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r975902646



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala:
##########
@@ -311,6 +323,56 @@ object UnsupportedOperationChecker extends Logging {
             }
           }
 
+        // applyInPandasWithState
+        case m: FlatMapGroupsInPandasWithState if m.isStreaming =>
+          // Check compatibility with output modes and aggregations in query
+          val aggsInQuery = collectStreamingAggregates(plan)
+
+          if (aggsInQuery.isEmpty) {
+            // applyInPandasWithState without aggregation: operation's output mode must

Review Comment:
   Now I can imagine a case in which the current requirement of providing a 
separate output mode prevents unintended behavior:
   
   - They implemented the user function for flatMapGroupsWithState with append 
mode.
   - They ran the query with append mode.
   - After that, they changed the output mode for the query to update mode for 
some reason.
   - The user function is not updated to account for the switch to update mode.
   
   We do not allow such a query to run as of now, but we would allow it to run 
if we drop the parameter.
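
The scenario above can be sketched in plain Python, without Spark. This is only an illustration under an assumption: that the check rejects a query whose output mode differs from the mode declared for the user function. `OutputMode`, `UnsupportedOperation`, and `check_output_mode` are hypothetical names and are not part of Spark's API.

```python
from enum import Enum


class OutputMode(Enum):
    APPEND = "append"
    UPDATE = "update"


class UnsupportedOperation(Exception):
    """Raised when the (hypothetical) plan check rejects a query."""


def check_output_mode(func_mode: OutputMode, query_mode: OutputMode) -> None:
    # Assumed rule for the no-aggregation case: the mode declared for the
    # user function must equal the output mode of the query.
    if func_mode != query_mode:
        raise UnsupportedOperation(
            f"user function was written for {func_mode.value} mode, "
            f"but the query runs in {query_mode.value} mode"
        )


# Function written for append mode, query run in append mode: accepted.
check_output_mode(OutputMode.APPEND, OutputMode.APPEND)

# Query switched to update mode while the function is left unchanged.
try:
    check_output_mode(OutputMode.APPEND, OutputMode.UPDATE)
    rejected = False
except UnsupportedOperation:
    rejected = True
```

If the separate output mode parameter were dropped, there would be no `func_mode` to compare against, so the second call could not be rejected.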
   
   PS. I don't believe end users can reliably implement their user function 
according to the output mode, but that is a fundamental API design issue of 
the original flatMapGroupsWithState, which is a separate matter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


