srowen commented on a change in pull request #24890: [SPARK-28074][SS] Log warn
message on possible correctness issue for multiple stateful operations in
single query
URL: https://github.com/apache/spark/pull/24890#discussion_r329316565
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1616,6 +1615,8 @@ this configuration judiciously.
### Arbitrary Stateful Operations
Many usecases require more advanced stateful operations than aggregations. For
example, in many usecases, you have to track sessions from data streams of
events. For doing such sessionization, you will have to save arbitrary types of
data as state, and perform arbitrary operations on the state using the data
stream events in every trigger. Since Spark 2.2, this can be done using the
operation `mapGroupsWithState` and the more powerful operation
`flatMapGroupsWithState`. Both operations allow you to apply user-defined code
on grouped Datasets to update user-defined state. For more concrete details,
take a look at the API documentation
([Scala](api/scala/index.html#org.apache.spark.sql.streaming.GroupState)/[Java](api/java/org/apache/spark/sql/streaming/GroupState.html))
and the examples
([Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredSessionization.java)).
+Though Spark cannot check and force it, state function should be implemented
with respect to the semantics of output mode. For example, in Update mode Spark
doesn't expect that the state function will emit rows which are older than
current watermark plus allowed late record delay, whereas in Append mode the
state function can emit these rows.
Review comment:
"state function" -> "the state function"
"of output mode" -> "of the output mode"