srowen commented on a change in pull request #24890: [SPARK-28074][SS] Log warn
message on possible correctness issue for multiple stateful operations in
single query
URL: https://github.com/apache/spark/pull/24890#discussion_r329127484
##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1616,6 +1614,8 @@ this configuration judiciously.
### Arbitrary Stateful Operations
Many usecases require more advanced stateful operations than aggregations. For
example, in many usecases, you have to track sessions from data streams of
events. For doing such sessionization, you will have to save arbitrary types of
data as state, and perform arbitrary operations on the state using the data
stream events in every trigger. Since Spark 2.2, this can be done using the
operation `mapGroupsWithState` and the more powerful operation
`flatMapGroupsWithState`. Both operations allow you to apply user-defined code
on grouped Datasets to update user-defined state. For more concrete details,
take a look at the API documentation
([Scala](api/scala/index.html#org.apache.spark.sql.streaming.GroupState)/[Java](api/java/org/apache/spark/sql/streaming/GroupState.html))
and the examples
([Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredSessionization.java)).
+Though Spark cannot check and force it, state function should be implemented
with respect of semantic of output mode. e.g. In update mode Spark doesn't
expect state function will emit rows which are older than current watermark,
whereas in Append mode state function can emit these rows.
Review comment:
respect of -> respect to
semantic -> the semantics
e.g. -> For example,
update -> Update
expect state -> expect that the state
state function -> the state function
With these changes applied, the added text would read:
"Though Spark cannot check and force it, the state function should be implemented
with respect to the semantics of output mode. For example, in Update mode Spark doesn't
expect that the state function will emit rows which are older than current watermark,
whereas in Append mode the state function can emit these rows."