Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22627#discussion_r222818119
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2709,6 +2935,78 @@ write.stream(aggDF, "memory", outputMode = 
"complete", checkpointLocation = "pat
     </div>
     </div>
     
    +
    +## Recovery Semantics after Changes in a Streaming Query
    +There are limitations on what changes in a streaming query are allowed 
between restarts from the 
    +same checkpoint location. Here are a few kinds of changes that are either 
not allowed, or 
    +the effect of the change is not well-defined. For all of them:
    +
    +- The term *allowed* means you can do the specified change but whether the 
semantics of its effect 
    +  is well-defined depends on the query and the change.
    +
    +- The term *not allowed* means you should not do the specified change as 
the restarted query is likely 
    +  to fail with unpredictable errors. `sdf` represents a streaming 
DataFrame/Dataset 
    +  generated with sparkSession.readStream.
    +  
    +**Types of changes**
    +
    +- *Changes in the number or type (i.e. different source) of input 
sources*: This is not allowed.
    +
    +- *Changes in the parameters of input sources*: Whether this is allowed 
and whether the semantics 
    +  of the change are well-defined depends on the source and the query. Here 
are a few examples.
    +
    +  - Addition/deletion/modification of rate limits is allowed: 
`spark.readStream.format("kafka").option("subscribe", "topic")` to 
`spark.readStream.format("kafka").option("subscribe", 
"topic").option("maxOffsetsPerTrigger", ...)`
    +
    +  - Changes to subscribed topics/files is generally not allowed as the 
results are unpredictable: 
`spark.readStream.format("kafka").option("subscribe", "topic")` to 
`spark.readStream.format("kafka").option("subscribe", "newTopic")`
    +
    +- *Changes in the type of output sink*: Changes between a few specific 
combinations of sinks 
    +  are allowed. This needs to be verified on a case-by-case basis. Here are 
a few examples.
    +
    +  - File sink to Kafka sink is allowed. Kafka will see only the new data.
    +
    +  - Kafka sink to file sink is not allowed.
    +
    +  - Kafka sink changed to foreach, or vice versa is allowed.
    +
    +- *Changes in the parameters of output sink*: Whether this is allowed and 
whether the semantics of 
    +  the change are well-defined depends on the sink and the query. Here are 
a few examples.
    +
    +  - Changes to output directory of a file sink is not allowed: 
`sdf.writeStream.format("parquet").option("path", "/somePath")` to 
`sdf.writeStream.format("parquet").option("path", "/anotherPath")`
    +
    +  - Changes to output topic is allowed: 
`sdf.writeStream.format("kafka").option("topic", "someTopic")` to 
`sdf.writeStream.format("kafka").option("path", "anotherTopic")`
    --- End diff --
    
    nit: `path` -> `topic`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to