Dear Mich,
Sure, that is a good idea. If we had a pause() function, we could
temporarily stop streaming and adjust the configuration, perhaps from an
environment variable.
Once these parameters are adjusted, we could restart the streaming to apply the
newest parameters without stopping the Spark streaming application.
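
A minimal sketch of the "adjust configuration from an environment variable" idea. The STREAM_OPT_ prefix and the helper names load_stream_options and build_reader are hypothetical and not from this thread. Since StreamingQuery has no pause() today, the assumption is that the reader is rebuilt (and the query started again) after the variables change:

```python
import os

# Hypothetical naming convention (an assumption, not a Spark feature):
# STREAM_OPT_maxOffsetsPerTrigger=500 -> {"maxOffsetsPerTrigger": "500"}
PREFIX = "STREAM_OPT_"


def load_stream_options(env=None):
    """Collect reader options from environment variables that carry PREFIX."""
    env = os.environ if env is None else env
    return {k[len(PREFIX):]: v for k, v in env.items() if k.startswith(PREFIX)}


def build_reader(spark, bootstrap_servers, env=None):
    """Build a Kafka DataStreamReader using the options currently set in env.

    `spark` is an existing SparkSession. Calling this again after the
    variables change yields a reader with the updated options.
    """
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .options(**load_stream_options(env))
    )
```

Passing env explicitly makes the helper easy to check without touching the real process environment.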
Most probably we will require an additional method pause()
https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.streaming.StreamingQuery.html
to allow us to pause (as opposed to stop()) the streaming process and
resume after changing the parameters. The state of streaming
Hm, interesting proposition. I guess you mean altering one of the following
parameters in flight:
streamingDataFrame = self.spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers",
            config['MDVariables']['bootstrapServers'],)
Hi,
There is some good documentation here:
https://docs.databricks.com/structured-streaming/query-recovery.html
See the “recovery after change in structured streaming query” heading,
which gives good general guidelines on what can be changed in a “pause” of a
stream.
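
As a rough illustration of treating a "pause" as stop-then-restart: the sketch below stops the running query and starts a replacement against the same checkpoint location, so that it can resume from the recorded progress. The names restart_with_new_options and build_query are hypothetical. Whether a particular option change is compatible with the existing checkpoint depends on the guidelines linked above.

```python
def restart_with_new_options(query, build_query, checkpoint_dir):
    """Stop `query`, then start a replacement that reuses the same checkpoint.

    query          -- a running pyspark.sql.streaming.StreamingQuery
    build_query    -- hypothetical callable returning a DataStreamWriter
                      that was built with the updated options
    checkpoint_dir -- the checkpointLocation used by the query being stopped
    """
    if query.isActive:
        # Stops only this query; the Spark application keeps running.
        query.stop()
    writer = build_query().option("checkpointLocation", checkpoint_dir)
    return writer.start()
```

The Spark application and its session stay up throughout; only the streaming query is replaced.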
On Thu, 16 Feb 2023
*Component*: Spark Structured Streaming
*Level*: Advanced
*Scenario*: How-to
*Problems Description*
I would like to confirm: can we directly apply new options to
readStream/writeStream without stopping the currently running Spark Structured
Streaming application? For