Hi, I am processing a bunch of HDFS data using the StreamingContext (Spark 1.1.0), which means that all files existing in the directory at start() time are processed in the first batch. When I try to stop this stream processing via `streamingContext.stop(false, false)` (i.e., even with stopGracefully = false), it has no effect: the stop() call blocks and data processing continues. (It would probably stop after the current batch completes, but that takes too long, since all my data is in that batch.)
I am not exactly sure whether this is generally true or only applies to the first batch. I have also observed that stopping the stream processing during the first batch occasionally takes a very long time, even when no data is present at all. Has anyone experienced something similar? Do I have to do something in particular in my processing code (such as checking the state of the StreamingContext) to allow the interruption? It is quite important for me that stopping the stream processing happens rather quickly. Thanks, Tobias
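For reference, my setup looks roughly like this (the HDFS path, app name, and batch interval below are simplified placeholders, not my actual values):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("FileStreamStopExample")
val ssc = new StreamingContext(conf, Seconds(10))

// All files already present in the directory at start() time
// are picked up together in the first batch.
val lines = ssc.textFileStream("hdfs:///path/to/input/dir")
lines.foreachRDD { rdd =>
  // (actual processing logic omitted)
  rdd.foreach(println)
}

ssc.start()

// Later, from another thread:
// stopSparkContext = false, stopGracefully = false -- yet this call
// blocks until the currently running (first) batch has finished.
ssc.stop(false, false)
```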
