I am trying to implement a use case that takes some human input. Putting that input in a single file (as opposed to a collection of HDFS files) would be a simpler human interface, so I ran an experiment to see whether Spark Streaming (via textFileStream) will recognize a new version of a file whose name it has already digested. (Yes, I'm deleting the old file and moving a new file into the same name, not modifying it in place.) It appears the answer is no: it does not recognize a new version. Can one of the experts confirm that a) this is true and b) this is intended?
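For concreteness, the delete-and-move step could look roughly like the following Python sketch. It is illustrative only: it uses the local filesystem as a stand-in for HDFS, and the names `replace_file`, `watch_dir`, and `staging_dir` are made up for this example rather than taken from the actual data-creation script.

```python
import os
import shutil
import tempfile


def replace_file(content: str, watch_dir: str, name: str, staging_dir: str) -> str:
    """Publish a new version of watch_dir/name by delete-and-move, not in-place edit.

    All names here are hypothetical; this is a local-filesystem sketch, not HDFS code.
    """
    os.makedirs(watch_dir, exist_ok=True)
    os.makedirs(staging_dir, exist_ok=True)
    target = os.path.join(watch_dir, name)

    # Write the complete new version outside the watched directory first,
    # so the watched directory never holds a half-written file.
    fd, staged = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w") as f:
        f.write(content)

    # Delete the old version (if any), then move the new one into the same name.
    if os.path.exists(target):
        os.remove(target)
    shutil.move(staged, target)
    return target
```

Each call leaves exactly one file in the watched directory, with the same name but new contents, which is the situation the experiment exercises.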
Experiment:
- run an existing program that works to digest new files in a directory
- modify the data-creation script to put the new files always under the same name instead of different names, then run the script

Outcome: it sees the first file under that name, but none of the subsequent files (with different contents, which would show up in output).

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-appears-not-to-recognize-a-more-recent-version-of-an-already-seen-file-true-tp18074.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
