Spark Streaming appears not to recognize a more recent version of an already-seen file; true?

spr Tue, 04 Nov 2014 10:42:12 -0800

I am trying to implement a use case that takes some human input.  Putting
that input in a single file (as opposed to a collection of HDFS files) would
be a simpler human interface, so I ran an experiment to see whether Spark
Streaming (via textFileStream) will recognize a new version of a file whose
name it has already digested.  (Yes, I'm deleting the old file and moving a
new file into the same name, not modifying in place.)  It appears the answer
is no: it does not recognize the new version.  Can one of the experts confirm
that a) this is true and b) this is intended?


Experiment:
- run an existing program that is known to correctly digest new files in a
directory
- modify the data-creation script to always put the new files under the same
name instead of under different names, then run the script

Outcome:  the program sees the first file under that name, but none of the
subsequent files (each has different contents, so it would show up in the
output if it were read).
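
For concreteness, the two modes of the data-creation script might look
roughly like the Python sketch below.  The actual script was not posted, so
everything here is an assumption for illustration: the publish function, the
staging directory, and the input.txt / input-NNNNNN.txt names are all
hypothetical.  Each file is written in a staging directory and then moved
into the watched directory, so the watcher never sees a half-written file.

```python
import itertools
import os
import tempfile

# Hypothetical counter used to generate distinct file names.
_counter = itertools.count()


def publish(watch_dir, staging_dir, contents, same_name):
    """Write contents to a staging file, then move it into watch_dir.

    same_name=True  -> always reuse the name input.txt (the failing experiment)
    same_name=False -> use a fresh name each time (the original, working mode)

    Assumes staging_dir and watch_dir are on the same filesystem.
    """
    # Write the complete file outside the watched directory first.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "w") as f:
        f.write(contents)

    if same_name:
        name = "input.txt"
    else:
        name = "input-%06d.txt" % next(_counter)
    target = os.path.join(watch_dir, name)

    # Delete the old version (if any), then move the new file into place,
    # mirroring the "delete and move, not modify in place" step.
    if os.path.exists(target):
        os.remove(target)
    os.replace(tmp_path, target)
    return target


if __name__ == "__main__":
    watch = tempfile.mkdtemp()
    staging = tempfile.mkdtemp()
    for i in range(3):
        print(publish(watch, staging, "version %d\n" % i, same_name=True))
    print(sorted(os.listdir(watch)))
```

With same_name=True the watched directory only ever holds one file name,
which is the situation in which I see no new output after the first file.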



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-appears-not-to-recognize-a-more-recent-version-of-an-already-seen-file-true-tp18074.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
