With fileStream you can actually pass a filter parameter to avoid picking up
.tmp files/directories.
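Here is a minimal Scala sketch of such a filter. It assumes the Spark 1.x `StreamingContext.fileStream[K, V, F](directory, filter, newFilesOnly)` overload with Hadoop's new-API `TextInputFormat`. The object name, the `isCompleteFile` helper and the ".tmp" naming convention are assumptions for illustration only. As far as I know, a custom filter is used instead of Spark's default one, which skips names starting with ".", so the sketch skips dot-files as well.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FilteredFileStream {
  // Hypothetical convention: files still being written end in ".tmp".
  // Hidden files (leading ".") are also skipped, since this filter
  // is assumed to replace Spark's default hidden-file filter.
  def isCompleteFile(path: Path): Boolean = {
    val name = path.getName
    !name.endsWith(".tmp") && !name.startsWith(".")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FilteredFileStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // args(0) is the monitored directory; the last argument (true) is
    // newFilesOnly, i.e. ignore files already present at startup.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat](args(0), isCompleteFile _, true)
      .map { case (_, text) => text.toString }

    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```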
Also, when you move or rename a file, its modification time doesn't change,
so I believe Spark won't detect it as a new file.
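If that is the cause, one possible workaround (an untested sketch, not something Spark itself provides) is to refresh the file's modification time with Hadoop's `FileSystem.setTimes` before renaming it into the monitored directory. The `PublishToMonitoredDir` object and its `publish` helper are hypothetical names. The sketch assumes the rename keeps the refreshed timestamp, so the file never shows up in the watched directory with its old one.

```scala
import java.io.IOException
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PublishToMonitoredDir {
  // Moves a fully written file into the directory watched by fileStream.
  // The mtime is refreshed *before* the rename, so Spark should never
  // list the target path with the stale timestamp.
  def publish(fs: FileSystem, staged: Path, monitoredDir: Path): Path = {
    val target = new Path(monitoredDir, staged.getName)
    // -1 for atime means "leave the access time unchanged".
    fs.setTimes(staged, System.currentTimeMillis(), -1L)
    if (!fs.rename(staged, target)) {
      throw new IOException(s"rename $staged -> $target failed")
    }
    target
  }

  def main(args: Array[String]): Unit = {
    // args(0): fully written file in a staging directory
    // args(1): directory monitored by the streaming job
    val fs = FileSystem.get(new Configuration())
    publish(fs, new Path(args(0)), new Path(args(1)))
  }
}
```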
Thanks
Best Regards
On Sat, May 2, 2015 at 9:37 PM, Evo Eftimov evo.efti...@isecc.com wrote:
It seems that in Spark Streaming 1.2 the fileStream API may have a bug: it
doesn't detect new files when they are moved or renamed on HDFS, only when
they are copied. Copying, however, leads to a well-known problem with .tmp
files, which get removed and make the Spark Streaming fileStream throw an
exception.