Jem Tucker created SPARK-5221:
---------------------------------

             Summary: FileInputDStream "remember window" in certain situations 
causes files to be ignored 
                 Key: SPARK-5221
                 URL: https://issues.apache.org/jira/browse/SPARK-5221
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.0, 1.1.1
            Reporter: Jem Tucker
            Priority: Minor


When the batch interval is greater than 1 minute, a file can be missed 
entirely. If a file begins to be moved into the monitored directory just 
before FileInputDStream.findNewFiles() is called, but does not become visible 
until after that call has executed, it is not included in that batch. The file 
is then ignored in the following batch as well, because its modification time 
is less than the modTimeIgnoreThreshold. As a result, Spark Streaming skips 
data that it should process, especially when large files are being moved into 
the directory.
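
The sequence described above can be sketched with a small simulation. This is 
a simplified model, not Spark's actual code: the FileInfo class, the 
find_new_files() function, and the assumption that the ignore threshold equals 
the batch time minus a remember window of exactly one batch interval are all 
hypothetical and chosen only to reproduce the reported behaviour.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    mod_time: int    # modification time reported by the file system (seconds)
    visible_at: int  # time at which a directory listing first shows the file


def find_new_files(files, batch_time, remember_window, selected):
    """Toy stand-in for the scan: return names picked up in this batch."""
    # Assumption: the ignore threshold trails the batch time by the window.
    mod_time_ignore_threshold = batch_time - remember_window
    picked = []
    for f in files:
        if f.visible_at > batch_time:
            continue  # not yet listed when the scan runs
        if f.name in selected:
            continue  # already processed in an earlier batch
        if f.mod_time < mod_time_ignore_threshold:
            continue  # considered too old, so it is ignored
        picked.append(f.name)
    selected.update(picked)
    return picked


batch_interval = 120              # seconds, i.e. greater than 1 minute
remember_window = batch_interval  # assumption: window spans a single batch

# The move starts at t=119, but the file is only listed from t=125 onwards.
late_file = FileInfo("big.dat", mod_time=119, visible_at=125)

selected = set()
first = find_new_files([late_file], 120, remember_window, selected)
second = find_new_files([late_file], 240, remember_window, selected)
print(first, second)  # prints [] []
```

In this model the file is invisible to the scan at t=120 and already older 
than the threshold (240 - 120 = 120) at t=240, so neither batch selects it.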



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
