Jem Tucker created SPARK-5221:
---------------------------------

             Summary: FileInputDStream "remember window" in certain situations causes files to be ignored
                 Key: SPARK-5221
                 URL: https://issues.apache.org/jira/browse/SPARK-5221
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.0, 1.1.1
            Reporter: Jem Tucker
            Priority: Minor

When the batch duration is greater than 1 minute, a file can be missed entirely. If a file starts being moved into the monitored directory just before FileInputDStream.findNewFiles() is called, but does not become visible until after that call has executed, it is not included in that batch. In the following batch the file is ignored again, because its modification time is less than the modTimeIgnoreThreshold. As a result, Spark Streaming skips data that it should process, especially when large files are being moved into the directory.
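
The timing described above can be illustrated with a small simulation. This is only a sketch of the selection rule, not Spark's actual code: the function names, the millisecond values, and the form of the threshold (batch time minus a remember window equal to one batch interval) are assumptions made for illustration.

```python
# Simplified model of the file-selection rule (hypothetical names, not Spark's API).
# Assumption: modTimeIgnoreThreshold = batch_time - remember_window, and a file is
# selected only if threshold < mod_time <= batch_time.

def is_selected(mod_time, batch_time, remember_window):
    threshold = batch_time - remember_window  # assumed form of modTimeIgnoreThreshold
    return threshold < mod_time <= batch_time


def scan(batch_time, mod_time, visible_at, remember_window):
    # A file can only be considered once it is visible in the directory listing.
    visible = visible_at <= batch_time
    return visible and is_selected(mod_time, batch_time, remember_window)


BATCH_INTERVAL = 120_000          # 2-minute batches, in milliseconds
REMEMBER_WINDOW = BATCH_INTERVAL  # assumption: window covers a single batch

batch_1 = 120_000
batch_2 = 240_000

mod_time = 119_000    # the move starts just before batch 1 is scanned
visible_at = 125_000  # the file only becomes visible after that scan

# Batch 1: the file is not yet visible, so it is skipped.
picked_in_batch_1 = scan(batch_1, mod_time, visible_at, REMEMBER_WINDOW)

# Batch 2: the file is visible, but its mod time is below the threshold (120000).
picked_in_batch_2 = scan(batch_2, mod_time, visible_at, REMEMBER_WINDOW)

# With a window spanning two batches, the same file would be picked up in batch 2.
picked_with_wider_window = scan(batch_2, mod_time, visible_at, 2 * BATCH_INTERVAL)

print(picked_in_batch_1, picked_in_batch_2, picked_with_wider_window)
```

In this model the file falls through both batches, and a remember window wider than one batch interval would let the second batch catch it. Whether widening the window is the right fix for Spark itself is not established by this report.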