[ 
https://issues.apache.org/jira/browse/SPARK-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jem Tucker updated SPARK-5221:
------------------------------
    Priority: Major  (was: Minor)

> FileInputDStream "remember window" in certain situations causes files to be 
> ignored 
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-5221
>                 URL: https://issues.apache.org/jira/browse/SPARK-5221
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.1.1, 1.2.0
>            Reporter: Jem Tucker
>
> When batch times are greater than 1 minute, if a file begins to be moved into 
> a directory just before FileInputDStream.findNewFiles() is called, but does 
> not become visible until after that call has executed, it is not included in 
> that batch. The file is then also ignored in the following batch, because its 
> modification time is less than the modTimeIgnoreThreshold. This causes Spark 
> Streaming to ignore data that it should not, especially when large files are 
> being moved into the directory.
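
The timing described above can be reproduced with a small standalone model.
The sketch below is not Spark code: FileEvent, find_new_files and run_batches
are hypothetical names, and it assumes the ignore threshold is simply the
batch time minus the remember window, and that the window equals one batch
interval once batches are longer than 1 minute. The actual FileInputDStream
logic may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class FileEvent:
    """Hypothetical record of one file arriving in the monitored directory."""
    name: str
    mod_time: int    # seconds: modification time reported by the file system
    visible_at: int  # seconds: first moment a directory listing shows the file


def find_new_files(files, current_time, remember_window, already_selected):
    """Simplified stand-in for the selection step (an assumption, not Spark's code)."""
    # Assumed rule: anything modified at or before this point is treated as old.
    mod_time_ignore_threshold = current_time - remember_window
    selected = []
    for f in files:
        if f.visible_at > current_time:
            continue  # the move has not finished, so the listing misses the file
        if f.name in already_selected:
            continue  # already handled in an earlier batch
        if f.mod_time <= mod_time_ignore_threshold:
            continue  # considered too old, so it is skipped
        selected.append(f.name)
    already_selected.update(selected)
    return selected


def run_batches(files, batch_interval, remember_window, num_batches):
    """Run the selection at t = batch_interval, 2 * batch_interval, ..."""
    seen = set()
    return [
        find_new_files(files, (i + 1) * batch_interval, remember_window, seen)
        for i in range(num_batches)
    ]


BATCH = 120  # seconds, i.e. a batch time greater than 1 minute

files = [
    FileEvent("small.txt", mod_time=100, visible_at=100),
    # The move starts at t=118 but only completes at t=125, after the first batch.
    FileEvent("large.txt", mod_time=118, visible_at=125),
]

# Remember window equal to one batch: large.txt is never picked up.
results = run_batches(files, BATCH, remember_window=BATCH, num_batches=3)
print(results)

# A window spanning two batches lets the second batch still accept the file.
wider = run_batches(files, BATCH, remember_window=2 * BATCH, num_batches=3)
print(wider)
```

In this model large.txt is missed by the first batch because it is not yet
visible, and by every later batch because its modification time already lies
behind the threshold. Widening the window is used here only to show where the
file gets lost, not as a proposed fix.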



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

