Oh yes, this was a bug and it has been fixed. Check out the master
branch!

https://issues.apache.org/jira/browse/SPARK-2362?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20created%20DESC%2C%20priority%20ASC

TD


On Mon, Jul 7, 2014 at 7:11 AM, Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com> wrote:

> I have a basic spark streaming job that is watching a folder, processing
> any new file and updating a column family in cassandra using the new
> cassandra-spark-driver.
>
> I think there is a problem with SparkStreamingContext.textFileStream... if
> I start my job in local mode with no files in the folder that is watched
> and then I copy a bunch of files, sometimes spark is continually processing
> those files again and again.
>
> I have noticed that it usually happens when spark doesn't detect all new
> files in one go... i.e. I copied 6 files and spark detected 3 of them as
> new and processed them; then it detected the other 3 as new and processed
> them. After it finished processing all 6 files, it again detected the
> first 3 files as new and processed them... then the other 3... and
> again... and again... and again.
>
> Should I raise a JIRA issue?
>
> Regards,
>
> Luis
>
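For anyone curious about how this kind of repeated-detection bug can arise: a
file-stream source typically decides what is "new" by comparing file
modification times against a tracked cut-off. If the cut-off is not advanced
after each scan, the same files keep matching forever. The snippet below is a
plain-Python illustration of that windowing pattern (hypothetical logic for
illustration only, not Spark's actual textFileStream implementation):

```python
import os
import tempfile


def find_new_files(directory, last_mod_time):
    """Return files modified strictly after last_mod_time, plus the new cut-off.

    Hypothetical sketch of mod-time-based new-file detection; not Spark code.
    """
    new_files = []
    newest = last_mod_time
    for name in sorted(os.listdir(directory)):
        mtime = os.path.getmtime(os.path.join(directory, name))
        if mtime > last_mod_time:
            new_files.append(name)
            newest = max(newest, mtime)
    return new_files, newest


# Create a watched folder with two files in it.
watched = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    open(os.path.join(watched, name), "w").close()

# Correct behaviour: advance the cut-off after every scan, so each
# file is picked up exactly once.
cutoff = 0.0
first, cutoff = find_new_files(watched, cutoff)
second, cutoff = find_new_files(watched, cutoff)
print(first)   # ['a.txt', 'b.txt']
print(second)  # []
```

If the caller failed to feed the updated cut-off back into the next scan, both
files would be reported as "new" on every pass, which is exactly the
reprocessing loop described above.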
