I guess it goes through those 500k files <https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L193> for the first time and then uses a filter from the next time onward.
Thanks
Best Regards

On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das <t...@databricks.com> wrote:
> For the first time it needs to list them. After that the list should be
> cached by the file stream implementation (as far as I remember).
>
> On Thu, Jul 30, 2015 at 3:55 PM, Brandon White <bwwintheho...@gmail.com>
> wrote:
>
>> Is this a known bottleneck for Spark Streaming textFileStream? Does it
>> need to list all the current files in a directory before it gets the new
>> files? Say I have 500k files in a directory, does it list them all in
>> order to get the new files?
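As a rough sketch of the mechanism discussed above (list the directory, then keep only files newer than the last batch time), the filtering step could look something like the snippet below. This is an illustration using plain `java.io.File`, not Spark's actual `FileInputDStream` code; the object and method names are made up for the example:

```scala
import java.io.File
import java.nio.file.Files

object NewFileFilterSketch {
  // Hypothetical helper: select only files modified after the previous
  // batch time, mimicking the mod-time filter a file stream would apply
  // on each batch so already-processed files are skipped.
  def newFilesSince(dir: File, lastBatchTimeMs: Long): Seq[File] =
    Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.lastModified() > lastBatchTimeMs)
      .toSeq

  def main(args: Array[String]): Unit = {
    // Set up a temp directory with one "old" and one "fresh" file.
    val dir = Files.createTempDirectory("stream-src").toFile
    val old = new File(dir, "old.txt")
    old.createNewFile()
    old.setLastModified(1000L) // pretend it predates the last batch
    val fresh = new File(dir, "fresh.txt")
    fresh.createNewFile() // modification time is "now"

    // Only the fresh file should be picked up for the next batch.
    val picked = newFilesSince(dir, lastBatchTimeMs = 2000L)
    println(picked.map(_.getName).mkString(","))
  }
}
```

Even with a filter like this, the directory still has to be listed on every batch, which is why a very large directory (500k files) can be slow the first time and remains sensitive to listing cost afterward.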