That doesn't seem like a good solution, unfortunately, as I need this to work in a production environment. Do you know why this limitation exists for FileInputDStream in the first place? Unless I'm missing something important about how the internals work, I don't see why this feature couldn't be added at some point.
On Fri, Apr 3, 2015 at 12:47 PM, Tathagata Das <t...@databricks.com> wrote:
> A sort-of-hacky workaround is to use a queueStream where you can manually
> create RDDs (using sparkContext.hadoopFile) and insert them into the queue.
> Note that this is for testing only, as queueStream does not work with
> driver fault recovery.
>
> TD
>
> On Fri, Apr 3, 2015 at 12:23 PM, adamgerst <adamge...@gmail.com> wrote:
>
>> So after pulling my hair out for a bit trying to convert one of my
>> standard Spark jobs to streaming, I found that FileInputDStream does not
>> support nested folders (see the brief mention at
>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources;
>> the fileStream method returns a FileInputDStream). Before, for my
>> standard job, I was reading from, say,
>>
>> s3n://mybucket/2015/03/02/*log
>>
>> and could also modify it to get an entire month's worth of logs. Since
>> the logs are split up by date, when the batch ran for the day I simply
>> passed in the date as a parameter to make sure I was reading the correct
>> data.
>>
>> But since I want to turn this job into a streaming job, I need to do
>> something like
>>
>> s3n://mybucket/*log
>>
>> This would work fine if it were a standard Spark application, but it
>> fails for streaming. Is there any way I can get around this limitation?
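For reference, a minimal sketch of the queueStream workaround TD describes might look like the following (Scala). The bucket and date layout are taken from the original post; the polling thread and batch interval are only illustrative, and sparkContext.textFile is used here instead of hadoopFile for simplicity:

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NestedLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NestedLogStream")
    val ssc = new StreamingContext(conf, Seconds(60))

    // Queue backing the DStream; each RDD pushed here becomes one batch.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val logs = ssc.queueStream(rddQueue)

    logs.foreachRDD { rdd =>
      // Normal per-batch processing would go here.
      println(s"Batch contained ${rdd.count()} lines")
    }

    ssc.start()

    // Separate thread that feeds the queue with RDDs built from the
    // nested, date-partitioned S3 paths. The once-a-minute schedule is
    // just an illustration, not part of any Spark API.
    new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          val today = new java.text.SimpleDateFormat("yyyy/MM/dd")
            .format(new java.util.Date())
          val rdd = ssc.sparkContext.textFile(s"s3n://mybucket/$today/*log")
          rddQueue.synchronized { rddQueue += rdd }
          Thread.sleep(60 * 1000)
        }
      }
    }).start()

    ssc.awaitTermination()
  }
}
```

As TD notes, the queue itself is not checkpointed, so this approach does not survive driver fault recovery and is really only suitable for testing.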