That doesn't seem like a good solution, unfortunately, as I need this to work in a production environment. Do you know why this limitation exists for FileInputDStream in the first place? Unless I'm missing something important about how the internals work, I don't see why this feature couldn't be added at some point.
On Fri, Apr 3, 2015 at 12:47 PM, Tathagata Das <t...@databricks.com> wrote:
> A sort-of-hacky workaround is to use a queueStream where you can manually
> create RDDs (using sparkContext.hadoopFile) and insert them into the queue.
> Note that this is for testing only, as queueStream does not work with
> driver fault recovery.
>
> TD
>
> On Fri, Apr 3, 2015 at 12:23 PM, adamgerst <adamge...@gmail.com> wrote:
>
>> So after pulling my hair out for a bit trying to convert one of my
>> standard Spark jobs to streaming, I found that FileInputDStream does not
>> support nested folders (see the brief mention at
>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources;
>> the fileStream method returns a FileInputDStream). Before, for my
>> standard job, I was reading from, say,
>>
>> s3n://mybucket/2015/03/02/*log
>>
>> and could also modify it to get an entire month's worth of logs. Since
>> the logs are split up by date, when the batch ran for the day I simply
>> passed in the date as a parameter to make sure I was reading the correct
>> data.
>>
>> But since I want to turn this job into a streaming job, I need to do
>> something like
>>
>> s3n://mybucket/*log
>>
>> This would work fine if it were a standard Spark application, but it
>> fails for streaming. Is there any way I can get around this limitation?
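For reference, a minimal sketch of the queueStream workaround TD describes might look like the following (Scala). The bucket and date layout are taken from the original post; the polling thread and batch interval are only illustrative, and sparkContext.textFile is used here instead of hadoopFile for simplicity:

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NestedLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NestedLogStream")
    val ssc = new StreamingContext(conf, Seconds(60))

    // Queue backing the DStream; each RDD pushed here becomes one batch.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val logs = ssc.queueStream(rddQueue)

    logs.foreachRDD { rdd =>
      // Normal per-batch processing would go here.
      println(s"Batch contained ${rdd.count()} lines")
    }

    ssc.start()

    // Separate thread that feeds the queue with RDDs built from the
    // nested, date-partitioned S3 paths. The once-a-minute schedule is
    // just an illustration, not part of any Spark API.
    new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          val today = new java.text.SimpleDateFormat("yyyy/MM/dd")
            .format(new java.util.Date())
          val rdd = ssc.sparkContext.textFile(s"s3n://mybucket/$today/*log")
          rddQueue.synchronized { rddQueue += rdd }
          Thread.sleep(60 * 1000)
        }
      }
    }).start()

    ssc.awaitTermination()
  }
}
```

As TD notes, the queue itself is not checkpointed, so this approach does not survive driver fault recovery and is really only suitable for testing.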