Apparently you can pass comma-separated folders. Try the suggestion given here: http://stackoverflow.com/questions/29426246/spark-streaming-textfilestream-not-supporting-wildcards — let me know if this helps.
Srikanth

On Wed, Feb 17, 2016 at 5:47 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> textFileStream doesn't support that. It only supports monitoring one
> folder.
>
> On Wed, Feb 17, 2016 at 7:20 AM, in4maniac <sa...@skimlinks.com> wrote:
>
>> Hi all,
>>
>> I am new to pyspark streaming and I was following a tutorial I saw on
>> the internet
>> (https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py).
>> But I replaced the data input with an S3 directory path:
>>
>> lines = ssc.textFileStream("s3n://bucket/first/second/third1/")
>>
>> When I run the code and upload a file to s3n://bucket/first/second/third1/
>> (such as s3n://bucket/first/second/third1/test1.txt), the file gets
>> processed as expected.
>>
>> Now I want it to listen to multiple directories and process files when
>> they are uploaded to any of them, for example:
>> [s3n://bucket/first/second/third1/, s3n://bucket/first/second/third2/
>> and s3n://bucket/first/second/third3/]
>>
>> I tried to use a wildcard pattern similar to sc.textFile:
>>
>> lines = ssc.textFileStream("s3n://bucket/first/second/*/")
>>
>> But this didn't work. Can someone please explain how I could achieve
>> my objective?
>>
>> Thanks in advance!
>>
>> in4maniac
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org