Apparently you can pass comma-separated folders.
Try the suggestion given here:
http://stackoverflow.com/questions/29426246/spark-streaming-textfilestream-not-supporting-wildcards
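A rough sketch of both approaches (untested against a live cluster; the bucket paths are the ones from your message, and `comma_separated` / `multi_dir_word_count` are illustrative names, not Spark API):

```python
def comma_separated(dirs):
    # Per the Stack Overflow answer above, textFileStream reportedly
    # accepts a comma-separated list of directories in a single call.
    return ",".join(dirs)

def multi_dir_word_count(dirs, batch_seconds=10):
    # Alternative: create one textFileStream per directory and merge
    # them with StreamingContext.union(). Imports are local so the
    # sketch can be read without a Spark install.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="multi-dir-stream")
    ssc = StreamingContext(sc, batch_seconds)

    streams = [ssc.textFileStream(d) for d in dirs]
    lines = ssc.union(*streams)

    # Same word count as the network_wordcount.py example you linked.
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

dirs = [
    "s3n://bucket/first/second/third1/",
    "s3n://bucket/first/second/third2/",
    "s3n://bucket/first/second/third3/",
]
# Either pass comma_separated(dirs) to a single ssc.textFileStream(...)
# call, or run multi_dir_word_count(dirs) to union one stream per folder.
```

The union route is the safer bet if the comma-separated form turns out not to work on your Spark/Hadoop version, since `StreamingContext.union` is a documented PySpark API.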
Let me know if this helps

Srikanth

On Wed, Feb 17, 2016 at 5:47 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com
> wrote:

> textFileStream doesn't support that. It only supports monitoring one
> folder.
>
> On Wed, Feb 17, 2016 at 7:20 AM, in4maniac <sa...@skimlinks.com> wrote:
>
>> Hi all,
>>
>> I am new to PySpark Streaming, and I was following a tutorial I found on
>> the internet
>> (
>> https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py
>> ).
>> But I replaced the data input with an s3 directory path as:
>>
>> lines = ssc.textFileStream("s3n://bucket/first/second/third1/")
>>
>> When I run the code and upload a file to s3n://bucket/first/second/third1/
>> (such as s3n://bucket/first/second/third1/test1.txt), the file gets
>> processed as expected.
>>
>> Now I want it to listen to multiple directories and process files if they
>> get uploaded to any of the directories:
>> for example : [s3n://bucket/first/second/third1/,
>> s3n://bucket/first/second/third2/ and s3n://bucket/first/second/third3/]
>>
>> I tried to use the pattern similar to sc.TextFile as :
>>
>> lines = ssc.textFileStream("s3n://bucket/first/second/*/")
>>
>> But this didn't work. Can someone please explain how I could achieve
>> this?
>>
>> thanks in advance !!!
>>
>> in4maniac
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
