
Hi,

You could retroactively union an existing DStream with one created from
a newly detected file. Then, whenever another file is "detected", you
would need to re-union the streams and create another DStream. It seems
the implementation of FileInputDStream only looks for files directly in
the directory: the filter is applied via the
FileSystem.listStatus(dir, filter) method, which does not list
recursively.
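
A rough sketch of that re-unioning, assuming an existing
StreamingContext `ssc` and a known list of day directories (the method
name and parameters here are illustrative, not Spark API):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Sketch: build one fileStream per known day directory and union them.
// When a new day directory appears, `dirs` has to be recomputed and
// the union rebuilt, as described above.
def unionOfDirs(ssc: StreamingContext, dirs: Seq[String]): DStream[String] = {
  val perDir = dirs.map { dir =>
    ssc.fileStream[LongWritable, Text, TextInputFormat](dir)
       .map(_._2.toString)
  }
  ssc.union(perDir)
}
```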

A cleaner solution would be to extend FileInputDStream and override
findNewFiles(...) with the ability to recursively list files (probably
by using FileSystem.listFiles).

Refer: http://stackoverflow.com/a/25645225/113411
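
The recursive-listing piece could look roughly like this (a sketch
only; note that findNewFiles is private[streaming], so a subclass would
have to live in that package -- worth verifying against your Spark
version):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, LocatedFileStatus, Path, RemoteIterator}

// Recursive listing via FileSystem.listFiles(path, recursive = true),
// which is what FileSystem.listStatus(dir, filter) lacks. Wiring this
// into an overridden findNewFiles(...) is left as described above.
def listFilesRecursively(dir: String, conf: Configuration): Seq[Path] = {
  val path = new Path(dir)
  val fs = path.getFileSystem(conf)
  val it: RemoteIterator[LocatedFileStatus] = fs.listFiles(path, true)
  val buf = scala.collection.mutable.ArrayBuffer.empty[Path]
  while (it.hasNext) buf += it.next().getPath
  buf.toSeq
}
```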

-- Ankur


On 13/05/2015 02:03, lisendong wrote:
> but in fact the directories are not ready at the beginning of my
> task.
> 
> for example:
> 
> /user/root/2015/05/11/data.txt
> /user/root/2015/05/12/data.txt
> /user/root/2015/05/13/data.txt
> 
> like this.
> 
> one new directory appears each day.
> 
> how do I create the new DStream for tomorrow's new
> directory (/user/root/2015/05/13/)?
> 
> 
>> On 13 May 2015, at 16:59, Ankur Chauhan <achau...@brightcove.com> wrote:
>> 
>> I would suggest creating one DStream per directory and then using
>> StreamingContext#union(...) to get a union DStream.
>> 
>> -- Ankur
>> 
>> On 13/05/2015 00:53, hotdog wrote:
>>>> I want to use fileStream in spark streaming to monitor
>>>> multiple hdfs directories, such as:
>>>> 
>>>> val list_join_action_stream = ssc.fileStream[LongWritable,
>>>> Text, TextInputFormat]("/user/root/*/*", check_valid_file(_),
>>>>  false).map(_._2.toString).print
>>>> 
>>>> 
>>>> By the way, I could not understand the meaning of the three
>>>> classes: LongWritable, Text, TextInputFormat.
>>>> 
>>>> but it doesn't work...
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-monitor-multi-directories-in-spark-streaming-task-tp22863.html
>>>> Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
