+1 for adding such a feature. It should be very easy to implement (basically extend the createInputSplits() method)
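For illustration, the core of such an option would just be a recursive walk over the directory tree instead of a flat listing. A minimal sketch of that traversal in plain Java (the class and method names here are illustrative, not Flink's actual API):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of the recursive enumeration that a hypothetical
// "recursive traversal" option in FileInputFormat could perform
// inside createInputSplits(). Not Flink code, just the idea.
public class RecursiveEnumeration {

    // Collect all regular files under 'dir', descending into subdirectories.
    static List<File> listFilesRecursively(File dir) {
        List<File> result = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return result; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                result.addAll(listFilesRecursively(entry));
            } else {
                result.add(entry);
            }
        }
        return result;
    }
}
```

With a structure like logs_dir/machine1/january.log, logs_dir/machine2/..., this returns every nested log file, which createInputSplits() could then turn into splits exactly as it does for top-level files today.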
On Tue, Dec 2, 2014 at 5:22 PM, Vasiliki Kalavri <[email protected]> wrote:

> Hi,
>
> thanks for replying!
>
> It would certainly be useful for my use case, but not absolutely
> necessary. If you think other people might find it useful too, I can open
> an issue.
> If not, I believe it would be nice to print a warning when a nested
> directory is given as the input path,
> since now, the files that are in the base directory are processed
> normally, but the nested ones are simply ignored.
>
> Cheers,
> V.
>
> On 2 December 2014 at 16:52, Stephan Ewen <[email protected]> wrote:
>
>> Hi!
>>
>> Not right now. The input formats do not recursively enumerate files. In
>> that, we followed the way Hadoop did it.
>>
>> If that is something that is interesting, it should not be too hard to
>> add to the FileInputFormat an option to do a complete recursive traversal
>> of the directory structure.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Tue, Dec 2, 2014 at 4:32 PM, Vasiliki Kalavri <
>> [email protected]> wrote:
>>
>>> Hello all,
>>>
>>> I want to run a Flink log processing job and my input is stored locally
>>> in a nested directory structure, like the following:
>>>
>>> logs_dir/
>>> |-----/machine1/
>>> |-----------/january.log
>>> |-----------/february.log
>>> ...
>>> |-----/machine2/
>>> ...
>>>
>>> etc.
>>>
>>> When providing "logs_dir" as the argument to readTextFile(), nothing is
>>> read, and no exception or error is returned.
>>> Copying the nested individual files machine1/january.log,
>>> machine1/february.log, ..., into the same directory works fine, but I was
>>> wondering whether there is a better way to do this?
>>>
>>> Thank you!
>>> V.