+1 I find this useful as well.

On 04 Dec 2014, at 22:02, Robert Metzger <[email protected]> wrote:
> +1 for adding such a feature. It should be very easy to implement (basically
> extend the createInputSplits() method).
>
> On Tue, Dec 2, 2014 at 5:22 PM, Vasiliki Kalavri <[email protected]> wrote:
>
> Hi,
>
> thanks for replying!
>
> It would certainly be useful for my use case, but not absolutely necessary.
> If you think other people might find it useful too, I can open an issue.
> If not, I believe it would be nice to print a warning when a nested
> directory is given as the input path, since currently the files in the base
> directory are processed normally, but the nested ones are simply ignored.
>
> Cheers,
> V.
>
> On 2 December 2014 at 16:52, Stephan Ewen <[email protected]> wrote:
>
> Hi!
>
> Not right now. The input formats do not recursively enumerate files. In
> that, we followed the way Hadoop did it.
>
> If that is something that is interesting, it should not be too hard to add
> to the FileInputFormat an option to do a complete recursive traversal of
> the directory structure.
>
> Greetings,
> Stephan
>
>
> On Tue, Dec 2, 2014 at 4:32 PM, Vasiliki Kalavri <[email protected]> wrote:
>
> Hello all,
>
> I want to run a Flink log processing job and my input is stored locally in
> a nested directory structure, like the following:
>
> logs_dir/
> |-----/machine1/
> |-----------/january.log
> |-----------/february.log
> ...
> |-----/machine2/
> ...
>
> etc.
>
> When providing "logs_dir" as the argument to readTextFile(), nothing is
> read and no exception or error is returned. Copying the nested individual
> files machine1/january.log, machine1/february.log, ..., to the same
> directory works fine, but I was wondering whether there is a better way to
> do this?
>
> Thank you!
> V.
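Until FileInputFormat gains a recursive-traversal option as suggested above, one workaround is to enumerate the nested files up front and pass each path to the job yourself (e.g. one readTextFile() per file, unioned). The sketch below shows only the traversal step, using plain Java NIO rather than any Flink API; the class and method names are illustrative, not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveEnumeration {

    /**
     * Walks baseDir depth-first and collects every regular file,
     * including files in nested subdirectories.
     */
    public static List<Path> listFilesRecursively(Path baseDir) throws IOException {
        try (Stream<Path> walk = Files.walk(baseDir)) {
            return walk.filter(Files::isRegularFile)
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree mirroring the logs_dir/machine1 example above.
        Path base = Files.createTempDirectory("logs_dir");
        Path machine1 = Files.createDirectories(base.resolve("machine1"));
        Files.createFile(machine1.resolve("january.log"));
        Files.createFile(machine1.resolve("february.log"));

        List<Path> files = listFilesRecursively(base);
        // Both nested log files are found, even though they are not
        // direct children of the base directory.
        System.out.println(files.size());
    }
}
```

Each returned path could then be handed to the execution environment individually, which avoids copying the files into a flat directory.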
