+1 for adding such a feature. It should be very easy to implement (basically extend the createInputSplits() method)
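For illustration, the core of such an option would just be a recursive walk over the directory tree instead of a flat listing. A minimal sketch of that traversal in plain Java (the class and method names here are illustrative, not Flink's actual API):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of the recursive enumeration that a hypothetical
// "recursive traversal" option in FileInputFormat could perform
// inside createInputSplits(). Not Flink code, just the idea.
public class RecursiveEnumeration {

    // Collect all regular files under 'dir', descending into subdirectories.
    static List<File> listFilesRecursively(File dir) {
        List<File> result = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return result; // not a directory, or an I/O error occurred
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                result.addAll(listFilesRecursively(entry));
            } else {
                result.add(entry);
            }
        }
        return result;
    }
}
```

With a structure like logs_dir/machine1/january.log, logs_dir/machine2/..., this returns every nested log file, which createInputSplits() could then turn into splits exactly as it does for top-level files today.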
On Tue, Dec 2, 2014 at 5:22 PM, Vasiliki Kalavri <[email protected]> wrote:

> Hi,
>
> thanks for replying!
>
> It would certainly be useful for my use case, but not absolutely
> necessary. If you think other people might find it useful too, I can open
> an issue.
> If not, I believe it would be nice to print a warning when a nested
> directory is given as the input path,
> since now, the files that are in the base directory are processed
> normally, but the nested ones are simply ignored.
>
> Cheers,
> V.
>
> On 2 December 2014 at 16:52, Stephan Ewen <[email protected]> wrote:
>
>> Hi!
>>
>> Not right now. The input formats do not recursively enumerate files. In
>> that, we followed the way Hadoop did it.
>>
>> If that is something that is interesting, it should not be too hard to
>> add to the FileInputFormat an option to do a complete recursive traversal
>> of the directory structure.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Tue, Dec 2, 2014 at 4:32 PM, Vasiliki Kalavri <
>> [email protected]> wrote:
>>
>>> Hello all,
>>>
>>> I want to run a Flink log processing job and my input is stored locally
>>> in a nested directory structure, like the following:
>>>
>>> logs_dir/
>>> |-----/machine1/
>>> |-----------/january.log
>>> |-----------/february.log
>>> ...
>>> |-----/machine2/
>>> ...
>>>
>>> etc.
>>>
>>> When providing "logs_dir" as the argument to readTextFile(), nothing is
>>> read, and no exception or error is returned.
>>> Copying the nested individual files machine1/january.log,
>>> machine1/february.log, ..., into the same directory works fine, but I was
>>> wondering whether there is a better way to do this?
>>>
>>> Thank you!
>>> V.