Great! Appending data to HDFS will be a very useful feature! I think you should then also consider how to efficiently read directories containing a lot of small files. I know this can be quite inefficient, which is why Spark gives you a coalesce operation to deal with such cases.
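
To make the small-files concern a bit more concrete, here is a rough sketch of the general idea of packing many small files into fewer read tasks. It is plain Python, not Flink or Spark API: the function name `pack_small_files`, the `(path, size)` input shape and the `target_bytes` parameter are all made up for illustration, and a real input format would also have to consider block locations and files larger than the target.

```python
from typing import Iterable, List, Tuple


def pack_small_files(files: Iterable[Tuple[str, int]],
                     target_bytes: int) -> List[List[str]]:
    """Greedily group (path, size) pairs into batches of roughly target_bytes.

    Hypothetical helper for illustration only; not part of Flink, Spark or HDFS.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")

    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    for path, size in files:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        # A file larger than the target still gets a batch of its own.
        current.append(path)
        current_size += size

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Made-up listing of an "append" directory: (path, size in bytes).
    listing = [("part-0", 40), ("part-1", 40), ("part-2", 40),
               ("part-3", 200), ("part-4", 10)]
    for batch in pack_small_files(listing, target_bytes=100):
        print(batch)
```

Each batch would then be read by a single task instead of one task per file, which is roughly the effect one hopes to get from coalescing.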

On Tue, Dec 9, 2014 at 2:39 PM, Vasiliki Kalavri <[email protected]> wrote:

> Hi!
>
> Yes, I took a look into this. I hope I'll be able to find some time to
> work on it this week.
> I'll keep you updated :)
>
> Cheers,
> V.
>
> On 9 December 2014 at 14:03, Robert Metzger <[email protected]> wrote:
>
>> It seems that Vasia started working on adding support for recursive
>> reading: https://issues.apache.org/jira/browse/FLINK-1307.
>> I'm still occupied with refactoring the YARN client, the HDFS refactoring
>> is next on my list.
>>
>> On Tue, Dec 9, 2014 at 11:59 AM, Flavio Pompermaier <[email protected]> wrote:
>>
>>> Any news about this Robert?
>>>
>>> Thanks in advance,
>>> Flavio
>>>
>>> On Thu, Dec 4, 2014 at 10:03 PM, Robert Metzger <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think there is no support for appending to HDFS files in Flink yet.
>>>> HDFS supports it, but there are some adjustments in the system required
>>>> (not deleting / creating directories before writing; exposing the append()
>>>> methods in the FS abstractions).
>>>>
>>>> I'm planning to work on the FS abstractions in the next week, if I have
>>>> enough time, I can also look into adding support for append().
>>>>
>>>> Another approach could be adding support for recursively reading
>>>> directories with the input formats. Vasia asked for this feature a few days
>>>> ago on the mailing list. If we would have that feature, you could just
>>>> write to a directory and read the parent directory (with all the dirs for
>>>> the appends).
>>>>
>>>> Best,
>>>> Robert
>>>>
>>>> On Thu, Dec 4, 2014 at 5:59 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>>> Hi guys,
>>>>> how can I efficiently appends data (as plain strings or also avro
>>>>> records) to HDFS using Flink?
>>>>> Do I need to use Flume or can I avoid it?
>>>>>
>>>>> Thanks in advance,
>>>>> Flavio
