Re: HDFS append

Flavio Pompermaier Tue, 09 Dec 2014 06:06:33 -0800

Great! Appending data to HDFS will be a very useful feature!
I think you should then also consider how to efficiently read directories
containing a lot of small files. I know this can be quite inefficient;
that's why Spark gives you a coalesce operation to deal with such cases.


On Tue, Dec 9, 2014 at 2:39 PM, Vasiliki Kalavri <[email protected]>
wrote:

> Hi!
>
> Yes, I took a look into this. I hope I'll be able to find some time to
> work on it this week.
> I'll keep you updated :)
>
> Cheers,
> V.
>
> On 9 December 2014 at 14:03, Robert Metzger <[email protected]> wrote:
>
>> It seems that Vasia started working on adding support for recursive
>> reading: https://issues.apache.org/jira/browse/FLINK-1307.
>> I'm still occupied with refactoring the YARN client; the HDFS refactoring
>> is next on my list.
>>
>> On Tue, Dec 9, 2014 at 11:59 AM, Flavio Pompermaier <[email protected]> wrote:
>>
>>> Any news about this Robert?
>>>
>>> Thanks in advance,
>>> Flavio
>>>
>>> On Thu, Dec 4, 2014 at 10:03 PM, Robert Metzger <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think there is no support for appending to HDFS files in Flink yet.
>>>> HDFS supports it, but the system requires some adjustments
>>>> (not deleting/re-creating directories before writing; exposing the append()
>>>> methods in the FS abstractions).
>>>>
>>>> I'm planning to work on the FS abstractions next week; if I have
>>>> enough time, I can also look into adding support for append().
>>>>
>>>> Another approach could be adding support for recursively reading
>>>> directories with the input formats. Vasia asked for this feature a few days
>>>> ago on the mailing list. If we had that feature, you could just
>>>> write each append to its own directory and read the parent directory
>>>> (with all the dirs for the appends).
>>>>
>>>> Best,
>>>> Robert
>>>>
>>>> On Thu, Dec 4, 2014 at 5:59 PM, Flavio Pompermaier <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi guys,
>>>>> how can I efficiently append data (as plain strings or Avro
>>>>> records) to HDFS using Flink?
>>>>> Do I need to use Flume, or can I avoid it?
>>>>>
>>>>> Thanks in advance,
>>>>> Flavio
>>>>>
>>>>>
>>>>
>>>
>>
>
