No, I'm not aware of anybody working on extending the Hadoop compatibility support. I'm afraid I won't have time to work on this any time soon either :-(
2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Any progress on this Fabian? HBase bulk loading is a common task for us
> and it's very annoying and uncomfortable to run a separate YARN job to
> accomplish it...
>
> On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
> Great! That would be awesome.
> Thank you Fabian
>
> On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hmm, that's a tricky question ;-) I would need to take a closer look.
>> Getting custom comparators for sorting and grouping into the Combiner is
>> not trivial because it touches API, Optimizer, and Runtime code.
>> However, I did that before for the Reducer, and with the recent addition
>> of groupCombine the Reducer changes might simply be applied to combine.
>>
>> I'll be gone next week, but if you want to, we can have a closer look at
>> the problem after that.
>>
>> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> I think I could also take care of it if somebody can help me and guide
>>> me a little bit...
>>> How long do you think it would take to complete such a task?
>>>
>>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com>
>>> wrote:
>>>
>>>> We had an effort to execute any Hadoop MR program by simply specifying
>>>> the JobConf and executing it (even embedded in regular Flink programs).
>>>> We got quite far but did not complete it (counters and custom grouping /
>>>> sorting functions for Combiners are missing, if I remember correctly).
>>>> I don't think anybody is working on that right now, but it would
>>>> definitely be a cool feature.
>>>>
>>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have a question about Hadoop compatibility.
>>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>>> you say that you can reuse existing mapreduce programs.
>>>>> Would it also be possible to run complex mapreduce programs like
>>>>> HBase BulkImport, which use for example a custom partitioner
>>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>>
>>>>> The bulk-import examples call
>>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>>> which sets a series of job parameters (like partitioner, mapper,
>>>>> reducers, etc.) -> http://pastebin.com/8VXjYAEf.
>>>>> The full code can be seen at
>>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>>
>>>>> Do you think there's any chance to make it run in Flink?
>>>>>
>>>>> Best,
>>>>> Flavio
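For context on what makes HBase bulk import hard to port: the custom partitioner that configureIncrementalLoad installs is a total-order partitioner, which routes each row key to the reducer owning the matching region key range, so every HFile comes out pre-sorted per region. A minimal, self-contained sketch of that routing idea (hypothetical class name, pure JDK, no HBase or Flink dependency; not HBase's actual implementation) might look like:

```java
import java.util.Arrays;

// Hypothetical stand-in for a total-order partitioner: given the sorted
// start keys of the target regions, route each row key to the partition
// (reducer) that owns the key range containing it.
public class RegionPartitionerSketch {
    private final String[] splitPoints; // sorted start keys of regions 1..n-1

    public RegionPartitionerSketch(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    // Mirrors the shape of org.apache.hadoop.mapreduce.Partitioner#getPartition:
    // binary-search the split points; keys below the first split go to
    // partition 0, keys in [split[i-1], split[i]) go to partition i.
    public int getPartition(String rowKey, int numPartitions) {
        int pos = Arrays.binarySearch(splitPoints, rowKey);
        int partition = (pos >= 0) ? pos + 1 : -(pos + 1);
        return Math.min(partition, numPartitions - 1);
    }

    public static void main(String[] args) {
        // 3 regions: (-inf, "g"), ["g", "p"), ["p", +inf)
        RegionPartitionerSketch p =
                new RegionPartitionerSketch(new String[] {"g", "p"});
        System.out.println(p.getPartition("a", 3)); // 0
        System.out.println(p.getPartition("g", 3)); // 1
        System.out.println(p.getPartition("z", 3)); // 2
    }
}
```

On the Flink side, the DataSet API's partitionCustom(...) plus sortPartition(...) could in principle express this kind of routing natively, but as the thread notes, wiring a Hadoop job's configured partitioner through automatically was the part that never got finished.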