Do you think it would be that complex to support? I think we could try to implement it if someone could give us some support (at least the big picture).
On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> No, I'm not aware of anybody working on extending the Hadoop compatibility
> support.
> I'll also have no time to work on this any time soon :-(
>
> 2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> Any progress on this, Fabian? HBase bulk loading is a common task for us,
>> and it's very annoying and uncomfortable to run a separate YARN job to
>> accomplish it...
>>
>> On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>
>> Great! That will be awesome.
>> Thank you, Fabian
>>
>> On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hmm, that's a tricky question ;-) I would need to have a closer look.
>>> But getting custom comparators for sorting and grouping into the Combiner
>>> is not trivial because it touches API, Optimizer, and Runtime code.
>>> However, I did that before for the Reducer, and with the recent addition
>>> of groupCombine the Reducer changes might simply be applied to combine.
>>>
>>> I'll be gone next week, but if you want, we can have a closer look at
>>> the problem after that.
>>>
>>> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> I think I could also take care of it if somebody can help me and guide
>>>> me a little bit.
>>>> How long do you think it would take to complete such a task?
>>>>
>>>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com>
>>>> wrote:
>>>>
>>>>> We had an effort to execute any Hadoop MR program by simply specifying
>>>>> the JobConf and executing it (even embedded in regular Flink programs).
>>>>> We got quite far but did not finish (counters and custom grouping /
>>>>> sorting functions for Combiners are missing, if I remember correctly).
>>>>> I don't think anybody is working on that right now, but it would
>>>>> definitely be a cool feature.
>>>>>
>>>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I have a question about Hadoop compatibility.
>>>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>>>> you say that existing mapreduce programs can be reused.
>>>>>> Would it also be possible to run complex mapreduce programs like
>>>>>> HBase BulkImport, which use, for example, a custom partitioner
>>>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>>>
>>>>>> The bulk-import examples call
>>>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>>>> which sets a series of job parameters (like partitioner, mapper,
>>>>>> reducers, etc.) -> http://pastebin.com/8VXjYAEf.
>>>>>> The full code can be seen at
>>>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java.
>>>>>>
>>>>>> Do you think there's any chance to make it run in Flink?
>>>>>>
>>>>>> Best,
>>>>>> Flavio

--
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809
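For context on what the custom partitioner in the bulk-load job actually does: HFileOutputFormat2 configures a total-order partitioner so that each reducer receives exactly the row keys of one HBase region, based on the table's sorted region start keys. Stripped of the Hadoop interfaces, the routing logic is essentially a binary search over those boundaries. Below is a minimal, self-contained sketch of that idea; the class and method names are illustrative, not HBase's API:

```java
import java.util.Arrays;

// Illustrative stand-in for the routing a total-order partitioner performs
// during an HBase bulk load: each row key is assigned to the partition
// (region index) whose key range contains it. Hypothetical names, not HBase API.
public class RegionBoundaryPartitioner {
    private final String[] splitPoints; // sorted start keys of regions 1..n-1

    public RegionBoundaryPartitioner(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    /** Returns the partition (region index) the given row key belongs to. */
    public int getPartition(String rowKey) {
        int idx = Arrays.binarySearch(splitPoints, rowKey);
        // Exact match on a split point -> that split starts the next region;
        // otherwise the insertion point is the index of the containing region.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        // Three split points -> four regions:
        // (-inf,"d"), ["d","m"), ["m","t"), ["t",+inf)
        RegionBoundaryPartitioner p =
                new RegionBoundaryPartitioner(new String[] {"d", "m", "t"});
        System.out.println(p.getPartition("apple")); // region 0
        System.out.println(p.getPartition("d"));     // region 1
        System.out.println(p.getPartition("zebra")); // region 3
    }
}
```

Whichever engine runs the job, this is the piece that has to be pluggable: Hadoop MR takes it via job.setPartitionerClass(...), so a Flink-based replacement would need an equivalent hook to shuffle by region boundary before writing HFiles.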