No, I'm not aware of anybody working on extending the Hadoop compatibility support. I'm afraid I won't have time to work on this any time soon either :-(
2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Any progress on this Fabian? HBase bulk loading is a common task for us
> and it's very annoying and uncomfortable to run a separate YARN job to
> accomplish it...
>
> On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
> Great! That would be awesome.
> Thank you Fabian
>
> On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hmm, that's a tricky question ;-) I would need to take a closer look.
>> Getting custom comparators for sorting and grouping into the Combiner is
>> not trivial because it touches API, Optimizer, and Runtime code.
>> However, I did that before for the Reducer, and with the recent addition
>> of groupCombine the Reducer changes might simply be applied to combine.
>>
>> I'll be gone next week, but if you want to, we can have a closer look at
>> the problem after that.
>>
>> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> I think I could also take care of it if somebody can help me and guide
>>> me a little bit...
>>> How long do you think it would take to complete such a task?
>>>
>>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com>
>>> wrote:
>>>
>>>> We had an effort to execute any Hadoop MR program by simply specifying
>>>> the JobConf and executing it (even embedded in regular Flink programs).
>>>> We got quite far but did not complete it (counters and custom grouping /
>>>> sorting functions for Combiners are missing, if I remember correctly).
>>>> I don't think anybody is working on that right now, but it would
>>>> definitely be a cool feature.
>>>>
>>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have a question about Hadoop compatibility.
>>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>>> you say that you can reuse existing mapreduce programs.
>>>>> Would it also be possible to run complex mapreduce programs like
>>>>> HBase BulkImport, which use for example a custom partitioner
>>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>>
>>>>> The bulk-import examples call
>>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>>> which sets a series of job parameters (like partitioner, mapper,
>>>>> reducers, etc.) -> http://pastebin.com/8VXjYAEf.
>>>>> The full code can be seen at
>>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>>
>>>>> Do you think there's any chance to make it run in Flink?
>>>>>
>>>>> Best,
>>>>> Flavio
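For context on what makes HBase bulk import hard to port: the custom partitioner that configureIncrementalLoad installs is a total-order partitioner, which routes each row key to the reducer owning the matching region key range, so every HFile comes out pre-sorted per region. A minimal, self-contained sketch of that routing idea (hypothetical class name, pure JDK, no HBase or Flink dependency; not HBase's actual implementation) might look like:

```java
import java.util.Arrays;

// Hypothetical stand-in for a total-order partitioner: given the sorted
// start keys of the target regions, route each row key to the partition
// (reducer) that owns the key range containing it.
public class RegionPartitionerSketch {
    private final String[] splitPoints; // sorted start keys of regions 1..n-1

    public RegionPartitionerSketch(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    // Mirrors the shape of org.apache.hadoop.mapreduce.Partitioner#getPartition:
    // binary-search the split points; keys below the first split go to
    // partition 0, keys in [split[i-1], split[i]) go to partition i.
    public int getPartition(String rowKey, int numPartitions) {
        int pos = Arrays.binarySearch(splitPoints, rowKey);
        int partition = (pos >= 0) ? pos + 1 : -(pos + 1);
        return Math.min(partition, numPartitions - 1);
    }

    public static void main(String[] args) {
        // 3 regions: (-inf, "g"), ["g", "p"), ["p", +inf)
        RegionPartitionerSketch p =
                new RegionPartitionerSketch(new String[] {"g", "p"});
        System.out.println(p.getPartition("a", 3)); // 0
        System.out.println(p.getPartition("g", 3)); // 1
        System.out.println(p.getPartition("z", 3)); // 2
    }
}
```

On the Flink side, the DataSet API's partitionCustom(...) plus sortPartition(...) could in principle express this kind of routing natively, but as the thread notes, wiring a Hadoop job's configured partitioner through automatically was the part that never got finished.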