Hi,
I am considering a reduce-side join where one input would be read
from HDFS and the other from an HBase table.
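
To make clear what kind of join I mean, here is a minimal plain-Java sketch of a tagged reduce-side join. It uses no Hadoop or HBase classes: the class name ReduceSideJoinSketch, the source tags "F" and "T", and the sample records are all made up for illustration. The in-memory map only stands in for the shuffle phase. In a real job, Mapper_HDFS and Mapper_HBase would each emit the tagged values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Hypothetical source tags: each "mapper" prefixes its values with one.
    static final String HDFS_TAG = "F";
    static final String HBASE_TAG = "T";

    // Stand-in for a mapper: emit (joinKey, tag:value) pairs into the shuffle.
    static void map(String tag, List<String[]> records,
                    Map<String, List<String>> shuffle) {
        for (String[] r : records) {
            shuffle.computeIfAbsent(r[0], k -> new ArrayList<>())
                   .add(tag + ":" + r[1]);
        }
    }

    // Stand-in for the reducer: for one key, split the values by tag and
    // pair every HDFS value with every HBase value (inner join).
    static List<String> reduce(String key, List<String> tagged) {
        List<String> fromHdfs = new ArrayList<>();
        List<String> fromHbase = new ArrayList<>();
        for (String v : tagged) {
            String payload = v.substring(v.indexOf(':') + 1);
            if (v.startsWith(HDFS_TAG + ":")) {
                fromHdfs.add(payload);
            } else {
                fromHbase.add(payload);
            }
        }
        List<String> out = new ArrayList<>();
        for (String a : fromHdfs) {
            for (String b : fromHbase) {
                out.add(key + "\t" + a + "\t" + b);
            }
        }
        return out;
    }

    // Run both "mappers", group by key, then reduce each group.
    public static List<String> join(List<String[]> hdfsRecords,
                                    List<String[]> hbaseRecords) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        map(HDFS_TAG, hdfsRecords, shuffle);
        map(HBASE_TAG, hbaseRecords, shuffle);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            joined.addAll(reduce(e.getKey(), e.getValue()));
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> hdfs = Arrays.asList(
                new String[]{"k1", "a"}, new String[]{"k2", "b"});
        List<String[]> hbase = Arrays.asList(
                new String[]{"k1", "x"}, new String[]{"k3", "y"});
        // Only k1 appears in both inputs, so only k1 is joined.
        join(hdfs, hbase).forEach(System.out::println);
    }
}
```

The point of the tags is that the reducer sees values from both sources under the same key and has to tell them apart, which is why I need both mappers to feed the same job.
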
Is it possible to use
TableMapReduceUtil.initTableMapperJob(table, scan, Mapper_HBase.class,
..., job);
and
MultipleInputs.addInputPath(job, path, ..., Mapper_HDFS.class)
at the same time for one job?
When I tried to use both, MultipleInputs seemed to take priority:
Mapper_HBase was not executed. It does execute when I remove the
MultipleInputs call.
Also, is there an equivalent of MultipleInputs for HBase tables,
e.g. MultipleTableInputs? I saw there was a request for this here:
https://issues.apache.org/jira/browse/HBASE-2965
A workaround would be to write the scan results to HDFS first and then do
the reduce-side join using MultipleInputs, but I wanted to avoid this
additional I/O overhead.
Thanks,
Christopher