performance consideration when writing to HBase from MR job

Raghava Mutharaju Sat, 05 Jun 2010 15:45:35 -0700

Hi all,

    If HBase is used as the data sink in an MR job, would doing a) instead
of b) improve performance?


a) collecting all the Puts in the Reduce (or in the Map, if there is no
reduce phase) and doing a batch write
b) writing out each <K,V> pair using context.write(k, v)
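
To make the two options concrete, here is a minimal sketch of the difference
I have in mind. It does not use the HBase or Hadoop APIs at all: CountingSink,
writeEach and writeBatched are hypothetical stand-ins, and the batch size of
100 is an arbitrary assumption. It only counts how many calls reach the sink.
Whether context.write(k, v) really costs one round trip per record depends on
how the output format buffers writes, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

class BatchVsPerRecord {
    // Hypothetical stand-in for a table client; NOT a real HBase class.
    // It only counts how many calls and how many records it receives.
    static class CountingSink {
        int calls = 0;
        int records = 0;

        void put(String row) {
            calls++;
            records++;
        }

        void put(List<String> rows) {
            calls++;
            records += rows.size();
        }
    }

    // Option b): hand every record to the sink individually.
    static void writeEach(CountingSink sink, int n) {
        for (int i = 0; i < n; i++) {
            sink.put("row-" + i);
        }
    }

    // Option a): collect records and hand them over in batches.
    static void writeBatched(CountingSink sink, int n, int batchSize) {
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buffer.add("row-" + i);
            if (buffer.size() == batchSize) {
                sink.put(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        // Flush the remainder, as a task would have to do before it finishes.
        if (!buffer.isEmpty()) {
            sink.put(new ArrayList<>(buffer));
        }
    }

    public static void main(String[] args) {
        CountingSink batched = new CountingSink();
        CountingSink perRecord = new CountingSink();
        writeBatched(batched, 1000, 100);
        writeEach(perRecord, 1000);
        System.out.println("batched calls: " + batched.calls
                + ", per-record calls: " + perRecord.calls);
    }
}
```

Both variants deliver the same 1000 records; only the number of calls to the
sink differs (10 versus 1000 with the assumed batch size).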

If a) is used instead of b), wouldn't that violate the semantics of KEYOUT
and VALUEOUT (because no <K, V> pair is being output)? Is this OK?

Thank you.

Regards,
Raghava
