Write intermediate results of MR job to HBase or to HDFS

Marko Dinic Fri, 11 Dec 2015 05:51:19 -0800

Hello,

I have a sequence of MR jobs that produce intermediate results: the
output of one job is the input to the next.
Also, some data is used as input to every MR job. That data is stored in
HBase.

I would like to know which of the following is more performant:

1) Write intermediate results to HBase in one job and read them from HBase
in the next job
2) Write intermediate results to HDFS in one job and read them from HDFS
in the next job

Similarly, for the data that is used in every MR job:

1) Read the same data from HBase each time (which includes scanning by
rowkey)
2) Read the data from HBase only the first time, store it in HDFS, and read
it from HDFS every subsequent time (avoiding a database query each time)

Please explain why you would choose one over the other.

Best regards,
-- 
Marko Dinic