bq. 'spilling map output' occupied most of the whole time.

Do you mind giving more detail on the above (e.g. what percentage of the job runtime)?
Which release of Hadoop / HBase are you using?

Cheers

On Tue, Aug 18, 2015 at 11:11 PM, dong.yajun <[email protected]> wrote:

> Hello,
>
> What is the fastest way to dump the content of an HBase table to HDFS? Is
> it possible to use an HBase snapshot + Spark to do this?
>
> We already use an HBase snapshot + MapReduce v2 (not via HTable) to
> convert the HFiles to ORC files, but we found that 'spilling map output'
> occupied most of the whole time. Could Spark decrease that cost?
>
> map task: read the HFile, and convert it to KeyValues
>
> reduce task: merge the KeyValues of the same rowkey
>
> Thanks.
>
> --
> *Ric Dong*
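For reference, the reduce step described above (merging the KeyValues that share a rowkey) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not HBase, MapReduce, or Spark code: `Cell` and `merge_rows` are hypothetical names, cells are assumed to arrive sorted by rowkey (as reducer input does after the shuffle), delete markers are ignored, and only the newest version of each column is kept.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

Column = Tuple[bytes, bytes]  # (family, qualifier)


class Cell(NamedTuple):
    """Hypothetical stand-in for one HBase KeyValue read from an HFile."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def merge_rows(cells: Iterable[Cell]) -> Iterator[Tuple[bytes, Dict[Column, bytes]]]:
    """Merge cells sharing a rowkey into one record per row.

    Assumes `cells` is already sorted by row, so each row is emitted as soon
    as it is complete and only one row is held in memory at a time.
    Keeps the value with the highest timestamp for each column; delete
    markers are NOT handled (simplifying assumption).
    """
    for row, row_cells in groupby(cells, key=lambda c: c.row):
        latest: Dict[Column, Cell] = {}
        for cell in row_cells:
            col = (cell.family, cell.qualifier)
            # Keep only the newest version of each column.
            if col not in latest or cell.timestamp > latest[col].timestamp:
                latest[col] = cell
        yield row, {col: c.value for col, c in latest.items()}


if __name__ == "__main__":
    sample = [
        Cell(b"r1", b"cf", b"a", 1, b"old"),
        Cell(b"r1", b"cf", b"a", 2, b"new"),
        Cell(b"r1", b"cf", b"b", 1, b"x"),
        Cell(b"r2", b"cf", b"a", 5, b"y"),
    ]
    for row, columns in merge_rows(sample):
        print(row, columns)
    # b'r1' {(b'cf', b'a'): b'new', (b'cf', b'b'): b'x'}
    # b'r2' {(b'cf', b'a'): b'y'}
```

The merge itself only needs one row in memory at a time; in a MapReduce job the expensive part is typically producing that sorted order, which is where map output gets buffered, sorted, and spilled to disk.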
