On 3/30/11 8:39 PM, Stack wrote:
What is slow? The running of LoadIncrementalHFiles or the copy?

It's the LoadIncrementalHFiles portion.

If the former, is it because the table it's loading into has different boundaries than those of the HFiles, so the HFiles have to be split?

I'm sure that could be one aspect of it; however, from the logs it looks like <1% of the HFiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually thinking our problem is that this code loads the HFiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile, that makes the loading process very time-consuming.

On the primary cluster (16 regionservers) one of these sets of HFiles loads in ~350s, vs. ~3200s on the backup (with 4 regionservers). Overall, the nodes on the backup cluster are running at around 5% CPU (with similarly minimal disk and network usage). So we have plenty of resources to throw at the problem; it's just a matter of determining what we can do here other than adding nodes to the cluster.
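A back-of-the-envelope check of the sequential-loading theory, using only the figures above (about 2500 HFiles per load period, 1-2 seconds each, ~3200s observed on the backup). The class and method names are purely illustrative, and the parallel estimate assumes perfect scaling across loader threads, which is an idealization and not something measured here.

```java
class LoadTimeEstimate {
    // Wall-clock seconds when HFiles are loaded strictly one after another.
    public static double sequentialSeconds(int hfileCount, double secondsPerFile) {
        return hfileCount * secondsPerFile;
    }

    // Idealized wall-clock seconds with `threads` concurrent loaders.
    // ASSUMPTION: every file takes the same time and threads scale perfectly.
    public static double parallelSeconds(int hfileCount, double secondsPerFile, int threads) {
        int rounds = (hfileCount + threads - 1) / threads; // ceiling division
        return rounds * secondsPerFile;
    }

    public static void main(String[] args) {
        int hfiles = 2500;
        // Bounds implied by "1-2 seconds per HFile".
        System.out.println("low estimate (s):  " + sequentialSeconds(hfiles, 1.0));
        System.out.println("high estimate (s): " + sequentialSeconds(hfiles, 2.0));
        // Per-file time implied by the ~3200s observed on the backup cluster.
        System.out.println("implied s/file:    " + 3200.0 / hfiles);
        // Hypothetical: 8 loader threads at 1 second per file.
        System.out.println("8 threads (s):     " + parallelSeconds(hfiles, 1.0, 8));
    }
}
```

The observed ~3200s falls inside the 2500s to 5000s range that a purely sequential load would predict, which is at least consistent with the loop being the bottleneck.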

My first thoughts are to try to add some parallelism, either by splitting the HFiles into multiple chunks for separate load instances, or by changing LoadIncrementalHFiles itself to use multiple loading threads.
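A minimal sketch of the multi-threaded option, to make the idea concrete. It is not the HBase implementation: `ParallelHFileLoader`, `loadAll`, and the `loadOne` callback are hypothetical names, and `loadOne` stands in for whatever performs the load of a single HFile. Whether the existing per-file load code in 0.90.1 is safe to call from several threads at once is an open assumption that would need checking, and the split case is not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class ParallelHFileLoader {
    // Runs loadOne on every path using a fixed pool of worker threads.
    // Returns the number of files whose load completed without throwing.
    // ASSUMPTION: loadOne is thread-safe (hypothetical stand-in for the real load call).
    public static int loadAll(List<String> hfilePaths, int threads, Consumer<String> loadOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        int loaded = 0;
        try {
            // Queue one task per HFile; the pool caps concurrency at `threads`.
            for (String path : hfilePaths) {
                pending.add(pool.submit(() -> loadOne.accept(path)));
            }
            // Wait for every task and count the successes.
            for (Future<?> task : pending) {
                try {
                    task.get();
                    loaded++;
                } catch (ExecutionException e) {
                    System.err.println("load failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting on remaining tasks.
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return loaded;
    }
}
```

The other option, splitting the HFiles into chunks for separate load instances, gets the same effect at the process level without touching the loader code, at the cost of coordinating several runs.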

Is your data only coming in via bulk load?

Yes, everything we put into HBase is via bulk load. We found it to be a huge improvement over doing individual Puts from the M/R jobs.

- Adam
