On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan wrote:
> I'm sequentially importing ~1 billion small rows (32 byte keys) into a table
> called StatAreaModelLink.  I realize that sequential insertion isn't
> efficient by design, but I'm not in a hurry so I let it run all weekend.
> It's been proceeding quickly except for ~20s stalls every minute or so.
>
> I also noticed that one regionserver was getting all the load and just
> figured that after each split the later region stayed on the current node.
> Turns out the last region stopped splitting altogether and now has a 33 GB
> store file.
>

Interesting.
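Why one regionserver ends up with all the load: each HBase region holds a contiguous, sorted range of row keys, and a row belongs to the region with the greatest start key that is less than or equal to it. With monotonically increasing keys, every new row sorts after everything already written, so it always lands in the last region. The Python sketch below illustrates this. The start keys and rows are made up and are not the real StatAreaModelLink boundaries, and the plain sorted list of start keys is a simplified stand-in for the region metadata a real client consults.

```python
import bisect
from typing import List


def region_for_row(region_start_keys: List[bytes], row: bytes) -> int:
    """Return the index of the region whose key range contains `row`.

    `region_start_keys` must be sorted, and the first region starts at b""
    (the empty key), so every row falls into exactly one region.
    """
    # bisect_right finds the first start key strictly greater than `row`;
    # the region just before that one is the one that holds the row.
    return bisect.bisect_right(region_start_keys, row) - 1


# Hypothetical split points after a few splits of a sequentially loaded table.
starts = [b"", b"row-0250", b"row-0500", b"row-0750"]

# New, ever-increasing keys all sort after the last split point.
new_rows = [f"row-{i:04d}".encode() for i in range(800, 810)]
print({region_for_row(starts, r) for r in new_rows})  # {3}: only the last region
```

Splitting that last region only moves its lower half away; the newest keys still belong to whichever region covers the top of the key space.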


> I started importing on 0.20.6, but switched to 0.89.20100726 today.  They
> both seem to act similarly.  Using all default settings except VERSIONS=1.
>
> That regionserver's logs constantly say "Compaction requested for region...
> because regionserver60020.cacheFlusher"
>
> http://pastebin.com/WJDs7ZbM
>
> Am I doing something wrong, like not giving it enough time to flush/compact?
> There are 23 previous regions that look OK.
>

I wonder if a compaction is running and it's just taking a long time.
Grep for 'Starting compaction' in your logs.  When did the last one start?
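A quick way to do that check, equivalent to `grep 'Starting compaction' <logfile> | tail -n 1`, is sketched below in Python. It only assumes the regionserver log is a plain-text file in which the phrase 'Starting compaction' appears on the relevant lines; the script name and log path in the comment are hypothetical.

```python
import sys
from typing import Iterable, Optional


def last_compaction_start(log_lines: Iterable[str]) -> Optional[str]:
    """Return the last line containing 'Starting compaction', or None if there is none."""
    last = None
    for line in log_lines:
        if "Starting compaction" in line:
            last = line.rstrip("\n")
    return last


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python last_compaction.py /path/to/regionserver.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(last_compaction_start(log) or "no 'Starting compaction' line found")
```

Comparing the timestamp on that line with the current time shows how long ago the most recent compaction began.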

I see you continue to flush.  Try taking the load off.

You might also run:

> bin/hadoop fs -lsr /hbase

... and pastebin the output.  I'd be looking for a region with a lot of files in it.
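To spot such a region in the listing, a small script can count files per directory. This is a sketch under a few assumptions: each `-lsr` line ends with the entry's path, paths contain no spaces, file entries start with `-` and directory entries with `d`, and store files live under /hbase/<table>/<region>/<family>/, so the directory with the most files points at the crowded region and family. The function names and the default threshold of 10 are hypothetical choices.

```python
import posixpath
from collections import Counter
from typing import Iterable, List, Tuple


def files_per_directory(lsr_lines: Iterable[str]) -> Counter:
    """Count file entries per parent directory in `hadoop fs -lsr` output."""
    counts: Counter = Counter()
    for line in lsr_lines:
        fields = line.split()
        # Skip blank or malformed lines and directory entries (permissions start with 'd').
        if len(fields) < 2 or not fields[0].startswith("-"):
            continue
        path = fields[-1]  # assumption: the path is the last column and has no spaces
        counts[posixpath.dirname(path)] += 1
    return counts


def crowded_directories(lsr_lines: Iterable[str], threshold: int = 10) -> List[Tuple[str, int]]:
    """Directories holding at least `threshold` files, busiest first."""
    counts = files_per_directory(lsr_lines)
    crowded = [(d, n) for d, n in counts.items() if n >= threshold]
    return sorted(crowded, key=lambda item: (-item[1], item[0]))
```

For example, saving the listing with `bin/hadoop fs -lsr /hbase > lsr.txt` and passing the opened file to `crowded_directories` would list the busiest store directories first.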

Finally, have you read about the bulk load tool [1]?

St.Ack

1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
