On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <[email protected]> wrote:
> I'm sequentially importing ~1 billion small rows (32 byte keys) into a table
> called StatAreaModelLink. I realize that sequential insertion isn't
> efficient by design, but I'm not in a hurry so I let it run all weekend.
> It's been proceeding quickly except for ~20s stalls every minute or so.
>
> I also noticed that one regionserver was getting all the load and just
> figured that after each split the later region stayed on the current node.
> Turns out the last region stopped splitting altogether and now has a 33gb
> store file.
>
Interesting.

> I started importing on 0.20.6, but switched to 0.89.20100726 today. They
> both seem to act similarly. Using all default settings except VERSIONS=1.
>
> That regionserver's logs constantly say "Compaction requested for region...
> because regionserver60020.cacheFlusher"
>
> http://pastebin.com/WJDs7ZbM
>
> Am I doing something wrong, like not giving it enough time to flush/compact?
> There are 23 previous regions that look ok.
>

I wonder if a compaction is running and it's just taking a long time. Grep
for 'Starting compaction' in your logs and see when the last one started.
I see you continue to flush. Try taking the load off.

You might also do a:

  bin/hadoop fs -lsr /hbase

... and pastebin it. I'd be looking for a region with a bunch of files in it.

Finally, have you read about the bulk load tool [1]?

St.Ack

1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
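
The two checks suggested above (when the last compaction started, and which region has piled up store files) can be sketched in Python. This is only an illustration, not part of HBase: the function names are hypothetical, the log check is a plain substring match on 'Starting compaction', and the listing check assumes the usual eight-column `hadoop fs -lsr` output with store files laid out as /hbase/<table>/<region>/<family>/<file>.

```python
from collections import Counter


def last_compaction_start(log_lines, marker="Starting compaction"):
    """Return the last log line containing the marker, or None if absent."""
    last = None
    for line in log_lines:
        if marker in line:
            last = line.rstrip("\n")
    return last


def store_files_per_region(lsr_lines, root="/hbase"):
    """Count files per (table, region) from `hadoop fs -lsr` output lines.

    Assumes each entry has eight whitespace-separated columns ending in
    the path, and that store files sit exactly four levels below root:
    table/region/family/file. Paths containing spaces are not handled.
    """
    counts = Counter()
    for line in lsr_lines:
        fields = line.split()
        # Skip short or blank lines and directory entries ('d' permissions).
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        path = fields[-1]
        if not path.startswith(root + "/"):
            continue
        parts = path[len(root) + 1:].split("/")
        # Only table/region/family/file paths count as store files;
        # shallower entries such as .regioninfo are ignored.
        if len(parts) == 4:
            counts[(parts[0], parts[1])] += 1
    return counts
```

To use it, save the regionserver log and the output of the listing command to files, pass the opened files to the two functions, and look at `store_files_per_region(...).most_common(5)` for the regions holding the most files.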
