Resending this to [email protected] because my mail to [email protected] failed with "550 550 mail to [email protected] not accepted here (state 14)". Is the reply-to getting set correctly? Anyway, responses inline...
On Tue, May 18, 2010 at 1:15 PM, Jean-Daniel Cryans <[email protected]> wrote:

> > 1. Do more frequent, smaller minor compactions. I guess we would accomplish
> > this by lowering hbase.hstore.compactionThreshold,
> > hbase.hstore.blockingStoreFiles, and/or hbase.hstore.compaction.max?
>
> Without any log files to analyze, it's hard to tell exactly what kind of
> compaction (minor/major) and/or split is happening. Minor compactions don't
> rewrite all store files and don't try to merge big files. Do you monitor
> your cluster? Do you see a lot of IO wait when reads are slowing down?

Here is a region server log from yesterday: http://pastebin.com/5a04kZVj

Every time one of those compactions ran (around 1pm, 4pm, 6pm, etc.), our read performance took a big hit. BTW, is there a way I can tell from the logs whether a minor or major compaction is running?

Yes, we do see a lot of I/O wait (as high as 30-40% at times) when the compactions are running and reads are slow. Load averages during compactions can spike as high as 60.

> > 2. Try to prevent compactions altogether and just cron one major compaction
> > per day when the system load is at its lowest. Not sure that this is a good
> > idea. Does anyone currently do this?
>
> Cron major compactions, although I still can't tell if it's what you're
> hitting.

OK, I'll set up a cron to kick majors off when load is at its lowest (rough sketch of what I have in mind at the bottom of this mail). Can't hurt, I suppose.

> > 3. I noticed that we're sometimes getting messages like "Too many hlogs:
> > logs=33, maxlogs=32; forcing flush of 24 regions(s)". Should we disable the
> > write-ahead log when doing bulk updates? I'm not entirely clear on the
> > relationship between log flushing/rolling and minor/major compactions. As I
> > understand it, a log flush will create HFiles, which might then trigger a
> > minor compaction. Is that correct? Would disabling the WAL help?
>
> HBase limits the rate of inserts so it doesn't get overrun by WALs, so that
> if a machine fails you don't have to split GBs of files. What about
> inserting into your cluster more slowly? Flushes/compactions would be spread
> out more over time.
>
> Disabling the WAL during your insert will make it a lot faster, but that's
> not necessarily what you want here.

Our inserts are already fairly fast; I think we usually get around 30,000/sec when we do these bulk imports. I'm less concerned about insert speed and more concerned about the impact on reads when we do the bulk imports and a compaction is triggered. Do you think it makes sense to disable the WAL for the bulk inserts in this case? Would disabling the WAL decrease the number of compactions that are required?
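In case it matters for the advice, here is roughly what our bulk-insert loop would look like with the WAL turned off, as I understand the 0.20 Java client (the table, family, and row names below are just placeholders, and the buffer size is arbitrary):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkInsertNoWal {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "my_table"); // placeholder table name

    // Buffer writes client-side instead of round-tripping every single Put.
    table.setAutoFlush(false);
    table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB, picked arbitrarily

    for (long i = 0; i < 1000000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value-" + i));
      // Skip the write-ahead log for this edit: faster, but the edit is lost
      // if the region server dies before its memstore is flushed.
      put.setWriteToWAL(false);
      table.put(put);
    }

    table.flushCommits(); // push out whatever is still sitting in the write buffer
  }
}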
> > 4. Hardware upgrade. We're running one 7200 RPM SATA disk per
> > datanode/regionserver now, so our I/O throughput probably isn't great. We
> > will soon be testing a new hardware configuration with 2 SSDs per node. I'm
> > sure this will help, but I'm looking for some short-term solutions that
> > will work until we migrate to the new hardware.
>
> Like Ryan said, just shove as many 7.2k RPM disks as you can into each
> machine. Google has 12 per borg (number from their Petasort benchmark).

Yes, thanks to you and Ryan for pointing this out. I didn't realize how important it is to have multiple disks in each node. The performance issues we've been having are probably due to I/O bottlenecks more than anything else. If a hardware upgrade is the final answer, that's fine; I was just hoping for something that would help in the short term until we get hardware with more/faster disks.

> > Have there been any performance improvements since 0.20.3 (other than
> > HBASE-2180, which we already have) that might help? What is the best
> > upgrade path if we were to upgrade our production HBase cluster in the next
> > 1-2 weeks? 0.20.5? Build a snapshot from trunk/0.21? CDH3?
>
> HBASE-2248 will help you a lot. Deploy 0.20.5 in a dev env when it's ready,
> then, once you're confident, restart your HBase prod on the new jars.

OK, I'm eagerly awaiting the next release. It seems like there have been lots of good improvements since 0.20.3!

> > Thanks,
> > James
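P.S. Here's the rough sketch I mentioned above of what cron would kick off nightly for the major compactions. It's just how I understand the HBaseAdmin API, and the table name is a placeholder, so corrections welcome:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Run by cron during our low-traffic window, e.g. something like:
//   30 3 * * *  /opt/hbase/bin/hbase NightlyMajorCompact my_table
public class NightlyMajorCompact {
  public static void main(String[] args) throws Exception {
    String tableName = args.length > 0 ? args[0] : "my_table"; // placeholder default
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    // Asks the cluster to major-compact every region of the table; the
    // request is handled asynchronously, so this returns right away.
    admin.majorCompact(tableName);
  }
}

I assume the same thing could be done by piping "major_compact 'my_table'" into the hbase shell from cron instead.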
