To run major compactions from a shell script:

  echo "major_compact 'table_name'" | /path/to/hbase/dir/bin/hbase shell
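If you want to drive that from cron at an off-peak hour, a minimal sketch along these lines should do; note that the script name, the HBASE_HOME path, the table list, and the log path below are placeholders to adapt, not an existing script that ships with HBase:

  #!/bin/bash
  # major_compact_tables.sh -- run "major_compact" for a fixed list of tables
  # through the HBase shell. HBASE_HOME and TABLES are assumptions; adjust
  # them for your own install and tables.
  HBASE_HOME=/path/to/hbase/dir
  TABLES="table_name another_table"

  for t in $TABLES; do
    echo "major_compact '$t'" | "$HBASE_HOME"/bin/hbase shell
  done

Scheduled from cron when the cluster is quietest, for example:

  # nightly at 3am; log path is just an example
  0 3 * * * /path/to/major_compact_tables.sh >> /tmp/major_compact.log 2>&1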
J-D

On Tue, May 18, 2010 at 2:29 PM, James Baldassari <[email protected]> wrote:
> On Tue, May 18, 2010 at 2:06 PM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> > Resending this to [email protected] because my mail to
>> > [email protected] failed with "550 550 mail to [email protected]
>> > accepted here (state 14)". Is the reply-to getting set correctly?
>> > Anyway, responses inline...
>>
>> Yeah, that's strange, I just saw it too. It's probably related to the
>> fact that Apache infra is moving our mailing lists since we are now a
>> top-level project.
>>
>> > Here is a region server log from yesterday: http://pastebin.com/5a04kZVj
>> > Every time one of those compactions ran (around 1pm, 4pm, 6pm, etc.) our
>> > read performance took a big hit. BTW, is there a way I can tell by looking
>> > at the logs whether a minor or major compaction is running? Yes, we do see
>> > lots of I/O wait (as high as 30-40% at times) when the compactions are
>> > running and reads are slow. Load averages during compactions can spike as
>> > high as 60.
>>
>> Yeah, high I/O wait will have a direct impact on read performance. Do
>> you swap? How much heap was given to the RSs?
>
> Region servers have 9GB heaps. Swap is disabled on all region servers.
>
>> I see that you're not running with DEBUG, only INFO, so we cannot see
>> which type of compaction is going on.
>
> OK, so major vs. minor compaction messages are logged at DEBUG. Maybe the
> next time we need to reboot the cluster I'll lower it.
>
>> > OK, I'll set up a cron to kick majors off when load is at its lowest.
>> > Can't hurt, I suppose.
>>
>> It's probably best for the moment.
>
> I manually ran a major compaction around 2-3am this morning, and we haven't
> had any compactions since then. I guess running the major at an off-peak
> time might have helped, so I'll definitely set up that cron. Is there an
> existing HBase script I can leverage to run a compaction via cron, or should
> I just roll my own Ruby script?
>
>> >> HBase limits the rate of inserts so it isn't overrun by WALs, so that if
>> >> a machine fails, you don't have to split GBs of files. What about
>> >> inserting more slowly into your cluster? Flushes/compactions would be
>> >> spread out more over time.
>> >>
>> >> Disabling the WAL during your insert will make it a lot faster, but that's
>> >> not necessarily what you want here.
>> >
>> > Our inserts are already fairly fast. I think we usually get around
>> > 30,000/sec when we do these bulk imports. I'm less concerned about insert
>> > speed and more concerned about the impact to reads when we do the bulk
>> > imports and a compaction is triggered. Do you think it makes sense to
>> > disable the WAL for the bulk inserts in this case? Would disabling the WAL
>> > decrease the number of compactions that are required?
>>
>> This is my point: try uploading slower. Disabling the WAL, like I said,
>> will speed up the upload since you don't write to the WAL, so compactions
>> will happen at an even faster rate!
>
> Thanks for the clarification. It sounds like throttling the bulk updates
> will help.
>
>> > OK, I'm eagerly awaiting the next release. Seems like there have been lots
>> > of good improvements since 0.20.3!
>>
>> Lots of people working very hard :P
>>
>> >> > Thanks,
>> >> > James
