To speak to the CompactionQueue being overrun: what we are seeing is that after the dust has settled, and all of the smaller sequence files have been imported, the CompactionQueue is 0. It does not even try to compact. The only way to get the store files down is to run a major compaction. We have hbase.hstore.blockingStoreFiles=15, so why is it not compacting a region with 300 store files?
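
For reference, a minimal sketch of requesting that major compaction programmatically, assuming the 0.20-era Java client API in use around the time of this thread (the shell's major_compact command does the same thing); the table name is a placeholder:

// Minimal sketch, assuming the HBase 0.20-era Java client: ask the cluster to
// major compact a table after the bulk load so the store file count comes down.
// "my_table" is a placeholder name.
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceMajorCompaction {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml from the classpath
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("my_table"); // asynchronous; the region servers work through it
  }
}

The request is asynchronous, so the store file count only comes down as the region servers work through their compaction queues.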
________________________________________
From: [email protected] [[email protected]] On Behalf Of Stack [[email protected]]
Sent: Friday, August 13, 2010 12:33 AM
To: [email protected]
Subject: Re: Issue with StoreFiles with bulk import.

On Thu, Aug 12, 2010 at 1:13 PM, Jeremy Carroll <[email protected]> wrote:
> I'm currently importing some files into HBase and am running into a problem
> with a large number of store files being created.

Where do you see this, Jeremy? In the UI? What kind of numbers are you seeing?

> We have some back data which is stored in very large sequence files (3-5 GB
> in size). When we import this data the amount of stores created does not get
> out of hand.

So when you mapreduce using these big files as the source and insert into HBase, it's not an issue?

> When we switch to smaller sequence files being imported we see that the
> number of stores rises quite dramatically.

Why do you need to change?

> I do not know if this is happening because we are flushing the commits more
> frequently with smaller files.

Probably. Have you tinkered with the HBase default settings in any way? Perhaps you are getting better parallelism when there are lots of small files to chomp on? More concurrent maps/clients? So the rate of upload goes up?

> I'm wondering if anybody has any advice regarding this issue. My main concern
> is that during this process we do not finish flushing to disk (and we set
> writeToWAL false). We always hit the 90 second timeout due to heavy write
> load. As these store files pile up, and they do not get committed to disk, we
> run into issues where we could lose a lot of data if something were to crash.

The 90 second timeout is the regionserver timing out against ZooKeeper? Or is it something else? Store files are on the filesystem, so what do you mean by the above fear of their not being committed to disk?

> I have created screen shots of our monitoring application for HBase which
> shows the spikes in activity.
>
> http://twitpic.com/photos/jeremy_carroll

Nice pictures. 30k storefiles is a good number. They will go up as you are doing a bulk load since the compactor is probably overrun. HBase will usually catch up, though, especially after the upload completes. Do you have compression enabled? I see regions growing steadily rather than spiking as the comment on the graph says. 500 regions ain't too many... How many servers are in your cluster?

St.Ack
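
As an aside on the writeToWAL(false) point raised above: edits that skip the write-ahead log live only in the memstore until a flush turns them into store files, which is where the data-loss exposure during a crash comes from. A minimal sketch of the setting being discussed, again assuming the 0.20-era Java client, with placeholder table, family, and qualifier names:

// Minimal sketch, assuming the HBase 0.20-era Java client: the writeToWAL
// setting discussed in this thread. Skipping the WAL speeds up bulk writes,
// but unflushed edits are lost if a region server dies before the memstore
// is flushed. "my_table", "cf", and "q" are placeholders.
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalLessPut {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "my_table");

    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setWriteToWAL(false); // the speed-vs-durability trade-off Jeremy mentions
    table.put(put);

    table.flushCommits(); // pushes client-side buffered edits to the region servers
  }
}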
