On Fri, Aug 13, 2010 at 8:00 AM, Jeremy Carroll <[email protected]> wrote:
> To speak to the CompactionQueue being overrun. What we are seeing is that
> after the dust has settled, and all of the smaller sequence files have been
> imported, the CompactionQueue is 0. It does not even try to compact. The only
> way to get the storeFiles down is to run a major compaction. We have
> hbase.hstore.blockingStoreFiles=15. So why is it not compacting a region with
> 300 storeFiles?
>
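For reference, the knobs in play here live in hbase-site.xml. A minimal
sketch, assuming the stock property names of this era (defaults live in
hbase-default.xml, so verify against your version; the values below are
only illustrative):

    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <!-- request a minor compaction once a store holds this many storefiles -->
      <value>3</value>
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <!-- block updates to a region once a store passes this many storefiles -->
      <value>15</value>
    </property>
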
If you don't mind, post a good sample from your regionserver logs -- one
where you are seeing the 300 storefiles per region -- and your
hbase-site.xml as well as a dump of your table schema. You can mail
offlist if you would prefer.

St.Ack

________________________________________
> From: [email protected] [[email protected]] On Behalf Of Stack
> [[email protected]]
> Sent: Friday, August 13, 2010 12:33 AM
> To: [email protected]
> Subject: Re: Issue with StoreFiles with bulk import.
>
> On Thu, Aug 12, 2010 at 1:13 PM, Jeremy Carroll
> <[email protected]> wrote:
>> I'm currently importing some files into HBase and am running into a problem
>> with a large number of store files being created.
>
> Where do you see this, Jeremy? In the UI? What kinda numbers are you seeing?
>
>
>> We have some back data which is stored in very large sequence files (3-5 GB
>> in size). When we import this data the number of stores created does not
>> get out of hand.
>
> So when you mapreduce using these big files as source and insert into
> hbase, it's not an issue?
>
>
>> When we switch to smaller sequence files being imported, we see that the
>> number of stores rises quite dramatically.
>
> Why do you need to change?
>
>
>> I do not know if this is happening because we are flushing the commits more
>> frequently with smaller files.
>
> Probably. Have you tinkered with the hbase default settings in any way?
>
> Perhaps you are getting better parallelism when there are lots of small
> files to chomp on? More concurrent maps/clients? So the rate of upload
> goes up?
>
>
>> I'm wondering if anybody has any advice regarding this issue. My main
>> concern is that during this process we do not finish flushing to disk (and
>> we set WriteToWAL false). We always hit the 90 second timeout due to heavy
>> write load. As these store files pile up, and they do not get committed to
>> disk, we run into issues where we could lose a lot of data if something
>> were to crash.
>>
>
> The 90 second timeout is the regionserver timing out against
> zookeeper? Or is it something else?
>
> Storefiles are on the filesystem, so what do you mean by the above fear
> of their not being committed to disk?
>
>
>> I have created screenshots of our monitoring application for HBase which
>> show the spikes in activity.
>>
>> http://twitpic.com/photos/jeremy_carroll
>>
>
> Nice pictures.
>
> 30k storefiles is a good number. They will go up as you are doing a
> bulk load, as the compactor is probably overrun. HBase will usually
> catch up though, especially after the upload completes.
>
> Do you have compression enabled?
>
> I see regions growing steadily rather than spiking as the comment on
> the graph says. 500 regions ain't too many...
>
> How many servers in your cluster?
>
> St.Ack
>
>
>>
>>
>
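On the WriteToWAL point quoted above: that is a per-Put switch in the
client API, and it is what opens the data-loss window Jeremy describes --
an edit written this way exists only in the regionserver's memstore until
the next flush. A minimal sketch of that write path, using the 0.20-era
Java client (the table, family, and qualifier names here are made up for
illustration):

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BulkPutSketch {
      public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath.
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "mytable");

        // Buffer puts client-side rather than one round trip per row.
        table.setAutoFlush(false);

        Put put = new Put(Bytes.toBytes("rowkey"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"),
            Bytes.toBytes("value"));

        // Skip the write-ahead log: writes go faster, but edits still
        // sitting in the memstore are lost if the regionserver dies
        // before they are flushed to a storefile.
        put.setWriteToWAL(false);

        table.put(put);
        table.flushCommits();  // push the client-side write buffer
      }
    }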
