To speak to the CompactionQueue being overrun: what we are seeing is that after the dust has settled, and all of the smaller sequence files have been imported, the CompactionQueue is 0. It does not even try to compact. The only way to get the store files down is to run a major compaction. We have hbase.hstore.blockingStoreFiles=15, so why is it not compacting a region with 300 store files?
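
For reference, a minimal sketch of requesting that major compaction programmatically, assuming the 0.20-era Java client API in use around the time of this thread (the shell's major_compact command does the same thing); the table name is a placeholder:

// Minimal sketch, assuming the HBase 0.20-era Java client: ask the cluster to
// major compact a table after the bulk load so the store file count comes down.
// "my_table" is a placeholder name.
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceMajorCompaction {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml from the classpath
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("my_table"); // asynchronous; the region servers work through it
  }
}

The request is asynchronous, so the store file count only comes down as the region servers work through their compaction queues.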
________________________________________
From: [email protected] [[email protected]] On Behalf Of Stack [[email protected]]
Sent: Friday, August 13, 2010 12:33 AM
To: [email protected]
Subject: Re: Issue with StoreFiles with bulk import.

On Thu, Aug 12, 2010 at 1:13 PM, Jeremy Carroll <[email protected]> wrote:
> I'm currently importing some files into HBase and am running into a problem
> with a large number of store files being created.

Where do you see this, Jeremy? In the UI? What kind of numbers are you seeing?

> We have some back data which is stored in very large sequence files (3-5 GB
> in size). When we import this data the amount of stores created does not get
> out of hand.

So when you mapreduce using these big files as the source and insert into HBase, it's not an issue?

> When we switch to smaller sequence files being imported we see that the
> number of stores rises quite dramatically.

Why do you need to change?

> I do not know if this is happening because we are flushing the commits more
> frequently with smaller files.

Probably. Have you tinkered with the HBase default settings in any way? Perhaps you are getting better parallelism when there are lots of small files to chomp on? More concurrent maps/clients? So the rate of upload goes up?

> I'm wondering if anybody has any advice regarding this issue. My main concern
> is that during this process we do not finish flushing to disk (and we set
> writeToWAL false). We always hit the 90 second timeout due to heavy write
> load. As these store files pile up, and they do not get committed to disk, we
> run into issues where we could lose a lot of data if something were to crash.

The 90 second timeout is the regionserver timing out against ZooKeeper? Or is it something else? Store files are on the filesystem, so what do you mean by the above fear of their not being committed to disk?

> I have created screen shots of our monitoring application for HBase which
> shows the spikes in activity.
>
> http://twitpic.com/photos/jeremy_carroll

Nice pictures. 30k storefiles is a good number. They will go up as you are doing a bulk load since the compactor is probably overrun. HBase will usually catch up, though, especially after the upload completes. Do you have compression enabled? I see regions growing steadily rather than spiking as the comment on the graph says. 500 regions ain't too many... How many servers are in your cluster?

St.Ack
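
As an aside on the writeToWAL(false) point raised above: edits that skip the write-ahead log live only in the memstore until a flush turns them into store files, which is where the data-loss exposure during a crash comes from. A minimal sketch of the setting being discussed, again assuming the 0.20-era Java client, with placeholder table, family, and qualifier names:

// Minimal sketch, assuming the HBase 0.20-era Java client: the writeToWAL
// setting discussed in this thread. Skipping the WAL speeds up bulk writes,
// but unflushed edits are lost if a region server dies before the memstore
// is flushed. "my_table", "cf", and "q" are placeholders.
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalLessPut {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "my_table");

    Put put = new Put(Bytes.toBytes("row-1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setWriteToWAL(false); // the speed-vs-durability trade-off Jeremy mentions
    table.put(put);

    table.flushCommits(); // pushes client-side buffered edits to the region servers
  }
}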
