On Fri, Aug 13, 2010 at 8:00 AM, Jeremy Carroll <[email protected]> wrote:
> To speak to the CompactionQueue being overrun. What we are seeing is that
> after the dust has settled, and all of the smaller sequence files have been
> imported, the CompactionQueue is 0. It does not even try to compact. The only
> way to get the storeFiles down is to run a major compaction. We have
> hbase.hstore.blockingStoreFiles=15. So why is it not compacting a region with
> 300 storeFiles?
>
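For reference, the knobs in play here live in hbase-site.xml. A minimal
sketch, assuming the stock property names of this era (defaults live in
hbase-default.xml, so verify against your version; the values below are
only illustrative):

    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <!-- request a minor compaction once a store holds this many storefiles -->
      <value>3</value>
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <!-- block updates to a region once a store passes this many storefiles -->
      <value>15</value>
    </property>
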
If you don't mind, post a good sample from your regionserver logs -- one
where you are seeing the 300 storefiles per region -- and your
hbase-site.xml as well as a dump of your table schema. You can mail
offlist if you would prefer.

St.Ack

________________________________________
> From: [email protected] [[email protected]] On Behalf Of Stack
> [[email protected]]
> Sent: Friday, August 13, 2010 12:33 AM
> To: [email protected]
> Subject: Re: Issue with StoreFiles with bulk import.
>
> On Thu, Aug 12, 2010 at 1:13 PM, Jeremy Carroll
> <[email protected]> wrote:
>> I'm currently importing some files into HBase and am running into a problem
>> with a large number of store files being created.
>
> Where do you see this, Jeremy? In the UI? What kinda numbers are you seeing?
>
>
>> We have some back data which is stored in very large sequence files (3-5 GB
>> in size). When we import this data the number of stores created does not
>> get out of hand.
>
> So when you mapreduce using these big files as source and insert into
> hbase, it's not an issue?
>
>
>> When we switch to smaller sequence files being imported, we see that the
>> number of stores rises quite dramatically.
>
> Why do you need to change?
>
>
>> I do not know if this is happening because we are flushing the commits more
>> frequently with smaller files.
>
> Probably. Have you tinkered with the hbase default settings in any way?
>
> Perhaps you are getting better parallelism when there are lots of small
> files to chomp on? More concurrent maps/clients? So the rate of upload
> goes up?
>
>
>> I'm wondering if anybody has any advice regarding this issue. My main
>> concern is that during this process we do not finish flushing to disk (and
>> we set WriteToWAL false). We always hit the 90 second timeout due to heavy
>> write load. As these store files pile up, and they do not get committed to
>> disk, we run into issues where we could lose a lot of data if something
>> were to crash.
>>
>
> The 90 second timeout is the regionserver timing out against
> zookeeper? Or is it something else?
>
> Storefiles are on the filesystem, so what do you mean by the above fear
> of their not being committed to disk?
>
>
>> I have created screenshots of our monitoring application for HBase which
>> show the spikes in activity.
>>
>> http://twitpic.com/photos/jeremy_carroll
>>
>
> Nice pictures.
>
> 30k storefiles is a good number. They will go up as you are doing a
> bulk load, as the compactor is probably overrun. HBase will usually
> catch up though, especially after the upload completes.
>
> Do you have compression enabled?
>
> I see regions growing steadily rather than spiking as the comment on
> the graph says. 500 regions ain't too many...
>
> How many servers in your cluster?
>
> St.Ack
>
>
>>
>>
>
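On the WriteToWAL point quoted above: that is a per-Put switch in the
client API, and it is what opens the data-loss window Jeremy describes --
an edit written this way exists only in the regionserver's memstore until
the next flush. A minimal sketch of that write path, using the 0.20-era
Java client (the table, family, and qualifier names here are made up for
illustration):

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BulkPutSketch {
      public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath.
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "mytable");

        // Buffer puts client-side rather than one round trip per row.
        table.setAutoFlush(false);

        Put put = new Put(Bytes.toBytes("rowkey"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"),
            Bytes.toBytes("value"));

        // Skip the write-ahead log: writes go faster, but edits still
        // sitting in the memstore are lost if the regionserver dies
        // before they are flushed to a storefile.
        put.setWriteToWAL(false);

        table.put(put);
        table.flushCommits();  // push the client-side write buffer
      }
    }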
