Context: we're still on 0.89, so we can't take advantage of the MemStore 
allocation buffers yet. One of the most important metrics for us was the number 
of region servers stuck in GC pauses, and more nodes, more memory, and scheduled 
periodic cluster restarts helped in our situation. I wholeheartedly agree with 
the goal of constant uptime, but this was an operations approach we took during 
some rocky times, and it helped keep the cluster "uninteresting" (in a good way).

Because the GC pauses would flare up in write-heavy workloads (per Todd's 
analysis), they seemed to hit us at the worst possible times, e.g., during an 
index rebuild or during a split, which could lead to inconsistent metadata and 
similar problems. We are in a happy place now, and we're always looking to make 
it better, but those are some "obvious but not so obvious" points on how we got 
here. And don't use too many column families.



-----Original Message-----
From: Andrew Purtell [mailto:[email protected]] 
Sent: Wednesday, April 13, 2011 1:51 PM
To: [email protected]
Cc: Robert Gonzalez
Subject: RE: HBase is not ready for Primetime

Hi Doug,

> 3) Cluster restart
> 
> We schedule a full shutdown and restart of our cluster each week.  
> It's pretty quick, and HBase just seems happier when we do this.

Can you say a bit more about how HBase is happier versus not?

I can speculate on a number of reasons why this may be the case, but in general 
we should take the view that if the OS can stay up for 1000 days, so should 
HBase, and work toward that goal. (Unless the JVM just gets in our way... but 
so far we have not clearly identified an intractable case.)

Best regards,

    - Andy
