Thanks again for the information. We're implementing it now.

Just one last question (at least for a bit :-)

If we bump our dfs.datanode.max.xcievers from 4k to 8k, what should we watch 
for in terms of exhausting system resources? 

We have the Heap Sizes set to:

* DataNode -Xmx2000m
* TaskTracker -Xmx2000m
* RegionServer -Xmx4000m
* m1.xlarge EC2 Instances with 14GB of RAM.
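For reference, those heaps alone account for roughly 8 GB of the 14 GB, before counting DataNode thread stacks, MapReduce child tasks, and the OS page cache. A rough tally (the remainder is an estimate, not a measurement):

```shell
# Rough memory budget for an m1.xlarge, using the heap sizes listed above.
DATANODE_MB=2000
TASKTRACKER_MB=2000
REGIONSERVER_MB=4000
TOTAL_RAM_MB=14336   # 14 GB

HEAP_MB=$(( DATANODE_MB + TASKTRACKER_MB + REGIONSERVER_MB ))
echo "JVM heaps: ${HEAP_MB} MB"
echo "Left for thread stacks, MR children, OS cache: $(( TOTAL_RAM_MB - HEAP_MB )) MB"
```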

I'm thinking about removing the TaskTrackers and using separate, non-RegionServer 
instances to run TaskTrackers when we need to do MapReduce.

Just wondering what I should be monitoring or tweaking, since the DataNode could 
be doubling the number of threads it's running...
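For what it's worth, each xceiver is one DataNode thread, so the main things to watch are native stack space and the kernel/user thread and file-descriptor limits. A sketch of the arithmetic and the knobs to check (the -Xss value is the common 64-bit default, and the pid lookup is an assumption about a typical Linux box; verify both on yours):

```shell
# Each xceiver is one DataNode thread; with an assumed -Xss1m default,
# 8k threads can reserve ~8 GB of stack address space (reserved, not resident).
XCIEVERS=8192
STACK_KB=1024   # assumed -Xss1m; check with: java -XX:+PrintFlagsFinal -version
echo "worst-case stack reservation: $(( XCIEVERS * STACK_KB / 1024 )) MB"

# Things to watch on a live DataNode (pid lookup is an assumption):
# DN_PID=$(jps | awk '/DataNode/ {print $1}')
# ps -o nlwp= -p "$DN_PID"          # current thread count
# ulimit -u                          # per-user max processes/threads
# cat /proc/sys/kernel/threads-max   # system-wide thread ceiling
# ulimit -n                          # open fds (each xceiver holds sockets/files)
```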

Thanks!
Rob

On Sep 27, 2011, at 12:50 PM, Jean-Daniel Cryans wrote:

> On Tue, Sep 27, 2011 at 12:31 PM, Robert J Berger <[email protected]> wrote:
>> It's not enough. We're still having errors, and it caused a RegionServer to 
>> shut down again. No data loss, but degraded service (yay for robustness!)
> 
> Yeah, just up those xcievers.
> 
>>> 
>> I tend to be "conservative" (was going to say cowardly) towards our HBase 
>> cluster, since it's the persistent core of our application. So I'm not going 
>> to worry about growing the hfile size on this system.
>>> 
>> Not really, it's because we're so far behind the release cycle. We're still on 
>> HBase 0.20.3. I'm pretty sure many of our current problems would be relieved 
>> by catching up to either CDHx or the latest production Apache release.
> 
> Online merge won't work with 0.20.3 anyways :)
> 
>> 
>> Plus incorporating the latest best practices in the design of the next 
>> version to avoid these problems: different EC2 instance types, disk layout, 
>> etc. (I'll be posting some questions about this soon; I'd like to have a 
>> discussion on such best practices for our class of HBase cluster.)
> 
> Cool.
> 
>> 
>> OK, just to clarify, since I muddied the water by also asking about 
>> hbase.hregion.max.filesize:
>> 
>> If I increase dfs.datanode.max.xcievers, can I do it one machine at a 
>> time, with only one datanode down at a time?
>> Or do I need to bring the whole cluster down, update the 
>> dfs.datanode.max.xcievers value, and bring it back up?
>> If I can do it a machine at a time, do I have to do it on the 
>> namenode/master machine as well?
> 
> You can do a rolling restart of the DNs; the NN doesn't need to be restarted.
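Such a rolling DataNode restart, after pushing the new dfs.datanode.max.xcievers value in hdfs-site.xml to each host, might look roughly like this (host names and the hadoop-daemon.sh invocation are assumptions for this era of Hadoop, not a tested procedure):

```shell
# Hypothetical sketch: bounce one DataNode at a time after distributing
# the updated hdfs-site.xml (dfs.datanode.max.xcievers = 8192) to each host.
DATANODES="dn1 dn2 dn3"   # placeholder hosts -- use your slaves file
DRY_RUN=1                  # flip to 0 to actually restart

for host in $DATANODES; do
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would restart datanode on $host"
  else
    ssh "$host" 'hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode'
    sleep 60   # let the DN re-register with the NN before moving on
  fi
done
```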
> 
>> 
>> OK, I'm not going to do that for this cluster... We have way too many tables 
>> and it's too scary :-)
> 
> You could aim for the ones that grow the most.
> 
>> 
>> I shouldn't have said rolling; I meant the idea of just manually updating 
>> the dfs.datanode.max.xcievers values and restarting one datanode at a time.
>> We can't use that cool graceful_shutdown option since we're on such an 
>> ancient version of HBase (another reason I'm itching to upgrade).
>> 
>> But would the HBase rolling restart help? Don't we really need to restart 
>> HDFS for the dfs.datanode.max.xcievers change to take effect?
> 
> Well, if you want to set the default max filesize higher for new tables,
> you'll need to restart HBase. If not, then don't.
> 
>>> 
>>> For that change to take effect on the new tables, I think only the
>>> master would need to be bounced.
>> I presume you're referring to the hbase.hregion.max.filesize change (which 
>> I'm not going to make right now); that would just need the HBase master to 
>> be bounced?
> 
> Ya.

__________________
Robert J Berger - CTO
Runa Inc.
+1 408-838-8896
http://blog.ibd.com


