Thanks again for the information. We're implementing it now. Just one last question (at least for a bit :-)
If we bump up our dfs.datanode.max.xcievers from 4k to 8k, what should we watch for in terms of exhausting any system resources? We have the heap sizes set to:

* DataNode -Xmx2000m
* TaskTracker -Xmx2000m
* RegionServer -Xmx4000m
* m1.xlarge EC2 instances with 14GB of RAM

I'm thinking about removing the TaskTrackers and using non-RegionServer instances for running just TaskTrackers when we need to do Map/Reduce. Just wondering what I should be monitoring or tweaking, since the DataNode could be doubling the number of threads it's running...

Thanks!
Rob

On Sep 27, 2011, at 12:50 PM, Jean-Daniel Cryans wrote:

> On Tue, Sep 27, 2011 at 12:31 PM, Robert J Berger <[email protected]> wrote:
>> It's not enough. We're still having errors, and it caused a regionserver to
>> shut down again. No data loss, but degraded service (yay for robustness!)
>
> Yeah, just up those xcievers.
>
>> I tend to be "conservative" (was going to say cowardly) towards our HBase
>> cluster since it's the persistent core of our application. So I'm not going
>> to worry about growing the hfile size on this system.
>>
>> Not really; it's because we are so far behind the release cycle. We're still on
>> HBase 0.20.3. I'm pretty sure much of our problems now would be relieved by
>> getting caught up to either CDHx or the latest production Apache release.
>
> Online merge won't work with 0.20.3 anyways :)
>
>> Plus incorporating the latest best practices in the design of the next version
>> to avoid these problems: different EC2 instance types, disk system layout,
>> etc. (I'll be posting some questions about this soon; I'd like to have a
>> discussion on such best practices for our class of HBase cluster.)
>
> Cool.
>
>> OK, just to clarify, since I muddied the water by also asking about
>> hbase.hregion.max.filesize:
>>
>> If I increase dfs.datanode.max.xcievers, can I do it on one machine at a
>> time and only have one datanode down at a time?
>> Or do I need to bring the whole cluster down, update the
>> dfs.datanode.max.xcievers value, and bring it back up?
>> If I can do it a machine at a time, do I have to do it to the
>> namenode/master machine as well?
>
> You can roll restart DNs; the NN doesn't need to be restarted.
>
>> OK, I'm not going to do that for this cluster... We have way too many tables
>> and it's too scary :-)
>
> You could aim for the ones that grow the most.
>
>> I shouldn't have said rolling; I meant the idea of just manually doing the
>> update of the dfs.datanode.max.xcievers values and restarting one datanode
>> at a time.
>> We can't use that cool graceful_shutdown option since we're on such an
>> ancient version of HBase. (Another reason I'm itching to upgrade.)
>>
>> But would the HBase rolling restart help? Don't we really need to restart
>> the HDFS system for the dfs.datanode.max.xcievers change to take effect?
>
> Well, if you want to set the max filesize higher by default for new tables,
> you'll need to restart HBase. If not, then don't.
>
>>> For that change to take effect on the new tables, I think only the
>>> master would need to be bounced.
>>
>> I presume you are referring to the hbase.hregion.max.filesize changes (which
>> I'm not going to do right now); those would just need the HBase master to be
>> bounced?
>
> Ya.

__________________
Robert J Berger - CTO
Runa Inc.
+1 408-838-8896
http://blog.ibd.com
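
P.S. For anyone finding this thread later, here is a rough sketch of what the xcievers bump touches. The property placement, paths, process name, and the 1 MB stack-size figure are assumptions typical of a 0.20-era install, not measurements from this cluster:

```shell
# dfs.datanode.max.xcievers (the misspelling is the real property name
# in this Hadoop generation) goes in hdfs-site.xml on each DataNode:
#   <property>
#     <name>dfs.datanode.max.xcievers</name>
#     <value>8192</value>
#   </property>
# then restart DataNodes one at a time; the NameNode keeps running.

# Each xceiver is a JVM thread, and thread stacks live outside -Xmx.
# Assuming the common 1 MB default stack (-Xss1m):
XCIEVERS=8192
STACK_MB=1
echo "Worst-case thread-stack reservation: $((XCIEVERS * STACK_MB)) MB"

# Things to watch on a live DataNode (the pgrep pattern is an
# assumption about the process name; adjust for your install):
DN_PID=$(pgrep -f 'hdfs.server.datanode.DataNode' | head -n1)
if [ -n "$DN_PID" ]; then
    ps -o nlwp= -p "$DN_PID"    # current thread count vs. the 8k cap
fi
ulimit -u                        # OS per-user process/thread limit
```

Note that the worst case above is mostly virtual address space rather than resident memory; in practice the limits you hit first are the OS `ulimit -u`/`nproc` ceiling and DataNode heap pressure, so watching the live thread count and GC logs after the change is more useful than assuming 8 GB of RAM disappears.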
