Inline.

On Sep 27, 2011, at 9:58 AM, Jean-Daniel Cryans wrote:
> Inline.
>
> J-D
>
> On Mon, Sep 26, 2011 at 11:47 PM, Robert J Berger <[email protected]> wrote:
>> Hi,
>> I saw a posting from a year or so ago where Andy said he was running
>> datanodes with xceivers set to 10k.
>>
>> Is there any problem doing that? We were running with a 9
>> regionserver/datanode cluster and started getting
>> java.io.IOException: xceiverCount 4097 exceeds the limit of concurrent
>> xcievers 4096
>
> Ouch.
>
>> We had more than 4k regions on most of the nodes.
>
> Ah, that explains it.
>
>> We added 4 more nodes but are still getting some of the xceiverCount errors
>> every once in a while (I think from when we are doing backups via the HBase
>> export job).
>
> Depending on the number of nodes you have, it might not have been enough.

It's not enough. We're still having errors, and it caused a regionserver to shut down again. No data loss, but degraded service (yay for robustness!).

>> We're not doing any of the things that could be done to reduce the number of
>> files and regions.
>
> If you feel brave, you could try
> https://issues.apache.org/jira/browse/HBASE-1621

I tend to be "conservative" (was going to say cowardly) with our HBase cluster, since it's the persistent core of our application. So I'm not going to worry about growing the hfile size on this system.

>> We generally don't have a heavy read/write load across the cluster, but
>> we've been doing more Hadoop and backup export jobs than we used to...
>>
>> We're working on moving to a more modern cluster, so I don't want to invest
>> much effort in the current one, but I have to keep it running till we cut over.
>
> I hope it's not because of this xceiver problem.

Not really; it's because we are so far behind the release cycle. We're still on HBase 0.20.3. I'm pretty sure many of our current problems would be relieved just by getting caught up to either CDHx or the latest production Apache release.
Plus incorporating the latest best practices into the design of the next version to avoid these problems: different EC2 instance types, disk system layout, etc. (I'll be posting some questions about this soon; I'd like to have a discussion on such best practices for our class of HBase cluster.)

>> Is there any problem with bumping dfs.datanode.max.xcievers up to 8k or
>> even 10k?
>
> It really just means that the DNs will keep at most 8k or 10k threads
> open at max. As long as you have enough native threads it will hold
> on.

OK. Just to clarify, since I muddied the water by also asking about hbase.hregion.max.filesize: if I increase dfs.datanode.max.xcievers, can I do it one machine at a time, with only one datanode down at a time? Or do I need to bring the whole cluster down, update the dfs.datanode.max.xcievers value, and bring it back up? If I can do it a machine at a time, do I have to do it on the namenode/master machine as well?

>> Can we / should we increase hbase.hregion.max.filesize to something more
>> than 256M on a running system?
>
> You'd have to change it on the existing tables for the change to take
> effect. This means disabling the table, doing an alter, re-enabling.
> But it won't merge.

OK, I'm not going to do that for this cluster... We have way too many tables and it's too scary :-)

>> I presume to change either one we would need to restart the regionservers. Can we
>> do it as a rolling restart, where we change one regionserver at a time and
>> bring it back online before updating the next? Or does it have to be updated as
>> a cluster?
>
> Previous answer related... also yes you can roll restart:
> http://hbase.apache.org/book/node.management.html#rolling

I shouldn't have said rolling; I meant just manually updating the dfs.datanode.max.xcievers value and restarting one datanode at a time. We can't use that cool graceful_shutdown option since we're on such an ancient version of HBase.
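[For the archives: the disable/alter/enable dance J-D describes looks roughly like this in the HBase shell. This is a sketch only: 'mytable' and the 512MB value (536870912 bytes) are placeholders, and the exact alter syntax varies by HBase version, so on something as old as 0.20.x it may well differ.]

    hbase> disable 'mytable'
    hbase> alter 'mytable', {METHOD => 'table_att', MAX_FILESIZE => '536870912'}
    hbase> enable 'mytable'

[As J-D notes, existing regions won't merge after this; only regions created by future splits honor the larger size.]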
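[For the archives: the xceiver bump itself is a single property in each datanode's hdfs-site.xml. The misspelled key really is the property name Hadoop reads; 8192 here is just the figure discussed above, not a recommendation.]

    <!-- hdfs-site.xml on each datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>

[Since it is a datanode-side setting, my understanding is it can be rolled out one datanode at a time: edit the file, bounce just that datanode, wait for it to report back in, then move on. Paths below assume a stock Hadoop 0.20 layout:]

    $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode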
(Another reason I'm itching to upgrade.) But would the HBase rolling restart even help? Don't we really need to restart HDFS for the dfs.datanode.max.xcievers change to take effect?

> For that change to take effect on the new tables, I think only the
> master would need to be bounced.

I presume you're referring to the hbase.hregion.max.filesize change (which I'm not going to do right now)? That would just need the HBase master to be bounced?

Thanks again!

__________________
Robert J Berger - CTO
Runa Inc.
+1 408-838-8896
http://blog.ibd.com
