I did not try tcp_tw_reuse, because I am a bit fuzzy on what exactly it does. That is, what is the difference between recycling and reusing?

The man page for tcp just mentions that tcp_tw_reuse allows reusing TIME_WAIT sockets if it is safe from the protocol viewpoint and that it should not be set without consulting a technical expert (which I do not consider myself to be when it comes to TCP implementation internals). So I wouldn't know how it is determined that something is safe from the protocol viewpoint. Most Linux/UNIX FAQs online tell me exactly the same. I do see some recommendations to set tcp_tw_reuse for heavily loaded web servers, but the problem here is running out of client connections, not server sockets.

For recycling, I expect that the socket in TIME_WAIT is just considered closed and a new connection is built originating from the source port of the socket previously in TIME_WAIT. I would also expect that when a packet from the previous connection leaks through into the new connection, it would lead to a reset, because the sequence number would be wrong (right?). We are definitely not behind a NAT router (actually, at the RIPE NCC all machines have actual public IPs, although I never understood why).

All of my assumptions might be wrong, but as I mentioned, I am no expert on the subject. I was also led by the fact that the report for one of the hbase-2492 related issues actually mentions tcp_tw_recycle as a possible solution in the comments.

We are not in production yet, so for now we have room to experiment. Nonetheless, if tcp_tw_reuse is actually safer (and I understand why), then I would like to use that instead.

(And yes, can't wait for HDFS-941 some day)

Friso

On Jun 15, 2010, at 6:54 PM, Todd Lipcon wrote:

> Might be worth trying tcp_tw_reuse before turning on tw_recycle - as I
> understand it, the former is a lot safer than the latter.
>
> Can't wait for HDFS-941 some day :)
>
> -Todd
>
> On Tue, Jun 15, 2010 at 9:10 AM, Jean-Daniel Cryans
> <[email protected]>wrote:
>
>> Friso,
>>
>> This is very interesting, and nobody answered probably because no one
>> tried tcp_tw_recycle. I personally didn't even know about that config
>> until a few minutes ago ;)
>>
>> So from the varnish mailing list, it seems that machines behind
>> firewalls or NAT won't play well with that config, but I don't expect
>> anyone running a cluster with that kind of setup... unless they are
>> doing cross-DC or whatnot.
>> http://www.mail-archive.com/[email protected]/msg02912.html
>>
>> Good stuff!
>>
>> J-D
>>
>> On Mon, Jun 14, 2010 at 11:40 PM, Friso van Vollenhoven
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> Since I got no replies to my previous message (see below), I went ahead
>>> and set the tcp_tw_recycle to true. This worked like a charm. The number of
>>> sockets in TIME_WAIT went down from many thousands to just a couple (tens).
>>> Apparently, once set to true, the recycling happens quite eagerly. Most
>>> importantly, the regionservers no longer shut down (which was the goal). I
>>> am sharing the info here, just in case it might help someone sometime.
>>>
>>> Cheers,
>>> Friso
>>>
>>> On Jun 11, 2010, at 11:55 AM, Friso van Vollenhoven wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are experiencing a lot of "java.net.BindException: Cannot assign
>>>> requested address", which is a case of
>>>> https://issues.apache.org/jira/browse/hbase-2492. At some point, all
>>>> grinds to a halt and regionservers start to shut down.
>>>>
>>>> I was wondering if anyone has found a way around this problem (other
>>>> than adding more machines to spread the load or reduce the work load). Has
>>>> anyone been able to successfully apply the patch in
>>>> https://issues.apache.org/jira/browse/HDFS-941 to 0.20.2? Or does anyone
>>>> have experience with setting the /proc/sys/net/ipv4/tcp_tw_recycle to 1
>>>> (true) at the OS level?
>>>>
>>>> We are running HBase 0.20.4-2524, r941433 and Hadoop 0.20.2.
>>>>
>>>> Any experiences that anyone can share are greatly appreciated.
>>>>
>>>> Best regards,
>>>> Friso
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
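
For anyone following this thread who wants to check how many sockets are sitting in TIME_WAIT and what the two settings are currently set to, here is a minimal Python sketch. It assumes a Linux host that exposes /proc/net/tcp, /proc/net/tcp6 and /proc/sys/net/ipv4; the function names count_time_wait and read_sysctl are made up for illustration and are not part of HBase or Hadoop. It only reads values and does not change any setting.

```python
from pathlib import Path

# Hex state code that /proc/net/tcp and /proc/net/tcp6 use for TIME_WAIT.
TIME_WAIT = "06"


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT sockets in text laid out like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def read_sysctl(name, root="/proc/sys/net/ipv4"):
    """Return the current value of a setting as a string, or None if unreadable."""
    try:
        return (Path(root) / name).read_text().strip()
    except OSError:
        # File absent, e.g. not Linux or the kernel does not offer this setting.
        return None


if __name__ == "__main__":
    total = 0
    # Java processes often use IPv6 sockets, so look at both tables.
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            total += count_time_wait(Path(table).read_text())
        except OSError:
            pass  # table not available on this host
    print("sockets in TIME_WAIT:", total)
    for name in ("tcp_tw_reuse", "tcp_tw_recycle"):
        print(name, "=", read_sysctl(name))
```

Running it before and after changing a setting on a regionserver host gives a rough view of whether the TIME_WAIT count actually drops.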
