Re: Hour long timeout to ssh/telnet/ftp to down host?
On Wednesday 13 June 2001 05:40, Luigi Genoni wrote: > On Tue, 12 Jun 2001, Ben Greear wrote: > > You can tune things by setting the tcp-timeout probably..I don't > > know exactly where to set this.. > > /proc/sys/net/ipv4/tcp_fin_timeout > > default is 60. Never got that far. My problem was actually tcp_syn_retries. Remember, I was talking to a host that was unplugged. (I wasn't even getting "host unreachable" messages, the packets were just disappearing.) The default timeout in that case is rediculous do to the exponentially increasing delays between retries. 10 retries wound up being something like 20 minutes. I set it to 5 and everything works beautifully now. ssh (which retries the connection 4 times, and used to take over an hour to time out) now takes just over 3 minutes, which I can live with. Rob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
On Tue, 12 Jun 2001, Ben Greear wrote: > Rob Landley wrote: > > > > I have scripts that ssh into large numbers of boxes, which are sometimes > > down. The timeout for figuring out the box is down is over an hour. This is > > just insane. > > > > Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was > > willing to wait, anyway. Basically anything that calls connect(). If the > > box doesn't respond in 15 seconds, I want to give up. > > > > Is this a problem with the kernel or with glibc? If it's the kernel, I'd > > expect a /proc entry where I can set this, but I can't seem to find one. Is > > there one? What would be involved in writing one? > > > > You can tune things by setting the tcp-timeout probably..I don't > know exactly where to set this.. /proc/sys/net/ipv4/tcp_fin_timeout default is 60. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
On Tue, 12 Jun 2001, Ben Greear wrote: Rob Landley wrote: I have scripts that ssh into large numbers of boxes, which are sometimes down. The timeout for figuring out the box is down is over an hour. This is just insane. Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was willing to wait, anyway. Basically anything that calls connect(). If the box doesn't respond in 15 seconds, I want to give up. Is this a problem with the kernel or with glibc? If it's the kernel, I'd expect a /proc entry where I can set this, but I can't seem to find one. Is there one? What would be involved in writing one? You can tune things by setting the tcp-timeout probably..I don't know exactly where to set this.. /proc/sys/net/ipv4/tcp_fin_timeout default is 60. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
On Wednesday 13 June 2001 05:40, Luigi Genoni wrote: On Tue, 12 Jun 2001, Ben Greear wrote: You can tune things by setting the tcp-timeout probably..I don't know exactly where to set this.. /proc/sys/net/ipv4/tcp_fin_timeout default is 60. Never got that far. My problem was actually tcp_syn_retries. Remember, I was talking to a host that was unplugged. (I wasn't even getting host unreachable messages, the packets were just disappearing.) The default timeout in that case is rediculous do to the exponentially increasing delays between retries. 10 retries wound up being something like 20 minutes. I set it to 5 and everything works beautifully now. ssh (which retries the connection 4 times, and used to take over an hour to time out) now takes just over 3 minutes, which I can live with. Rob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
>You can tune things by setting the tcp-timeout probably..I don't >know exactly where to set this.. Aha, found it. /proc/sys/net/ipv4/tcp_syn_retries I am a victim of the exponential retry falloff, it would seem. syn_retries of 1 takes a few seconds, 3 takes less than half a minute, and 5 takes several minutes. The default value of 10 is what's giving me the problem (something like 20 minutes to time out, according to my earlier tests.) Then the fact that ssh then re-attempts the connection four times before actually failing is where I got my hour and change timeout. ("ssh -v -v -v" comes in handy...) Fun. Can we change the default value for this to something more sane, like 5? Exponential falloff is not good when your order of magnitude hits double digits. > You probably don't want all tcp to time out at 15 seconds anyway, so Just connection initiation. (If their ip stack hasn't replied to me by then, I doubt it's going to.) > I'd suggest either using non-blocking connect (if you have the code > that does the connect), or just set a timer (or use sigalarm) when you > start the attempt, and fail the attempt if the timer or alarm signal > goes off. Except I'm using off-the-shelf ssh. (I asked them about this problem a month ago, and there was some discussion of a workaround on their mailing list, but 2.9 came out and still had the same behavior. Apparently, if it's not a problem in OpenBSD, it's not a problem in OpenSSH...) > > If it's glibc I'm probably better off writing a wrapper to ping the > > destination before trying to connect, or killing the connection after a > > timeout with no traffic. But both of those are really ugly solutions. > > Ugly is relative, and don't use ping because there is still a race > condition (ping worked, but by the time you try tcp, the box is down.) Yeah, but it would eventually time out and recover, I've got ten threads out querying boxes, that's a really rare race condition. And I already acknowledged it was ugly. :) So the problem is just that tcp_syn_retries' default value of 10 is way too high due to the exponentially increasing gap between each retry. Rob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
Rob Landley wrote: > > I have scripts that ssh into large numbers of boxes, which are sometimes > down. The timeout for figuring out the box is down is over an hour. This is > just insane. > > Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was > willing to wait, anyway. Basically anything that calls connect(). If the > box doesn't respond in 15 seconds, I want to give up. > > Is this a problem with the kernel or with glibc? If it's the kernel, I'd > expect a /proc entry where I can set this, but I can't seem to find one. Is > there one? What would be involved in writing one? > You can tune things by setting the tcp-timeout probably..I don't know exactly where to set this.. You probably don't want all tcp to time out at 15 seconds anyway, so I'd suggest either using non-blocking connect (if you have the code that does the connect), or just set a timer (or use sigalarm) when you start the attempt, and fail the attempt if the timer or alarm signal goes off. > If it's glibc I'm probably better off writing a wrapper to ping the > destination before trying to connect, or killing the connection after a > timeout with no traffic. But both of those are really ugly solutions. Ugly is relative, and don't use ping because there is still a race condition (ping worked, but by the time you try tcp, the box is down.) > > Anybody have any light to shed on the situation? > > Rob > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hour long timeout to ssh/telnet/ftp to down host?
I have scripts that ssh into large numbers of boxes, which are sometimes down. The timeout for figuring out the box is down is over an hour. This is just insane. Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was willing to wait, anyway. Basically anything that calls connect(). If the box doesn't respond in 15 seconds, I want to give up. Is this a problem with the kernel or with glibc? If it's the kernel, I'd expect a /proc entry where I can set this, but I can't seem to find one. Is there one? What would be involved in writing one? If it's glibc I'm probably better off writing a wrapper to ping the destination before trying to connect, or killing the connection after a timeout with no traffic. But both of those are really ugly solutions. Anybody have any light to shed on the situation? Rob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hour long timeout to ssh/telnet/ftp to down host?
I have scripts that ssh into large numbers of boxes, which are sometimes down. The timeout for figuring out the box is down is over an hour. This is just insane. Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was willing to wait, anyway. Basically anything that calls connect(). If the box doesn't respond in 15 seconds, I want to give up. Is this a problem with the kernel or with glibc? If it's the kernel, I'd expect a /proc entry where I can set this, but I can't seem to find one. Is there one? What would be involved in writing one? If it's glibc I'm probably better off writing a wrapper to ping the destination before trying to connect, or killing the connection after a timeout with no traffic. But both of those are really ugly solutions. Anybody have any light to shed on the situation? Rob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
Rob Landley wrote: I have scripts that ssh into large numbers of boxes, which are sometimes down. The timeout for figuring out the box is down is over an hour. This is just insane. Telnet and ftp behave similarly, or at least tthey lasted the 5 minutes I was willing to wait, anyway. Basically anything that calls connect(). If the box doesn't respond in 15 seconds, I want to give up. Is this a problem with the kernel or with glibc? If it's the kernel, I'd expect a /proc entry where I can set this, but I can't seem to find one. Is there one? What would be involved in writing one? You can tune things by setting the tcp-timeout probably..I don't know exactly where to set this.. You probably don't want all tcp to time out at 15 seconds anyway, so I'd suggest either using non-blocking connect (if you have the code that does the connect), or just set a timer (or use sigalarm) when you start the attempt, and fail the attempt if the timer or alarm signal goes off. If it's glibc I'm probably better off writing a wrapper to ping the destination before trying to connect, or killing the connection after a timeout with no traffic. But both of those are really ugly solutions. Ugly is relative, and don't use ping because there is still a race condition (ping worked, but by the time you try tcp, the box is down.) Anybody have any light to shed on the situation? Rob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Ben Greear [EMAIL PROTECTED] [EMAIL PROTECTED] President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Hour long timeout to ssh/telnet/ftp to down host?
You can tune things by setting the tcp-timeout probably..I don't know exactly where to set this.. Aha, found it. /proc/sys/net/ipv4/tcp_syn_retries I am a victim of the exponential retry falloff, it would seem. syn_retries of 1 takes a few seconds, 3 takes less than half a minute, and 5 takes several minutes. The default value of 10 is what's giving me the problem (something like 20 minutes to time out, according to my earlier tests.) Then the fact that ssh then re-attempts the connection four times before actually failing is where I got my hour and change timeout. (ssh -v -v -v comes in handy...) Fun. Can we change the default value for this to something more sane, like 5? Exponential falloff is not good when your order of magnitude hits double digits. You probably don't want all tcp to time out at 15 seconds anyway, so Just connection initiation. (If their ip stack hasn't replied to me by then, I doubt it's going to.) I'd suggest either using non-blocking connect (if you have the code that does the connect), or just set a timer (or use sigalarm) when you start the attempt, and fail the attempt if the timer or alarm signal goes off. Except I'm using off-the-shelf ssh. (I asked them about this problem a month ago, and there was some discussion of a workaround on their mailing list, but 2.9 came out and still had the same behavior. Apparently, if it's not a problem in OpenBSD, it's not a problem in OpenSSH...) If it's glibc I'm probably better off writing a wrapper to ping the destination before trying to connect, or killing the connection after a timeout with no traffic. But both of those are really ugly solutions. Ugly is relative, and don't use ping because there is still a race condition (ping worked, but by the time you try tcp, the box is down.) Yeah, but it would eventually time out and recover, I've got ten threads out querying boxes, that's a really rare race condition. And I already acknowledged it was ugly. :) So the problem is just that tcp_syn_retries' default value of 10 is way too high due to the exponentially increasing gap between each retry. Rob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/