Re: unreliable connections
* Stuart Henderson s...@spacehopper.org [2014-01-27 13:18]:
> On 2014/01/26 14:53, Chris Smith wrote:
> > On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
> > > This could be an MTU or RWIN-related issue.
> >
> > Could my issue have anything to do with the miscounting bug for inbound with pf on mentioned in the following commit?
> >
> > CVSROOT: /cvs
> > Module name: src
> > Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29
> >
> > Modified files:
> >     sys/net: if_bridge.c pf.c
> >     sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c tcp_var.h udp_usrreq.c udp_var.h
> >     sys/netinet6: ip6_output.c
> >
> > Log message:
> > since the cksum rewrite the counters for hardware checksummed packets are are lie, since the software engine emulates hardware offloading and that is later indistinguishable. so kill the hw cksummed counters. introduce software checksummed packet counters instead. tcp/udp handles ip ipvshit, ip cksum covered, 6 has no ip layer cksum. as before we still have a miscounting bug for inbound with pf on, to be fixed in the next step. found by, prodding ok naddy
> >
> > And if so was the next step taken and is this miscounting bug fixed?
>
> No, this is just counting for statistics.

and the next step has been taken right after.

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/
Re: unreliable connections
* Chris Smith obsd_m...@chrissmith.org [2014-03-17 23:41]:
> I think the source of this reported problem has been found, and happily fixed (the preliminary results are promising).
>
> Basically I needed to find some way to get the backups to complete reliably, so I started a 20 count ping job a minute before the rsync job (actually an rsnapshot job which connected twice), which did allow both backup connections to work (where previously just the second one connected reliably). In checking the logs for the backup status, the stats from the ping job were also there, and these logs showed some dup ping packets on a fairly regular basis as well as some non-answers.
>
> As I was then able to get the same inconsistent ping results from the gateway itself (the inside address of the cable modem), I asked the ISP (Comcast) to replace the cable modem. They were fine with that suggestion and the replacement went in today, and I am so far not able to reproduce the inconsistent ping results to any of the /29 addresses, including the gateway. I'll know for sure once I stop the ping job and the backups still run reliably.

that sounds like arp problems, namely very slow arp resolution. I've seen that before, it was very obvious some L2 gear was to blame, but details escaped me by now.

--
Henning Brauer
Re: unreliable connections
I think the source of this reported problem has been found, and happily fixed (the preliminary results are promising).

Basically I needed to find some way to get the backups to complete reliably, so I started a 20 count ping job a minute before the rsync job (actually an rsnapshot job which connected twice), which did allow both backup connections to work (where previously just the second one connected reliably). In checking the logs for the backup status, the stats from the ping job were also there, and these logs showed some dup ping packets on a fairly regular basis as well as some non-answers.

As I was then able to get the same inconsistent ping results from the gateway itself (the inside address of the cable modem), I asked the ISP (Comcast) to replace the cable modem. They were fine with that suggestion and the replacement went in today, and I am so far not able to reproduce the inconsistent ping results to any of the /29 addresses, including the gateway. I'll know for sure once I stop the ping job and the backups still run reliably.

Thanks to all,
Chris
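The duplicate replies and non-answers described in this message can be pulled out of saved ping logs mechanically. The following is a minimal sketch, assuming the common ping output conventions of a `(DUP!)` marker on duplicate replies and a summary line containing "packets transmitted"; the function name `summarize_ping` and the sample log (which uses a documentation address) are hypothetical and are not taken from the thread.

```python
import re

# Tolerates both "N packets received" and "N received" in the summary line.
SUMMARY_RE = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) (?:packets )?received"
)


def summarize_ping(output):
    """Count duplicate replies and lost probes in the text of one ping run.

    Returns None when no summary line is found (for example, a truncated log).
    """
    # Duplicate replies are counted from the per-reply lines, not the summary.
    duplicates = sum(1 for line in output.splitlines() if "(DUP!)" in line)
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    sent = int(match.group("sent"))
    received = int(match.group("received"))
    return {
        "sent": sent,
        "received": received,
        "lost": max(sent - received, 0),
        "duplicates": duplicates,
    }


# Made-up sample for illustration only; not a log from the thread.
SAMPLE_LOG = """\
64 bytes from 192.0.2.1: icmp_seq=0 ttl=64 time=9.1 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=8.7 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=9.9 ms (DUP!)
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=8.9 ms
--- 192.0.2.1 ping statistics ---
4 packets transmitted, 3 packets received, 1 duplicates, 25.0% packet loss
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE_LOG))
```

Running this over each night's log and comparing the results before and after the modem swap would give a simple record of whether the duplicates and losses have really stopped.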
Re: unreliable connections
On 2014/01/26 14:53, Chris Smith wrote:
> On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
> > This could be an MTU or RWIN-related issue.
>
> Could my issue have anything to do with the miscounting bug for inbound with pf on mentioned in the following commit?
>
> CVSROOT: /cvs
> Module name: src
> Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29
>
> Modified files:
>     sys/net: if_bridge.c pf.c
>     sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c tcp_var.h udp_usrreq.c udp_var.h
>     sys/netinet6: ip6_output.c
>
> Log message:
> since the cksum rewrite the counters for hardware checksummed packets are are lie, since the software engine emulates hardware offloading and that is later indistinguishable. so kill the hw cksummed counters. introduce software checksummed packet counters instead. tcp/udp handles ip ipvshit, ip cksum covered, 6 has no ip layer cksum. as before we still have a miscounting bug for inbound with pf on, to be fixed in the next step. found by, prodding ok naddy
>
> And if so was the next step taken and is this miscounting bug fixed?

No, this is just counting for statistics.

> Also recently, in an attempt to keep a box at -current, there occurred a kernel/userland mismatch that caused pf not to load on reboot after installing the kernel (everything was fine after building userland). I'm fairly certain trying to bring a box dated
>
>     OpenBSD 5.4-current (GENERIC.MP) #5: Wed Jan 1 14:21:45 EST 2014
>
> up to date will have the same issue. If I attempt to do this remotely, will I still be able to shell in in order to update userland (even though with no pf there is no nat and therefore access to/from the inside network will not be possible) after rebooting into the new kernel? Or might it be safe to build userland before rebooting into the new kernel?
>
> Thank you,
> Chris

See "Upgrading without install kernel" in the FAQ's upgrade guide. Note the specific order to untar sets.

This isn't enough for the 5.4 to -current flag day upgrade (which requires rebuilding the password database with new pwd_mkdb, but new pwd_mkdb won't run until you're on a new kernel), but since you have already passed that step, you should be ok. Obviously the install kernel method is safer if you can manage it.
Re: unreliable connections
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
> This could be an MTU or RWIN-related issue.

Could my issue have anything to do with the miscounting bug for inbound with pf on mentioned in the following commit?

CVSROOT: /cvs
Module name: src
Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29

Modified files:
    sys/net: if_bridge.c pf.c
    sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c tcp_var.h udp_usrreq.c udp_var.h
    sys/netinet6: ip6_output.c

Log message:
since the cksum rewrite the counters for hardware checksummed packets are are lie, since the software engine emulates hardware offloading and that is later indistinguishable. so kill the hw cksummed counters. introduce software checksummed packet counters instead. tcp/udp handles ip ipvshit, ip cksum covered, 6 has no ip layer cksum. as before we still have a miscounting bug for inbound with pf on, to be fixed in the next step. found by, prodding ok naddy

And if so was the next step taken and is this miscounting bug fixed?

Also recently, in an attempt to keep a box at -current, there occurred a kernel/userland mismatch that caused pf not to load on reboot after installing the kernel (everything was fine after building userland). I'm fairly certain trying to bring a box dated

    OpenBSD 5.4-current (GENERIC.MP) #5: Wed Jan 1 14:21:45 EST 2014

up to date will have the same issue. If I attempt to do this remotely, will I still be able to shell in in order to update userland (even though with no pf there is no nat and therefore access to/from the inside network will not be possible) after rebooting into the new kernel? Or might it be safe to build userland before rebooting into the new kernel?

Thank you,
Chris
Re: unreliable connections
On Mon, Jan 20, 2014 at 11:31 AM, Chris Smith obsd_m...@chrissmith.org wrote:
> have moved the block all to the beginning of the ruleset to see if it will make any difference

Unfortunately no difference. The attempt to rsync the first directory failed last night, second one worked fine.

Any other ideas?

Thanks,
Chris
Re: unreliable connections
Hello,

I would suggest a DNS problem. Do you rsync directly to an IP address or are you using a domain name? That would explain why only the first one fails and not the second. The DNS server you use may have some problems during the night.

If you don't use a domain name, this can't be it. If you use one, you can add it to /etc/hosts to bypass DNS. If it continues to fail, the problem is elsewhere.

I have been monitoring some ISPs' public DNS servers (with smokeping) and some of them were unreliable during the night.

Regards

From: Chris Smith
Sent: Wednesday, 22 January 2014 16:23
To: Stuart Henderson
Cc: OpenBSD-Misc
Subject: Re: unreliable connections

> On Mon, Jan 20, 2014 at 11:31 AM, Chris Smith obsd_m...@chrissmith.org wrote:
> > have moved the block all to the beginning of the ruleset to see if it will make any difference
>
> Unfortunately no difference. The attempt to rsync the first directory failed last night, second one worked fine.
>
> Any other ideas?
>
> Thanks,
> Chris
Re: unreliable connections
On Wed, Jan 22, 2014 at 12:56 PM, Charles RAPENNE char...@bsd.zplay.eu wrote:
> Do you rsync directly to an IP address or are you using a domain name?

Not DNS - directly to IP address.

Thanks,
Chris
Re: unreliable connections
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
> Posting the firewall ruleset may possibly help people diagnose this in more detail.

Here's some pertinent pf.conf info:

===
set skip on { lo enc0 }
set block-policy drop
set reassemble yes no-df
set limit { table-entries 50, tables 50, states 128000, src-nodes 3000, frags 4000 }
set loginterface none

block all
pass in quick on $ext_if inet proto tcp from any to $ext_if port ssh
===

Originally I had the pass in quick before the block all but changed this around to test the theory.

Yes, the rdr for rsync and rdp are not shown, but the same problem randomly occurs (and just did) with a direct ssh to the box itself (no forwarding or nat needed). And to other OpenBSD firewall/routers I manage there are no issues, either with a direct shell in or with redirects to inside boxes (but they are not as up-to-date as the one that fails).

Chris
Re: unreliable connections
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
> This could be an MTU or RWIN-related issue. One common problem is if the firewall state is created from an already-established connection rather than a SYN packet; in this case the firewall can't keep track of the RWIN value which is set by many modern OS, and needed in order for a stateful firewall to track the connection

Makes sense, but there are no other connections between said devices when the problem occurs. Connections only occur when I manually attempt to connect to the firewall itself (ssh) or an inside system, plus a cron job that runs at approx. 4am for an rsync backup.

The manual attempts are only bothersome because such random failures don't happen with any other of my remote networks, and I can easily re-try (which so far has always worked). The backup is another story, as some data doesn't get backed up when the random failure occurs. The rsync cron job does attempt to back up two different directories and so connects twice; it's only the first attempt that fails (no matter which one I attempt to back up first), and so it acts just like a failed ssh attempt to the firewall itself - a new attempt immediately after always works.

> To avoid the risk of this I usually start pf rulesets with block log

I do use a block all very near the beginning of the ruleset, although I generally put a 'pass in quick for ssh' before the block all to make sure I never make a change that prevents a remote shell in. I can see that the leading 'pass in' isn't all that necessary and have moved the block all to the beginning of the ruleset to see if it will make any difference.

Thank you,
Chris
Re: unreliable connections
This issue is still with me. Sporadically the connection will fail, and a connection attempt immediately after the failure will (so far) always work.

Again the problem is only with this one remote firewall; all of the others are fine. The hardware is virtually identical, similar versions of the Supermicro 5015A boxes. Also note that said problem box was used in another location with an older version of OpenBSD without said issues.

It's possible the ISP's cable modem might be to blame, but I'd like to have something to go on before I point that finger.

Could really use some ideas on how to troubleshoot this.

Chris

On Sun, Dec 29, 2013 at 9:56 PM, Chris Smith obsd_m...@chrissmith.org wrote:
> I'm having a problem connecting with (and through) one OpenBSD box. Both ends are running OpenBSD -current (-current as of last weekend) and I've had the issue through a couple of months of various builds of -current. The problem occurs whether I'm connecting directly to the remote OpenBSD box (firewall) or connecting through it via a redirect to an inside box.
>
> The connection attempts are all coming from a Linux box inside my network (and I'm running a recent -current as my firewall), and connections to and through several other remote OpenBSD boxes (although not running a recent -current) all work 100% of the time.
>
> With the problem box sometimes the connection never completes. After the failed connection attempt subsequent connection attempts work fine; it's only after some time that the problem may arise again. For example if I attempt to ssh to the problem box I'm greeted with a blank line:
>
>     $ ssh problem_box
>
> And after some minutes, I'll finally receive this:
>
>     ssh_exchange_identification: read: Connection timed out
>
> From another terminal I can then shell in (whether or not I kill the first attempt).
>
> The connection states reported are (all addresses have been munged):
>
> my local firewall:
>     all tcp 51.213.211.197:22 - 172.25.12.66:44291 ESTABLISHED:ESTABLISHED
>     all tcp 76.112.133.216:54348 (172.25.12.66:44291) - 51.213.211.197:22 ESTABLISHED:ESTABLISHED
>     all tcp 51.213.211.197:22 - 172.25.12.66:44292 ESTABLISHED:ESTABLISHED
>     all tcp 76.112.133.216:58306 (172.25.12.66:44292) - 51.213.211.197:22 ESTABLISHED:ESTABLISHED
>
> the remote firewall:
>     all tcp 51.213.211.197:22 - 76.112.133.216:54348 SYN_SENT:ESTABLISHED
>     all tcp 51.213.211.197:22 - 76.112.133.216:58306 ESTABLISHED:ESTABLISHED
>
> The hung connection is the SYN_SENT:ESTABLISHED one and it stays that way for some time, although my local firewall believes it to be established.
>
> I've seen the same issue with an RDP connection to an inside Windows box via a redirect. Sometimes the first attempt will not connect; if I kill it and try again, voila, it works.
>
> The critical part is that my rsync backup to an internal box fails about every third night due to this issue. As I rsync two different paths (one and then the other) on the remote daemon, the first path will fail sporadically; the second path always completes.
>
> Have none of these issues with other accounts (but as mentioned the OpenBSD versions on those firewalls are a bit older).
>
> Any assistance on resolving this would be much appreciated.
>
> Thank you,
> Chris
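For the troubleshooting question raised in this message, one low-effort way to gather evidence is to probe the SSH port on a schedule and record whether the TCP connect succeeded and whether the server's identification banner arrived. The sketch below is only an illustration and was not proposed in the thread: `probe_banner`, its status strings, and the default timeout are hypothetical choices. A `no_banner` result would roughly correspond to the symptom described here, where the client appears connected but times out waiting for the server's identification string.

```python
import socket


def probe_banner(host, port=22, timeout=10.0):
    """Try one TCP connection and wait for the first bytes from the server.

    Returns a (status, detail) tuple. The status names are arbitrary labels
    chosen for this sketch:
      connect_failed  the TCP connection could not be opened in time
      no_banner       connected, but nothing arrived before the timeout
      read_failed     connected, but the read raised another socket error
      closed          the peer closed the connection without sending data
      ok              data arrived; detail holds the decoded first bytes
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # includes timeouts and refused connections
        return ("connect_failed", str(exc))
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            return ("no_banner", "connected, but nothing received before timeout")
        except OSError as exc:
            return ("read_failed", str(exc))
    if not data:
        return ("closed", "peer closed the connection without sending data")
    return ("ok", data.decode("ascii", "replace").strip())
```

Calling `probe_banner("problem_box")` from cron every few minutes and logging the result with a timestamp might show whether failures only appear after the link has been idle for a while, which is the pattern reported above.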
Re: unreliable connections
On 2014-01-16, Chris Smith obsd_m...@chrissmith.org wrote:
> This issue is still with me. Sporadically the connection will fail, and a connection attempt immediately after the failure will (so far) always work.
>
> Again the problem is only with this one remote firewall; all of the others are fine. The hardware is virtually identical, similar versions of the Supermicro 5015A boxes. Also note that said problem box was used in another location with an older version of OpenBSD without said issues.

This could be an MTU or RWIN-related issue. One common problem is if the firewall state is created from an already-established connection rather than a SYN packet; in this case the firewall can't keep track of the RWIN value which is set by many modern OS, and needed in order for a stateful firewall to track the connection.

To avoid the risk of this I usually start pf rulesets with block log (*not* 'block in log', etc) just to make sure that no packets are passed by the implicit default rule (which is basically pass all flags any no state) which takes effect if no other rules match.

Posting the firewall ruleset may possibly help people diagnose this in more detail.
unreliable connections
I'm having a problem connecting with (and through) one OpenBSD box. Both ends are running OpenBSD -current (-current as of last weekend) and I've had the issue through a couple of months of various builds of -current. The problem occurs whether I'm connecting directly to the remote OpenBSD box (firewall) or connecting through it via a redirect to an inside box.

The connection attempts are all coming from a Linux box inside my network (and I'm running a recent -current as my firewall), and connections to and through several other remote OpenBSD boxes (although not running a recent -current) all work 100% of the time.

With the problem box sometimes the connection never completes. After the failed connection attempt subsequent connection attempts work fine; it's only after some time that the problem may arise again. For example if I attempt to ssh to the problem box I'm greeted with a blank line:

    $ ssh problem_box

And after some minutes, I'll finally receive this:

    ssh_exchange_identification: read: Connection timed out

From another terminal I can then shell in (whether or not I kill the first attempt).

The connection states reported are (all addresses have been munged):

my local firewall:
    all tcp 51.213.211.197:22 - 172.25.12.66:44291 ESTABLISHED:ESTABLISHED
    all tcp 76.112.133.216:54348 (172.25.12.66:44291) - 51.213.211.197:22 ESTABLISHED:ESTABLISHED
    all tcp 51.213.211.197:22 - 172.25.12.66:44292 ESTABLISHED:ESTABLISHED
    all tcp 76.112.133.216:58306 (172.25.12.66:44292) - 51.213.211.197:22 ESTABLISHED:ESTABLISHED

the remote firewall:
    all tcp 51.213.211.197:22 - 76.112.133.216:54348 SYN_SENT:ESTABLISHED
    all tcp 51.213.211.197:22 - 76.112.133.216:58306 ESTABLISHED:ESTABLISHED

The hung connection is the SYN_SENT:ESTABLISHED one and it stays that way for some time, although my local firewall believes it to be established.

I've seen the same issue with an RDP connection to an inside Windows box via a redirect. Sometimes the first attempt will not connect; if I kill it and try again, voila, it works.

The critical part is that my rsync backup to an internal box fails about every third night due to this issue. As I rsync two different paths (one and then the other) on the remote daemon, the first path will fail sporadically; the second path always completes.

Have none of these issues with other accounts (but as mentioned the OpenBSD versions on those firewalls are a bit older).

Any assistance on resolving this would be much appreciated.

Thank you,
Chris
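The state listings in this message are regular enough to be checked mechanically when comparing the two firewalls. What follows is a rough sketch, assuming lines shaped exactly like the ones quoted above (interface, protocol, endpoints, then two colon-separated state names); the function names and the choice of which state names count as a "handshake" are assumptions made for illustration, and the endpoint text is kept as an opaque string because the direction arrows did not survive in the quoted listing.

```python
import re

# One state per line: "<if> <proto> <endpoints...> <STATE>:<STATE>"
STATE_RE = re.compile(
    r"^\s*(?P<ifname>\S+)\s+(?P<proto>\S+)\s+(?P<endpoints>.+?)\s+"
    r"(?P<first>[A-Z0-9_]+):(?P<second>[A-Z0-9_]+)\s*$"
)

# Assumption for this sketch: these names mark a side still in the handshake.
HANDSHAKE_STATES = {"SYN_SENT", "SYN_RCVD"}


def parse_states(text):
    """Return one dict per line that looks like a state entry; skip the rest."""
    entries = []
    for line in text.splitlines():
        match = STATE_RE.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def stuck_handshakes(entries):
    """Select TCP states where one side is ESTABLISHED and the other is not past the handshake."""
    selected = []
    for entry in entries:
        sides = {entry["first"], entry["second"]}
        if (entry["proto"] == "tcp"
                and "ESTABLISHED" in sides
                and HANDSHAKE_STATES & sides):
            selected.append(entry)
    return selected


# The two remote-firewall lines quoted in the message above.
REMOTE_FIREWALL = """\
all tcp 51.213.211.197:22 - 76.112.133.216:54348 SYN_SENT:ESTABLISHED
all tcp 51.213.211.197:22 - 76.112.133.216:58306 ESTABLISHED:ESTABLISHED
"""

if __name__ == "__main__":
    for entry in stuck_handshakes(parse_states(REMOTE_FIREWALL)):
        print(entry["endpoints"], entry["first"], entry["second"])
```

Applied to the remote firewall's listing, this selects only the port 54348 entry, matching the hung connection identified in the message.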