Re: unreliable connections

2014-04-01 Thread Henning Brauer
* Stuart Henderson s...@spacehopper.org [2014-01-27 13:18]:
 On 2014/01/26 14:53, Chris Smith wrote:
  On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org 
  wrote:
   This could be an MTU or RWIN-related issue.
  
  Could my issue have anything to do with the miscounting bug for inbound
  with pf on mentioned in the following commit?
  
  CVSROOT:/cvs
  Module name:src
  Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29
  
  Modified files:
  sys/net: if_bridge.c pf.c
  sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c
   tcp_var.h udp_usrreq.c udp_var.h
  sys/netinet6   : ip6_output.c
  
  Log message:
  since the cksum rewrite the counters for hardware checksummed packets
  are a lie, since the software engine emulates hardware offloading
  and that is later indistinguishable. so kill the hw cksummed counters.
  introduce software checksummed packet counters instead.
  tcp/udp handles ip & ipvshit, ip cksum covered, 6 has no ip layer cksum.
  as before we still have a miscounting bug for inbound with pf on, to be
  fixed in the next step.
  found by, prodding & ok naddy
  
  
  And if so was the next step taken and is this miscounting bug fixed?
 
 No, this is just counting for statistics.

and the next step has been taken right after.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: unreliable connections

2014-04-01 Thread Henning Brauer
* Chris Smith obsd_m...@chrissmith.org [2014-03-17 23:41]:
 I think the source of this reported problem has been found, and
 happily fixed (the preliminary results are promising).
 
 Basically I needed to find some way to get the backups to complete
 reliably so I started a 20 count ping job a minute before the rsync
 job (actually an rsnapshot job which connected twice) which did allow
 both backup connections to work (where previously just the
 second one connected reliably). In checking the logs for the backup
 status, the stats from the ping job were also there and these logs
 showed some dup ping packets on a fairly regular basis as well as some
 non-answers. As I was then able to get the same inconsistent ping
 results from the gateway itself (the inside address of the cable
 modem) I asked the ISP (Comcast) to replace the cable modem. They were
 fine with that suggestion and the replacement went in today, and I am
 so far not able to reproduce the inconsistent ping results to any of
 the /29 addresses, including the gateway. I'll know for sure once I stop
 the ping job and the backups still run reliably.

that sounds like arp problems, namely very slow arp resolution. I've
seen that before, it was very obvious some L2 gear was to blame, but
the details escape me by now.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: unreliable connections

2014-03-17 Thread Chris Smith
I think the source of this reported problem has been found, and
happily fixed (the preliminary results are promising).

Basically I needed to find some way to get the backups to complete
reliably so I started a 20 count ping job a minute before the rsync
job (actually an rsnapshot job which connected twice) which did allow
both backup connections to work (where previously just the
second one connected reliably). In checking the logs for the backup
status, the stats from the ping job were also there and these logs
showed some dup ping packets on a fairly regular basis as well as some
non-answers. As I was then able to get the same inconsistent ping
results from the gateway itself (the inside address of the cable
modem) I asked the ISP (Comcast) to replace the cable modem. They were
fine with that suggestion and the replacement went in today, and I am
so far not able to reproduce the inconsistent ping results to any of
the /29 addresses, including the gateway. I'll know for sure once I stop
the ping job and the backups still run reliably.

Thanks to all,

Chris
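
The ping statistics mentioned in this message (duplicate replies plus unanswered requests) can be checked automatically from the log of such a job. What follows is a minimal Python sketch, not the script actually used here. It assumes the log contains the usual summary line that ping prints at the end of a run, in either the Linux iputils or the OpenBSD wording; the names parse_ping_summary and is_suspect are made up for illustration.

```python
import re


def parse_ping_summary(text):
    """Extract counters from the summary line printed at the end of a ping run."""

    def grab(pattern, default=None):
        # Return the first captured integer, or the default if nothing matches.
        m = re.search(pattern, text)
        return int(m.group(1)) if m else default

    sent = grab(r"(\d+) packets transmitted")
    # iputils prints "18 received", OpenBSD prints "18 packets received".
    received = grab(r"(\d+) (?:packets )?received")
    if sent is None or received is None:
        raise ValueError("no ping summary line found")
    return {
        "sent": sent,
        "received": received,
        # The "+N duplicates" field is only printed when duplicates were seen.
        "duplicates": grab(r"\+(\d+) duplicates", default=0),
        "lost": sent - received,
    }


def is_suspect(stats):
    """Flag a run that saw duplicate replies or unanswered requests."""
    return stats["duplicates"] > 0 or stats["lost"] > 0
```

A cron wrapper could run this over each night's ping output and only send mail when is_suspect() is true, which makes a pattern like the one described above easier to spot over several days.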



Re: unreliable connections

2014-01-27 Thread Stuart Henderson
On 2014/01/26 14:53, Chris Smith wrote:
 On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org 
 wrote:
  This could be an MTU or RWIN-related issue.
 
 Could my issue have anything to do with the miscounting bug for inbound
 with pf on mentioned in the following commit?
 
 CVSROOT:/cvs
 Module name:src
 Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29
 
 Modified files:
 sys/net: if_bridge.c pf.c
 sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c
  tcp_var.h udp_usrreq.c udp_var.h
 sys/netinet6   : ip6_output.c
 
 Log message:
 since the cksum rewrite the counters for hardware checksummed packets
 are a lie, since the software engine emulates hardware offloading
 and that is later indistinguishable. so kill the hw cksummed counters.
 introduce software checksummed packet counters instead.
 tcp/udp handles ip & ipvshit, ip cksum covered, 6 has no ip layer cksum.
 as before we still have a miscounting bug for inbound with pf on, to be
 fixed in the next step.
 found by, prodding & ok naddy
 
 
 And if so was the next step taken and is this miscounting bug fixed?

No, this is just counting for statistics.

 Also recently in an attempt to keep a box at -current there occurred a
 kernel/userland mismatch that caused pf not to load on reboot after
 installing the kernel (everything was fine after building userland).
 I'm fairly certain trying to bring a box dated OpenBSD 5.4-current
 (GENERIC.MP) #5: Wed Jan  1 14:21:45 EST 2014 will have the same
 issue. If I attempt to do this remotely will I still be able to shell
 in in order to update userland (even though with no pf there is no nat
 and therefore access to/from the inside network will not be possible)
 after rebooting into the new kernel? Or might it be safe to build
 userland before rebooting into the new kernel?
 
 Thank you,
 
 Chris

See "Upgrading without install kernel" in the FAQ's upgrade guide.
Note the specific order to untar sets. This isn't enough for the
5.4 -> -current flag day upgrade (which requires rebuilding the
password database with new pwd_mkdb, but new pwd_mkdb won't
run until you're on a new kernel), but since you have already
passed that step, you should be ok.

Obviously the install kernel method is safer if you can manage it.



Re: unreliable connections

2014-01-26 Thread Chris Smith
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
 This could be an MTU or RWIN-related issue.

Could my issue have anything to do with the miscounting bug for inbound
with pf on mentioned in the following commit?

CVSROOT:/cvs
Module name:src
Changes by: henn...@cvs.openbsd.org 2014/01/23 16:51:29

Modified files:
sys/net: if_bridge.c pf.c
sys/netinet: ip_input.c ip_output.c ip_var.h tcp_input.c
 tcp_var.h udp_usrreq.c udp_var.h
sys/netinet6   : ip6_output.c

Log message:
since the cksum rewrite the counters for hardware checksummed packets
are a lie, since the software engine emulates hardware offloading
and that is later indistinguishable. so kill the hw cksummed counters.
introduce software checksummed packet counters instead.
tcp/udp handles ip & ipvshit, ip cksum covered, 6 has no ip layer cksum.
as before we still have a miscounting bug for inbound with pf on, to be
fixed in the next step.
found by, prodding & ok naddy


And if so was the next step taken and is this miscounting bug fixed?

Also recently in an attempt to keep a box at -current there occurred a
kernel/userland mismatch that caused pf not to load on reboot after
installing the kernel (everything was fine after building userland).
I'm fairly certain trying to bring a box dated OpenBSD 5.4-current
(GENERIC.MP) #5: Wed Jan  1 14:21:45 EST 2014 will have the same
issue. If I attempt to do this remotely will I still be able to shell
in in order to update userland (even though with no pf there is no nat
and therefore access to/from the inside network will not be possible)
after rebooting into the new kernel? Or might it be safe to build
userland before rebooting into the new kernel?

Thank you,

Chris



Re: unreliable connections

2014-01-22 Thread Chris Smith
On Mon, Jan 20, 2014 at 11:31 AM, Chris Smith obsd_m...@chrissmith.org wrote:
 have moved the block all to the beginning of the ruleset to see if
 it will make any difference

Unfortunately no difference. The attempt to rsync the first directory
failed last night, second one worked fine.

Any other ideas?

Thanks,

Chris



Re: unreliable connections

2014-01-22 Thread Charles RAPENNE
Hello, I would suggest a DNS problem.
Do you rsync directly to an IP address or are you using a domain name?
That would explain why only the first one is failing and not the second
one.
The DNS server you use may have some problems during the night.
If you don't use a domain name, this can't be it. If you use one, you
can add it to /etc/hosts to bypass DNS. If it continues to fail, the
problem is elsewhere.
I have been monitoring some ISPs' public DNS servers (with smokeping)
and some of them were unreliable during the night.
Regards

From: Chris Smith
Sent: Wednesday, 22 January 2014 16:23
To: Stuart Henderson
Cc: OpenBSD-Misc
Subject: Re: unreliable connections

On Mon, Jan 20, 2014 at 11:31 AM, Chris Smith obsd_m...@chrissmith.org
wrote:
 have moved the block all to the beginning of the ruleset to see if
 it will make any difference

Unfortunately no difference. The attempt to rsync the first directory
failed last night, second one worked fine.

Any other ideas?

Thanks,

Chris



Re: unreliable connections

2014-01-22 Thread Chris Smith
On Wed, Jan 22, 2014 at 12:56 PM, Charles RAPENNE char...@bsd.zplay.eu wrote:

 Do you rsync directly to an IP address or are you using a domain name?


Not DNS - directly to IP address.

Thanks,

Chris



Re: unreliable connections

2014-01-22 Thread Chris Smith
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
 Posting the firewall ruleset may possibly help people diagnose this in more 
 detail.

Here's some pertinent pf.conf info:
===
set skip on { lo enc0 }
set block-policy drop
set reassemble yes no-df
set limit { table-entries 50, tables 50, states 128000, src-nodes 3000, frags 4000 }
set loginterface none

block all
pass in quick on $ext_if inet proto tcp from any to $ext_if port ssh
===

Originally I had the pass in quick before the block all but
changed this around to test the theory.

Yes, the rdr for rsync and rdp are not shown but the same problem
randomly occurs (and just did) with a direct ssh to the box itself (no
forwarding or nat needed). And to other OpenBSD firewall/routers I
manage there are no issues, either with a direct shell in or with
redirects to inside boxes (but they are not as up-to-date as the one
that fails).

Chris



Re: unreliable connections

2014-01-20 Thread Chris Smith
On Thu, Jan 16, 2014 at 8:26 PM, Stuart Henderson s...@spacehopper.org wrote:
 This could be an MTU or RWIN-related issue. One common problem is when the
 firewall state is created from an already-established connection rather
 than from a SYN packet. In that case the firewall can't keep track of the
 RWIN value, which is set by many modern OSes and is needed in order for a
 stateful firewall to track the connection.

Makes sense but there are no other connections between said devices
when the problem occurs. Connections only occur when I manually
attempt to connect to the firewall itself (ssh), or an inside system,
plus a cron job that runs at approx. 4am for an rsync backup. The
manual attempts are only bothersome because such random failures don't
happen with any other of my remote networks and I can easily re-try
(which so far has always worked). The backup is another story as some
data doesn't get backed up when the random failure occurs. The rsync
cron job does attempt to backup two different directories and so
connects twice; it's only the first attempt that fails (no matter
which one I attempt to backup first) and so acts just like a failed
ssh attempt to the firewall itself - a new attempt immediately after
always works.

 To avoid the risk of this I usually start pf rulesets with "block log"

I do use a block all very near the beginning of the ruleset,
although I generally put a 'pass in quick for ssh' before the block
all to make sure I never make a change that prevents a remote shell
in. I can see that the leading 'pass in' isn't all that necessary and
have moved the block all to the beginning of the ruleset to see if
it will make any difference.

Thank you,

Chris
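
Since a new attempt immediately after a failure has so far always worked, one stopgap for the nightly backup is to retry the transfer a few times before giving up. The following is a minimal Python sketch of that idea, not something used in this thread. The name run_with_retry and its parameters are made up for illustration, and the runner and sleep function are injectable only so the logic can be exercised without touching the network.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns the number of the attempt that succeeded, and raises
    RuntimeError if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        # Pause before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"{cmd!r} failed {attempts} times")
```

Because the hung connection described above only times out after some minutes, the command being wrapped would need its own I/O timeout (rsync has a --timeout option) for the retry to kick in promptly. This only papers over the failure; it does nothing to explain it.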



Re: unreliable connections

2014-01-16 Thread Chris Smith
This issue is still with me. Sporadically the connection will fail,
and a connection attempt immediately after the failure will (so far)
always work. Again the problem is only with this one remote firewall,
all of the others are fine. The hardware is virtually identical,
similar versions of the Supermicro 5015A boxes. Also note that said
problem box was used in another location with an older version of
OpenBSD without said issues.

It's possible the ISP's cable modem might be to blame but I'd like to
have something to go on before I point that finger.

Could really use some ideas on how to troubleshoot this.

Chris

On Sun, Dec 29, 2013 at 9:56 PM, Chris Smith obsd_m...@chrissmith.org wrote:
 I'm having a problem connecting with (and through) one OpenBSD box.
 Both ends are running OpenBSD -current (-current as of last weekend)
 and I've had the issue through a couple of months of various builds of
 -current.

 The problem occurs whether I'm connecting directly to the remote
 OpenBSD box (firewall) or connecting through it via a redirect to an
 inside box.

 The connection attempts are all coming from a Linux box inside my
 network (and I'm running a recent -current as my firewall), and
 connections to and through several other remote OpenBSD boxes
 (although not running a recent -current) all work 100% of the time.

 With the problem box sometimes the connection never completes. After
 the failed connection attempt subsequent connection attempts work
 fine, it's only after some time that the problem may arise again.

 For example if I attempt to ssh to the problem box I'm greeted with a
 blank line:
 
 $ ssh problem_box

 

 And after some minutes, I'll finally receive this:
 
 ssh_exchange_identification: read: Connection timed out
 

 From another terminal I can then shell in (whether or not I kill the
 first attempt). The connection states reported are (all addresses have
 been munged):
 my local firewall:
 
 all tcp 51.213.211.197:22 <- 172.25.12.66:44291   ESTABLISHED:ESTABLISHED
 all tcp 76.112.133.216:54348 (172.25.12.66:44291) -> 51.213.211.197:22
   ESTABLISHED:ESTABLISHED
 all tcp 51.213.211.197:22 <- 172.25.12.66:44292   ESTABLISHED:ESTABLISHED
 all tcp 76.112.133.216:58306 (172.25.12.66:44292) -> 51.213.211.197:22
   ESTABLISHED:ESTABLISHED
 

 the remote firewall:
 
 all tcp 51.213.211.197:22 <- 76.112.133.216:54348   SYN_SENT:ESTABLISHED
 all tcp 51.213.211.197:22 <- 76.112.133.216:58306   ESTABLISHED:ESTABLISHED
 

 The hung connection is the SYN_SENT:ESTABLISHED one and it stays
 that way for some time, although my local firewall believes it to be
 established.

 I've seen the same issue with an RDP connection to an inside Windows
 box via a redirect. Sometimes the first attempt will not connect, if I
 kill it and try again, voila, it works.

 The critical part is that my rsync backup to an internal box fails
 about every third night due to this issue. As I rsync two different
 paths (one and then the other) on the remote daemon the first path
 will fail sporadically, the second path always completes. Have none of
 these issues with other accounts (but as mentioned the OpenBSD
 versions on those firewalls are a bit older).

 Any assistance on resolving this would be much appreciated.

 Thank you,

 Chris



Re: unreliable connections

2014-01-16 Thread Stuart Henderson
On 2014-01-16, Chris Smith obsd_m...@chrissmith.org wrote:
 This issue is still with me. Sporadically the connection will fail,
 and a connection attempt immediately after the failure will (so far)
 always work. Again the problem is only with this one remote firewall,
 all of the others are fine. The hardware is virtually identical,
 similar versions of the Supermicro 5015A boxes. Also note that said
 problem box was used in another location with an older version of
 OpenBSD without said issues.

This could be an MTU or RWIN-related issue. One common problem is when the
firewall state is created from an already-established connection rather
than from a SYN packet. In that case the firewall can't keep track of the
RWIN value, which is set by many modern OSes and is needed in order for a
stateful firewall to track the connection.

To avoid the risk of this I usually start pf rulesets with "block log"
(*not* "block in log", etc.) just to make sure that no packets are passed by
the implicit default rule (which is basically "pass all flags any no state"),
which takes effect if no other rules match.

Posting the firewall ruleset may possibly help people diagnose this in more 
detail.



unreliable connections

2013-12-29 Thread Chris Smith
I'm having a problem connecting with (and through) one OpenBSD box.
Both ends are running OpenBSD -current (-current as of last weekend)
and I've had the issue through a couple of months of various builds of
-current.

The problem occurs whether I'm connecting directly to the remote
OpenBSD box (firewall) or connecting through it via a redirect to an
inside box.

The connection attempts are all coming from a Linux box inside my
network (and I'm running a recent -current as my firewall), and
connections to and through several other remote OpenBSD boxes
(although not running a recent -current) all work 100% of the time.

With the problem box sometimes the connection never completes. After
the failed connection attempt subsequent connection attempts work
fine, it's only after some time that the problem may arise again.

For example if I attempt to ssh to the problem box I'm greeted with a
blank line:

$ ssh problem_box



And after some minutes, I'll finally receive this:

ssh_exchange_identification: read: Connection timed out


From another terminal I can then shell in (whether or not I kill the
first attempt). The connection states reported are (all addresses have
been munged):
my local firewall:

all tcp 51.213.211.197:22 <- 172.25.12.66:44291   ESTABLISHED:ESTABLISHED
all tcp 76.112.133.216:54348 (172.25.12.66:44291) -> 51.213.211.197:22
  ESTABLISHED:ESTABLISHED
all tcp 51.213.211.197:22 <- 172.25.12.66:44292   ESTABLISHED:ESTABLISHED
all tcp 76.112.133.216:58306 (172.25.12.66:44292) -> 51.213.211.197:22
  ESTABLISHED:ESTABLISHED


the remote firewall:

all tcp 51.213.211.197:22 <- 76.112.133.216:54348   SYN_SENT:ESTABLISHED
all tcp 51.213.211.197:22 <- 76.112.133.216:58306   ESTABLISHED:ESTABLISHED


The hung connection is the SYN_SENT:ESTABLISHED one and it stays
that way for some time, although my local firewall believes it to be
established.

I've seen the same issue with an RDP connection to an inside Windows
box via a redirect. Sometimes the first attempt will not connect, if I
kill it and try again, voila, it works.

The critical part is that my rsync backup to an internal box fails
about every third night due to this issue. As I rsync two different
paths (one and then the other) on the remote daemon the first path
will fail sporadically, the second path always completes. Have none of
these issues with other accounts (but as mentioned the OpenBSD
versions on those firewalls are a bit older).

Any assistance on resolving this would be much appreciated.

Thank you,

Chris
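
The half-open state described in this message (SYN_SENT:ESTABLISHED on the remote firewall while the local firewall shows ESTABLISHED:ESTABLISHED) can be picked out of a state listing mechanically. The following is a minimal Python sketch, not something from the original post. It assumes the listing has the layout shown above (presumably the output of pfctl -s states), tolerates a direction arrow that has been reduced to a bare "-", and re-joins a state pair that the mailer wrapped onto its own line. The names STATE_RE and find_half_open are made up for illustration.

```python
import re

# Matches a pf state line such as:
#   all tcp 51.213.211.197:22 <- 76.112.133.216:54348   SYN_SENT:ESTABLISHED
# An optional "(addr:port)" NAT annotation may follow the first endpoint.
STATE_RE = re.compile(
    r"^\s*(?P<iface>\S+)\s+(?P<proto>\S+)\s+"
    r"(?P<left>\S+)(?:\s+\((?P<nat>[^)]+)\))?\s+"
    r"(?P<dir><-|->|-)\s+(?P<right>\S+)\s+"
    r"(?P<src_state>[A-Z_0-9]+):(?P<dst_state>[A-Z_0-9]+)\s*$"
)


def find_half_open(text):
    """Return (left, right, state) tuples for TCP states not fully established."""
    # Re-join a "STATE:STATE" pair that was wrapped onto a line of its own.
    joined = re.sub(r"\n\s+(?=[A-Z_0-9]+:[A-Z_0-9]+\s*$)", " ", text, flags=re.M)
    suspects = []
    for line in joined.splitlines():
        m = STATE_RE.match(line)
        if not m or m["proto"] != "tcp":
            continue
        pair = (m["src_state"], m["dst_state"])
        if pair != ("ESTABLISHED", "ESTABLISHED"):
            suspects.append((m["left"], m["right"], ":".join(pair)))
    return suspects
```

Fed the two remote-firewall lines quoted above, find_half_open() returns only the entry for port 54348, which is the connection that hung.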