Re: NAT reliability in light of recent checksum changes

2015-06-15 Thread Richard Procter
On 7/03/2014, at 2:15 PM, Richard Procter wrote:
 
 I've some ideas about solutions [for modifying checksums more cleanly] but 
 will leave those for another email.

Shifting this old thread to tech@: I've posted a patch that reinstates 
the pf algorithm of OpenBSD 5.3 for preserving payload checksums end-to-end, 
but rewritten without the ugly and error-prone (though speedy!) code, and 
aiming to have no significant impact on performance. 

best, 
Richard. 



Re: NAT reliability in light of recent checksum changes

2014-03-06 Thread Richard Procter
On 27/02/2014, at 11:04 AM, Theo de Raadt wrote:
 
 There was a method of converting an in-bound checksum, due to NAT
 conversion, into a new out-bound checksum.  A process is required,
 it's how NAT works.
 
 A new method of conversion is being used.  It is mathematically equivalent
 to the old method.

First, I agree with Theo that modifying a checksum is
mathematically equivalent to regenerating it; both give the same
result on ideal hardware.
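
To make the equivalence concrete, here is a minimal standalone C
sketch (my own illustration, not pf code; all names are mine)
checking that an RFC1624-style incremental update of the Internet
checksum agrees with a full recomputation when NAT rewrites one
16-bit word:

--- cksum_equiv.c (illustrative sketch) ---

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ones-complement Internet checksum over 16-bit words */
static uint16_t
cksum(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	while (nwords--)
		sum += *w++;
	while (sum >> 16)			/* fold end-around carries */
		sum = (sum >> 16) + (sum & 0xffff);
	return (~sum & 0xffff);
}

/* incremental update, RFC1624: HC' = ~(~HC + ~m + m') */
static uint16_t
cksum_update(uint16_t ck, uint16_t old, uint16_t new)
{
	uint32_t sum = (uint32_t)(uint16_t)~ck + (uint16_t)~old + new;

	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (~sum & 0xffff);
}

int
main(void)
{
	uint16_t pkt[8] = { 0xc0a8, 0x0101, 0x1234, 0x5678,
	    0xdead, 0xbeef, 0x0001, 0x0000 };
	uint16_t ck = cksum(pkt, 8);
	uint16_t old = pkt[0], new = 0xcafe;	/* NAT rewrites a word */

	pkt[0] = new;
	assert(cksum(pkt, 8) == cksum_update(ck, old, new));
	printf("recomputed == updated == 0x%04x\n", cksum(pkt, 8));
	return (0);
}

---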

Of course, we use checksums because our hardware isn't ideal, so
let's look at how the two approaches differ when a router 
fault occurs.

Take Stuart Henderson's example:

 Consider this scenario, which has happened in real life.
 
 - NIC supports checksum offloading, verified checksum is OK.
 
 - PCI transfers are broken (in my case it affected multiple
 machines of a certain type, so most likely a motherboard bug),
 causing some corruption in the payload, but the machine won't
 detect them because it doesn't look at checksums itself, just
 trusts the NIC's rx csum good flag.
 
 In this situation, packets which have been NATted that are
 corrupt now get a new checksum that is valid; so the final
 endpoint can not detect the breakage.

That is, when the router offloads and regenerates, the egress
NIC computes a fresh checksum over the now-corrupt data, hiding
any card, stack, bus or memory fault the verified packet
suffered in passing through the router.

Looking at the code, the relevant functions are
pf.c:pf_check_proto_cksum(), which trusts the ingress NIC's
checksum good flag, and pf.c:pf_cksum(), which zeros the existing
checksum on that basis and flags it to be regenerated by the
egress NIC[1].
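
In outline (my paraphrase of what those two functions do per the
traces in [1], not a verbatim quote; M_TCP_CSUM_IN_OK and
M_TCP_CSUM_OUT are the mbuf csum_flags bits):

--- regeneration path (paraphrased sketch) ---

/* ingress: pf_check_proto_cksum() trusts the NIC's verdict */
if (m->m_pkthdr.csum_flags & M_TCP_CSUM_IN_OK)
	return (0);	/* good: payload not re-verified in software */

/* egress: pf_cksum() discards the original checksum */
pd->hdr.tcp->th_sum = 0;
m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT;	/* NIC recomputes */

---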

By contrast, checksum modification is far more reliable. In order
to hide payload corruption the update code[1] would have to
modify the checksum to exactly account for it. But that would
have to happen by accident --- by a fault that in effect computes
the necessary change --- as the update code never considers the
payload[0]. That's not impossible; checksum regeneration, on the
other hand, is guaranteed to hide faults in the regenerating
router.

We conclude that in the typical offloading case, regenerated
checksums, unlike modified ones, cannot reveal faults in the
regenerating router.
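
To illustrate, here is a toy standalone C sketch (again my own
illustration, not pf code) of the fault Stuart describes: a payload
word is corrupted inside the router after ingress verification.
Regeneration emits a checksum that validates the corrupt data;
modification leaves a checksum the receiver can still use to catch
the fault:

--- cksum_fault.c (illustrative sketch) ---

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ones-complement Internet checksum over 16-bit words */
static uint16_t
cksum(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	while (nwords--)
		sum += *w++;
	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (~sum & 0xffff);
}

/* incremental update, RFC1624: HC' = ~(~HC + ~m + m') */
static uint16_t
cksum_update(uint16_t ck, uint16_t old, uint16_t new)
{
	uint32_t sum = (uint32_t)(uint16_t)~ck + (uint16_t)~old + new;

	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (~sum & 0xffff);
}

int
main(void)
{
	/* word 0 is the NATted address; words 1..3 are payload */
	uint16_t pkt[4] = { 0x0a00, 0x1111, 0x2222, 0x3333 };
	uint16_t ck = cksum(pkt, 4);	/* original (sender's) checksum */
	uint16_t old = pkt[0], new = 0xc0a8;

	pkt[2] ^= 0x0100;	/* bus/memory fault flips a payload bit */
	pkt[0] = new;		/* NAT rewrites the address word */

	uint16_t regen = cksum(pkt, 4);			/* 5.4 style */
	uint16_t modif = cksum_update(ck, old, new);	/* 5.3 style */

	/* the receiver recomputes over what arrived and compares */
	printf("regenerated: %s\n", cksum(pkt, 4) == regen ?
	    "accepted (fault hidden)" : "dropped");
	printf("modified:    %s\n", cksum(pkt, 4) == modif ?
	    "accepted" : "dropped (fault detected)");
	return (0);
}

---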

Whether this difference is significant is a matter 
of judgment and a separate issue.

I've some ideas about solutions but will leave those for 
another email.

best, 
Richard. 

PS. I find the following terminology helpful:

Checksums calculated from the origin data are 'original';
checksums calculated from a copy are 'regenerated'.

Checksums may also be 'modified' to account for altered data in
such a way as to preserve originality for any unaltered data[0].

A checksum is 'end-to-end' if it is delivered original with
respect to the payload. A modified checksum may be end-to-end,
but a regenerated checksum never is, as it is not original.

[0] Strikingly, RFC1631 (1994) and RFC3022 (2001), the NAT RFCs, 
fail to say end-to-end preservation is a property of their checksum 
modification algorithm. I presume it just didn't seem worth 
mentioning as, lacking hardware offload back then, one wouldn't 
regenerate in software on performance grounds alone. It is only 
alluded to in RFC1071 (1988) Computing the Internet Checksum, 
which states that a checksum remains end-to-end when modified 
'since it was not fully recomputed'. Although that's still true 
if NAT modifies it, NAT makes the meaning of 'end-to-end' 
more complex; I think my above terminology helps there. 

[1]
I'll quote OpenBSD code here for completeness, contrasting 
modification (OpenBSD 5.3) with regeneration (OpenBSD 5.4) 

OpenBSD 5.3 NAT modified the checksum as follows: 

--- pf.c 1.818 (OPENBSD_5_3) --- 

Assuming an AF_INET -> AF_INET TCP connection. 

pf_test_rule()
3862: pf_translate()
  3881: pf_change_ap()  [ src addr/port ]
 1671: PF_ACPY  [ = pf_addrcpy() ] 
 1689: pf_cksum_fixup(...) 
   [ 
 pseudo code is: 
 sum = fixup(sum, addr16[1]) 
 sum = fixup(sum, addr16[0]) 
 sum = fixup(sum, port) 
   ] 
1662: l = cksum + old - new --- checksum modified 
  [ then presumably account for ones-complement carries; see the sketch below ] 
  3887: pf_change_ap() etc [ dst addr/port ] 

On subsequent state matching:  

pf_test()
6788: pf_test_state_tcp() [ for TCP ] 
  4566: pf_change_ap() etc [ for src addr/port ] 
  4574: pf_change_ap() etc [ for dst addr/port ]  

---
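
For reference, the fixup around pf.c:1662 amounts to the following
(a from-memory reconstruction for illustration, not a verbatim quote
of the 5.3 source; kernel types from sys/types.h). The fold lines are
the ones-complement carry handling guessed at above:

--- pf_cksum_fixup() (reconstruction) ---

u_int16_t
pf_cksum_fixup(u_int16_t cksum, u_int16_t old, u_int16_t new, u_int8_t udp)
{
	u_int32_t	l;

	if (udp && !cksum)
		return (0x0000);	/* UDP: zero means "no checksum" */
	l = cksum + old - new;		/* the line quoted above */
	l = (l >> 16) + (l & 65535);	/* fold the end-around carry */
	l = l & 65535;
	if (udp && !l)
		return (0xFFFF);	/* UDP: emit negative zero as 0xFFFF */
	return (l);
}

---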

OpenBSD 5.4 NAT regenerates checksums as follows: 

--- pf.c 1.863 (post OPENBSD_5_4) --- 

Assuming an AF_INET -> AF_INET TCP connection.

On initial rule match: 
pf_test_rule()
3445: pf_translate()
  3707: pf_change_ap()
 1677: PF_ACPY [= pf_addrcpy()] 
3461: pf_cksum()
  6775: pd->hdr.tcp->th_sum = 0; --- checksum zeroed
m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT --- flagged for recalculation
(if orig checksum good) 

On subsequent state matching: 

pf_test_state() 
   ~4445: pf_change_ap() etc 
   4471: pf_cksum() etc 

---

Re: NAT reliability in light of recent checksum changes

2014-02-26 Thread Richard Procter
On 24/02/2014, at 9:33 PM, Henning Brauer wrote:

 * Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:
 On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
 * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
 This fundamentally weakens its usefulness, though: a correct
 checksum now implies only that the payload likely matches
 what the last NAT router happened to have in its memory
 huh?
 we receive a packet with correct cksum -> NAT -> packet goes out with
 correct cksum.
 we receive a packet with broken cksum -> NAT -> we leave the cksum
 alone, i. e. leave it broken.
 Christian said it better than me: routers may corrupt data
 and regenerating the checksum will hide it.
 
 if that happened we had much bigger problems than NAT.

By 'bigger problems' do you mean obvious router stability
issues?  Suppose someone argued that as we'd have obvious
stability issues if unprotected memory was unreliable, ECC
memory is unnecessary. That argument is logically equivalent
to what seems to be yours, that as we'd see obvious
issues if routers were corrupting data, end-to-end
checksums are unnecessary, but I don't buy it.

We know that routers corrupt data. Right now my home
firewall shows 30 TCP segments dropped for bad checksums. As
checks at least as strong are used by every sane link-layer
this virtually implies the dropped packets suffered router
or end-point faults.

Again, it's not just me saying it: '...checksums are used by
higher layers to ensure that data was not corrupted in
intermediate routers or by the sending or receiving host.
The fact that checksums are typically the secondary level of
protection has often led to suggestions that checksums are
superfluous. Hard won experience, however, has shown that
checksums are necessary.  Software errors (such as buffer
mismanagement) and even hardware errors (such as network
adapters with poor DMA hardware that sometimes fail to fully
DMA data) are surprisingly common [let alone memory faults!
RP] and checksums have been very useful in protecting
against such errors.'[0]

best, 
Richard. 

[0] Craig Partridge, Jim Hughes, and Jonathan Stone. 1995. 
Performance of checksums and CRCs over real data. SIGCOMM Comput. 
Commun. Rev. 25, 4 (October 1995), 68-76. DOI=10.1145/217391.217413 
http://doi.acm.org/10.1145/217391.217413 page 1 



Re: NAT reliability in light of recent checksum changes

2014-02-26 Thread Theo de Raadt
 On 24/02/2014, at 9:33 PM, Henning Brauer wrote:
 
  * Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:
  On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
  * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
  This fundamentally weakens its usefulness, though: a correct
  checksum now implies only that the payload likely matches
  what the last NAT router happened to have in its memory
  huh?
  we receive a packet with correct cksum -> NAT -> packet goes out with
  correct cksum.
  we receive a packet with broken cksum -> NAT -> we leave the cksum
  alone, i. e. leave it broken.
  Christian said it better than me: routers may corrupt data
  and regenerating the checksum will hide it.
  
  if that happened we had much bigger problems than NAT.
 
 By 'bigger problems' do you mean obvious router stability
 issues?  Suppose someone argued that as we'd have obvious
 stability issues if unprotected memory was unreliable, ECC
 memory is unnecessary. That argument is logically equivalent
 to what seems to be yours, that as we'd see obvious
 issues if routers were corrupting data, end-to-end
 checksums are unnecessary, but I don't buy it.

What is your solution?

 We know that routers corrupt data. Right now my home
 firewall shows 30 TCP segments dropped for bad checksums. As
 checks at least as strong are used by every sane link-layer
 this virtually implies the dropped packets suffered router
 or end-point faults.

Yes.  And what is your solution?

 Again, it's not just me saying it: '...checksums are used by
 higher layers to ensure that data was not corrupted in
 intermediate routers or by the sending or receiving host.
 The fact that checksums are typically the secondary level of
 protection has often led to suggestions that checksums are
 superfluous. Hard won experience, however, has shown that
 checksums are necessary.  Software errors (such as buffer
 mismanagement) and even hardware errors (such as network
 adapters with poor DMA hardware that sometimes fail to fully
 DMA data) are surprisingly common [let alone memory faults!
 RP] and checksums have been very useful in protecting
 against such errors.'[0]

I'll ask again, since you keep just trashing other people's code.  I'm
getting ready to declare you a kook, because I suspect you're going to
suggest we change ethernet header and IP headers or prohibit NAT.



Re: NAT reliability in light of recent checksum changes

2014-02-26 Thread Theo de Raadt
 Again, it's not just me saying it: '...checksums are used by
 higher layers to ensure that data was not corrupted in
 intermediate routers or by the sending or receiving host.
 The fact that checksums are typically the secondary level of
 protection has often led to suggestions that checksums are
 superfluous. Hard won experience, however, has shown that
 checksums are necessary.  Software errors (such as buffer
 mismanagement) and even hardware errors (such as network
 adapters with poor DMA hardware that sometimes fail to fully
 DMA data) are surprisingly common [let alone memory faults!
 RP] and checksums have been very useful in protecting
 against such errors.'[0]

Richard, your use of this quote is tantamount to declaring that
Henning has disabled or otherwise gutted checksums.  He has not
disabled checksums.

There was a method of converting an in-bound checksum, due to NAT
conversion, into a new out-bound checksum.  A process is required,
it's how NAT works.

A new method of conversion is being used.  It is mathematically equivalent
to the old method.

The quote above is about disabling checksums.  Checksums have not
been disabled, in any way.  New checksums are not being invented out
of anyone's ass for old packets.

I believe you are posting to cast aspersions on the pf efforts.

Your repeated attempts at false aspersions are only reflecting back
on you.



Re: NAT reliability in light of recent checksum changes

2014-02-26 Thread Richard Procter
On 27/02/2014, at 11:04 AM, Theo de Raadt wrote:

 I believe you are posting to cast aspersions on the pf efforts.

Theo, 

I'll insist then that I think pf is a superior piece of code
which I benefit from every day, and that Henning's efforts
to simplify it are so very welcome in a world addicted to
complexity.

My beef is solely with the technique of regenerating
checksums, not the people working on the code. Criticising a
design choice with argument and evidence is not the same as
attacking the designer's integrity or competence, and if I
seem to be playing the man and not the ball, it is not my
intent and I apologise.

As to your other points, I hope to address them in
another email I have been drafting and should finish
within the next few days.

best, 
Richard. 



Re: NAT reliability in light of recent checksum changes

2014-02-24 Thread Henning Brauer
* Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:
 On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
  * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
  This fundamentally weakens its usefulness, though: a correct
  checksum now implies only that the payload likely matches
  what the last NAT router happened to have in its memory
  huh?
  we receive a packet with correct cksum -> NAT -> packet goes out with
  correct cksum.
  we receive a packet with broken cksum -> NAT -> we leave the cksum
  alone, i. e. leave it broken.
 Christian said it better than me: routers may corrupt data
 and regenerating the checksum will hide it.

if that happened we had much bigger problems than NAT.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: NAT reliability in light of recent checksum changes

2014-02-24 Thread Henning Brauer
* Geoff Steckel g...@oat.com [2014-01-28 03:20]:
It would be good if when data protected by a checksum is modified,
 the current checksum is validated and some appropriate? action is
 done (drop? produce invalid new checksum?) when proceeding.

guess what: that is exactly what happens.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Richard Procter
On 28/01/2014, at 4:19 AM, Simon Perreault wrote:

 On 2014-01-25 14:40, Richard Procter wrote:
 I'm not saying the calculation is bad. I'm saying it's being
 calculated from the wrong copy of the data and by the wrong
 device. And it's not just me saying it: I'm quoting the guys
 who designed TCP.
 
 Those guys didn't envision NAT.
 
 If you want end-to-end checksum purity, don't do NAT.

Let's look at the options.

The world needs more addresses than IPv4 provides and NAT
gives them to us. There's IPv6, which has about a hundred
million addresses for every bacterium estimated to live on
the planet[0], but it's not looking to replace IPv4 any time
soon. So NAT is here to stay for a good while longer.

Perhaps I can at least stop using NAT on my own network. In
my case I can't, but let's assume I do. This eliminates one
source of error. But my TCP streams may still carry
now-undetected one-bit errors (at least) if there are
routers out there regenerating checksums. As long as there
are, good checksums no longer mean as much by themselves, and
if I want at least some assurance the network did its job, I
still need some other way (e.g., checking the network path
contains no such routers, either by inspection or
statistically, or by reimplementing an end-to-end checksum
at a higher layer, etc.). Regenerated checksums affect me
whether or not I use NAT myself.

Another option is to always update the checksum, as versions
prior to 5.4 did. It's reasonable to ask: is that any
more reliable than recomputing it, as 5.4 does?
That is, can the old update code hide payload corruption,
too?

In order to hide payload corruption the update code would
have to modify the checksum to exactly account for it. But
that would have to happen by accident, as it never considers
the payload. It's not impossible, but, on the other hand,
checksum regeneration guarantees to hide any bad data.
So updates are more reliable.

A lot more reliable, in fact, as you'd require precisely
those memory errors necessary to in effect compute the
correct update, or some freak fault in the ALU that did the
same thing, or some combination of both. And as that has
nothing to do with the update code it is in principle
possible for non-NAT connections, too. For the hardware,
updates are just an extra load/modify/store and so the
chances of a checksum update hiding a corrupted payload are
in practical terms equivalent to those of normal forwarding.

So your statement holds only if checksums are being
regenerated. In general, NAT needn't compromise end-to-end
TCP payload checksum integrity, and in versions prior to
5.4, it didn't.

best, 
Richard. 


[0] Prokaryotes: The unseen majority 
Proc Natl Acad Sci U S A. 1998 June 9; 95(12): 6578–6583.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC33863/

2^128 IPv6 addresses = ~10^38 

~10^38 IPv6 addresses / ~10^30 bacterial cells = ~10^8 addresses per cell. 

[1] RFC1071 Computing the Internet Checksum, p21: 
'If anything, [this end-to-end property] is the most powerful 
feature of the TCP checksum!' Page 15 also touches on 
the end-to-end preserving properties of checksum update. 



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Simon Perreault

On 2014-01-27 21:21, Geoff Steckel wrote:

It would be good if when data protected by a checksum is modified,
the current checksum is validated and some appropriate? action is done
(drop? produce invalid new checksum?) when proceeding.


This is exactly what's being done. Don't you listen when Henning speaks?

Simon



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Simon Perreault

On 2014-01-28 03:39, Richard Procter wrote:

In order to hide payload corruption the update code would
have to modify the checksum to exactly account for it. But
that would have to happen by accident, as it never considers
the payload. It's not impossible, but, on the other hand,
checksum regeneration guarantees to hide any bad data.
So updates are more reliable.


This analysis is bullshit. You need to take into account the fact that 
checksums are verified before regenerating them. That is, you need to 
compare a) verifying + regenerating vs b) updating. If there's an 
undetectable error, you're going to propagate it no matter whether you 
do a) or b).


Simon



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Stuart Henderson
On 2014-01-28, Simon Perreault sperrea...@openbsd.org wrote:
 On 2014-01-28 03:39, Richard Procter wrote:
 In order to hide payload corruption the update code would
 have to modify the checksum to exactly account for it. But
 that would have to happen by accident, as it never considers
 the payload. It's not impossible, but, on the other hand,
 checksum regeneration guarantees to hide any bad data.
 So updates are more reliable.

 This analysis is bullshit. You need to take into account the fact that 
 checksums are verified before regenerating them. That is, you need to 
 compare a) verifying + regenerating vs b) updating. If there's an 
 undetectable error, you're going to propagate it no matter whether you 
 do a) or b).

 Simon



Checksums are, in many cases, only verified *on the NIC*.

Consider this scenario, which has happened in real life.

- NIC supports checksum offloading, verified checksum is OK.

- PCI transfers are broken (in my case it affected multiple machines
of a certain type, so most likely a motherboard bug), causing some
corruption in the payload, but the machine won't detect them because
it doesn't look at checksums itself, just trusts the NIC's rx csum
good flag.

In this situation, packets which have been NATted that are corrupt
now get a new checksum that is valid; so the final endpoint can not
detect the breakage.

I'm not sure if this is common enough to be worth worrying about
here, but the analysis is not bullshit.



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Giancarlo Razzolini
On 28-01-2014 15:45, Stuart Henderson wrote:
 On 2014-01-28, Simon Perreault sperrea...@openbsd.org wrote:
 On 2014-01-28 03:39, Richard Procter wrote:
 In order to hide payload corruption the update code would
 have to modify the checksum to exactly account for it. But
 that would have to happen by accident, as it never considers
 the payload. It's not impossible, but, on the other hand,
 checksum regeneration guarantees to hide any bad data.
 So updates are more reliable.
 This analysis is bullshit. You need to take into account the fact that 
 checksums are verified before regenerating them. That is, you need to 
 compare a) verifying + regenerating vs b) updating. If there's an 
 undetectable error, you're going to propagate it no matter whether you 
 do a) or b).

 Simon


 Checksums are, in many cases, only verified *on the NIC*.

 Consider this scenario, which has happened in real life.

 - NIC supports checksum offloading, verified checksum is OK.

 - PCI transfers are broken (in my case it affected multiple machines
 of a certain type, so most likely a motherboard bug), causing some
 corruption in the payload, but the machine won't detect them because
 it doesn't look at checksums itself, just trusts the NIC's rx csum
 good flag.

 In this situation, packets which have been NATted that are corrupt
 now get a new checksum that is valid; so the final endpoint can not
 detect the breakage.

 I'm not sure if this is common enough to be worth worrying about
 here, but the analysis is not bullshit.

Stuart,

It is more common than you might think. I had some gigabit
motherboards, certain models of which would always corrupt packets when
using the onboard NIC. I believe that in these cases there isn't much
that the OS can do. Unfortunately, it's always the application's job to
detect whether it is receiving good or bad data.

Cheers,

-- 
Giancarlo Razzolini
GPG: 4096R/77B981BC



Re: NAT reliability in light of recent checksum changes

2014-01-28 Thread Simon Perreault

On 2014-01-28 12:45, Stuart Henderson wrote:

This analysis is bullshit. You need to take into account the fact that
checksums are verified before regenerating them. That is, you need to
compare a) verifying + regenerating vs b) updating. If there's an
undetectable error, you're going to propagate it no matter whether you
do a) or b).


Checksums are, in many cases, only verified *on the NIC*.

Consider this scenario, which has happened in real life.

- NIC supports checksum offloading, verified checksum is OK.

- PCI transfers are broken (in my case it affected multiple machines
of a certain type, so most likely a motherboard bug), causing some
corruption in the payload, but the machine won't detect them because
it doesn't look at checksums itself, just trusts the NIC's rx csum
good flag.

In this situation, packets which have been NATted that are corrupt
now get a new checksum that is valid; so the final endpoint can not
detect the breakage.

I'm not sure if this is common enough to be worth worrying about
here, but the analysis is not bullshit.


You're right. I was in the rough, sorry, and thanks for the explanation. 
I don't think this scenario is worth worrying about though.


Simon



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Nick Bender
On Mon, Jan 27, 2014 at 8:19 AM, Simon Perreault 
simon.perrea...@viagenie.ca wrote:

 On 2014-01-25 14:40, Richard Procter wrote:

  I'm not saying the calculation is bad. I'm saying it's being
 calculated from the wrong copy of the data and by the wrong
 device. And it's not just me saying it: I'm quoting the guys
 who designed TCP.


 Those guys didn't envision NAT.

 If you want end-to-end checksum purity, don't do NAT.

 Simon


Relying on TCP checksums is risky - they are too weak.

I live at the end of a wireless link that starts at around 7K feet
elevation, goes over a 12K foot ridge, lands at my neighbor's roof at 10k
feet and then bounces across the street to my house. At one point I was
having lots of issues with data corruption - updates failing, even images
on web pages going technicolor halfway through the download. The ISP
ultimately determined there was a bad transmitter and replaced it. The
corruption was so severe that it was overwhelming the TCP checksums to the
point that as far as TCP was concerned it was delivering good data (just
not the same data twice :-). Until they fixed the issue I was able to run a
proxy over ssh which gave me slower but reliable network service.

-N



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Simon Perreault

On 2014-01-25 14:40, Richard Procter wrote:

I'm not saying the calculation is bad. I'm saying it's being
calculated from the wrong copy of the data and by the wrong
device. And it's not just me saying it: I'm quoting the guys
who designed TCP.


Those guys didn't envision NAT.

If you want end-to-end checksum purity, don't do NAT.

Simon



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Giancarlo Razzolini
On 27-01-2014 14:30, Nick Bender wrote:
 On Mon, Jan 27, 2014 at 8:19 AM, Simon Perreault 
 simon.perrea...@viagenie.ca wrote:

 On 2014-01-25 14:40, Richard Procter wrote:

  I'm not saying the calculation is bad. I'm saying it's being
 calculated from the wrong copy of the data and by the wrong
 device. And it's not just me saying it: I'm quoting the guys
 who designed TCP.

 Those guys didn't envision NAT.

 If you want end-to-end checksum purity, don't do NAT.

 Simon


 Relying on TCP checksums is risky - they are too weak.

 I live at the end of a wireless link that starts at around 7K feet
 elevation, goes over a 12K foot ridge, lands at my neighbor's roof at 10k
 feet and then bounces across the street to my house. At one point I was
 having lots of issues with data corruption - updates failing, even images
 on web pages going technicolor halfway through the download. The ISP
 ultimately determined there was a bad transmitter and replaced it. The
 corruption was so severe that it was overwhelming the TCP checksums to the
 point that as far as TCP was concerned it was delivering good data (just
 not the same data twice :-). Until they fixed the issue I was able to run a
 proxy over ssh which gave me slower but reliable network service.

 -N

I had the same issue in a different scenario. I traveled to a place
where the internet connection was so slow and so unreliable that almost
no https handshake would ever complete. And yet almost 60% of the
checksums were OK. That's why I always have a VPN server
lying around to route my traffic to. In my experience, on very
unreliable connections, a UDP vpn, such as openvpn, saves the day. NAT
should (and will) have a very slow and painful death. But, then again,
IPv4 has been about to die for more than a decade, and it's still here. I
guess the death will be very, very, very slow.

Cheers,

--
Giancarlo Razzolini
GPG: 4096R/77B981BC



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Why 42? The lists account.
FWIW, you don't have to be out in the sticks (the backwoods?) to have
a network problem:


http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html

However, as I understand it, in this case the TCP checksumming worked
and protected the application from the corrupted data.

Cheers,
Robb.



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Giancarlo Razzolini
On 27-01-2014 19:05, Why 42? The lists account. wrote:
 FWIW, you don't have to be out in the sticks (the backwoods?) to have
 a network problem:

 
 http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html

 However, as I understand it, in this case the TCP checksumming worked
 and protected the application from the corrupted data.

 Cheers,
 Robb.

I wasn't exactly in the woods, but I had a 600Kbps unreliable ADSL
connection that would send the packets. But the latency and corruption
were so severe that TLS handshakes would take too long. And even if
they completed, the connection wouldn't sustain itself. Anyway, the UDP vpn
improved things quite a bit. This was due, well, to UDP of course, and to
the dynamic compression, reducing the amount of data sent to the wire.

In this case you pointed to, the TCP checksumming was doing its job,
successfully protecting the application. This kind of thing, where bits
randomly flip, proves that computer science can be anything but an
EXACT science. That's one of the reasons why the machines will
(hopefully) always need humans.

Cheers,

-- 
Giancarlo Razzolini
GPG: 4096R/77B981BC



Re: NAT reliability in light of recent checksum changes

2014-01-27 Thread Geoff Steckel
On 01/27/2014 08:07 PM, Giancarlo Razzolini wrote:
 On 27-01-2014 19:05, Why 42? The lists account. wrote:
  FWIW, you don't have to be out in the sticks (the backwoods?) to have
 a network problem:

  
 http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html

 However, as I understand it, in this case the TCP checksumming worked
 and protected the application from the corrupted data.

 Cheers,
 Robb.

  I wasn't exactly in the woods, but I had a 600Kbps unreliable ADSL
 connection that would send the packets. But the latency and corruption
 were so severe that TLS handshakes would take too long. And even if
 they completed, the connection wouldn't sustain itself. Anyway, the UDP vpn
 improved things quite a bit. This was due, well, to UDP of course, and to
 the dynamic compression, reducing the amount of data sent to the wire.

  In this case you pointed to, the TCP checksumming was doing its job,
 successfully protecting the application. This kind of thing, where bits
 randomly flip, proves that computer science can be anything but an
 EXACT science. That's one of the reasons why the machines will
 (hopefully) always need humans.

 Cheers,

To add to the preceding...
One client of mine used a CVS repository via coast-to-coast NFS.

Somewhere in the deeps, the UDP checksum was set to 0 (no checksum).
Somewhere else, one bit in each packet was corrupted.

   If the UDP checksum had been present we would have seen the bad
data a lot sooner. We had to go back at least a month, sometimes more,
to find good data, and then recreate all the edits.

   This scenario shows a danger of silently passing corrupt packets.

   It would be good if when data protected by a checksum is modified,
the current checksum is validated and some appropriate? action is done
(drop? produce invalid new checksum?) when proceeding.

Geoff Steckel



Re: NAT reliability in light of recent checksum changes

2014-01-25 Thread Richard Procter
On 22/01/2014, at 7:19 PM, Henning Brauer wrote:

 * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
 This fundamentally weakens its usefulness, though: a correct
 checksum now implies only that the payload likely matches
 what the last NAT router happened to have in its memory
 
 huh?
 we receive a packet with correct cksum -> NAT -> packet goes out with
 correct cksum.
 we receive a packet with broken cksum -> NAT -> we leave the cksum
 alone, i. e. leave it broken.

Christian said it better than me: routers may corrupt data
and regenerating the checksum will hide it.

That's more than a theoretical concern. The article I
referenced is a detailed study of real-world traces
co-authored by a member of the Stanford distributed systems
group that concludes 'Probably the strongest message of this
study is that the networking hardware is often trashing the
packets which are entrusted to it'[0].

More generally, TCP checksums provide for an acceptable
error rate that is independent of the reliability of the
underlying network[*] by allowing us to verify its workings.
But it's no longer possible to verify network operation if 
it may be regenerating TCP checksums, as these may hide 
network faults. That's a fundamental change from the scheme 
Cerf and Kahn emphasized in their design notes for what 
became known as TCP:

The remainder of the packet consists of text for delivery
to the destination and a trailing check sum used for
end-to-end software verification. The GATEWAY does /not/
modify the text and merely forwards the check sum along
without computing or recomputing it.[1]

 It doesn't seem you know what you are talking about. the
 cksum is dead simple, if we had bugs in calculating or
 verifying it, we really had a LOT of other problems.

I'm not saying the calculation is bad. I'm saying it's being
calculated from the wrong copy of the data and by the wrong
device. And it's not just me saying it: I'm quoting the guys 
who designed TCP. 

 There is no undetected error rate, nothing really changes
 there.

I disagree. Every TCP stream containing arbitrary data may
have undetected errors as checksums cannot detect all the
errors networks may make (being shorter than the data they
cover). The engineer's task is to make network errors
reliably negligible in practice.
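
One concrete failure mode, as a standalone C sketch (my own
illustration): the Internet checksum is a sum, so it is blind to
any reordering of the 16-bit words it covers; a fault that swaps
two aligned words passes verification unchanged:

--- cksum_swap.c (illustrative sketch) ---

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ones-complement Internet checksum over 16-bit words */
static uint16_t
cksum(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	while (nwords--)
		sum += *w++;
	while (sum >> 16)
		sum = (sum >> 16) + (sum & 0xffff);
	return (~sum & 0xffff);
}

int
main(void)
{
	uint16_t good[4] = { 0x1234, 0x5678, 0x9abc, 0xdef0 };
	uint16_t bad[4]  = { 0x1234, 0x9abc, 0x5678, 0xdef0 }; /* swapped */

	/* both checksums are equal: the corruption is undetectable */
	printf("good 0x%04x, bad 0x%04x\n", cksum(good, 4), cksum(bad, 4));
	return (0);
}

---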

As network regenerated checksums may hide any amount of
arbitrary data corruption I believe it's correct to say the
network error rate undetected by TCP is then unknown and
unbounded.

best, 
Richard. 

[*] Under reasonable assumptions of the error modes most likely
in practice. And some applications require lower error rates 
than TCP checksums can provide.

[0]
http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf

Jonathan Stone and Craig Partridge. 2000. When the CRC and
TCP checksum disagree.  In Proceedings of the conference on
Applications, Technologies, Architectures, and Protocols for
Computer Communication (SIGCOMM '00). ACM, New York, NY,
USA, 309-319.  DOI=10.1145/347059.347561
http://doi.acm.org/10.1145/347059.347561

[1] A Protocol for Packet Network Intercommunication 
V. Cerf, R. Kahn, IEEE Trans on Comms, Vol Com-22, No 5, May
1974. Page 3, emphasis in original.




Re: NAT reliability in light of recent checksum changes

2014-01-23 Thread Christian Weisgerber
Henning Brauer lists-open...@bsws.de wrote:

  This fundamentally weakens its usefulness, though: a correct
  checksum now implies only that the payload likely matches
  what the last NAT router happened to have in its memory,
  whereas the receiver wants to know whether what it got is
  what was originally transmitted.
 
 we receive a packet with correct cksum -> NAT -> packet goes out with
 correct cksum.
 we receive a packet with broken cksum -> NAT -> we leave the cksum
 alone, i. e. leave it broken.

The point Richard may be trying to make is that a packet may be
corrupted in memory on the NAT gateway (e.g. RAM error, buggy code
writing into random location), and that regenerating the checksum
hides such corruption.

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: NAT reliability in light of recent checksum changes

2014-01-21 Thread Richard Procter
On 2014-01-15, Stuart Henderson s...@spacehopper.org wrote:
 On 2014-01-14, Richard Procter richard.n.proc...@gmail.com wrote:
 
 I've a question about the new checksum changes. [...] 
 My understanding is that checksums are now always recalculated when
 a header is altered, never updated.
 
 Is that right and if so has this affected NAT reliability? 
 
 Recalculation here would compromise reliable end-to-end transport 
 as the payload checksum no longer covers the entire network path, 
 and so break a basic transport layer design principle.
 
 That is exactly what slides 30-33 talk about. PF now checks
 the incoming packets before it rewrites the checksum, so it can
 reject them if they are broken.

Right -- so NAT now replaces the existing transport checksum
with one newly computed from the payload [0].

This fundamentally weakens its usefulness, though: a correct
checksum now implies only that the payload likely matches
what the last NAT router happened to have in its memory,
whereas the receiver wants to know whether what it got is
what was originally transmitted. In the worst case of NAT on
every intermediate node the transport checksum is
effectively reduced to an adjunct of the link layer
checksum.

This means transport layer payload integrity is no longer
reliant on the quality of the checksum algorithm alone but 
now depends too on the reliability of the path the packet 
took through the network.

I think it's great to see someone working hard to simplify 
crucial code but in light of the above I believe pf should 
always update the checksum, as it did in versions prior to 
5.4, as the alternative fundamentally undermines TCP by 
making the undetected error rate of its streams unknown and 
unbounded. One might argue networks these days are reliable; 
I think it better to avoid the need to make the argument. 
In any case the work I've found on that question is not 
reassuring [1].

best, 
Richard. 

[0] pf.c 1.863

On initial rule match: 
pf_test_rule()
  3445: pf_translate()
 3707: pf_change_ap()
1677: PF_ACPY [= pf_addrcpy()] 
  3461: pf_cksum()
 6775: pd->hdr.tcp->th_sum = 0;
   m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT 
   (if orig checksum good) 

On subsequent state matching: 
pf_test_state() 
   ~4445: pf_change_ap() etc
   4471: pf_cksum() etc

[1] 'Probably the strongest message of this study is that the 
networking hardware is often trashing the packets which are 
entrusted to it'

http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf

Jonathan Stone and Craig Partridge. 2000. When the CRC and TCP
checksum disagree. In Proceedings of the conference on Applications,
Technologies, Architectures, and Protocols for Computer Communication
(SIGCOMM '00). ACM, New York, NY, USA, 309-319.
DOI=10.1145/347059.347561 http://doi.acm.org/10.1145/347059.347561



Re: NAT reliability in light of recent checksum changes

2014-01-21 Thread Henning Brauer
* Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
  That is exactly what slides 30-33 talk about. PF now checks
  the incoming packets before it rewrites the checksum, so it can
  reject them if they are broken.
 Right -- so NAT now replaces the existing transport checksum
 with one newly computed from the payload [0].

correct - IF the original cksum was right.

 This fundamentally weakens its usefulness, though: a correct
 checksum now implies only that the payload likely matches
 what the last NAT router happened to have in its memory,
 whereas the receiver wants to know whether what it got is
 what was originally transmitted. In the worst case of NAT on
 every intermediate node the transport checksum is
 effectively reduced to an adjunct of the link layer
 checksum.

huh?
we receive a packet with correct cksum -> NAT -> packet goes out with
correct cksum.
we receive a packet with broken cksum -> NAT -> we leave the cksum
alone, i. e. leave it broken.

 I think it's great to see someone working hard to simplify 
 crucial code but in light of the above I believe pf should 
 always update the checksum, as it did in versions prior to 
 5.4, as the alternative fundamentally undermines TCP by 
 making the undetected error rate of its streams unknown and 
 unbounded. One might argue networks these days are reliable; 
 I think it better to avoid the need to make the argument. 
 In any case the work I've found on that question is not 
 reassuring [1].

It doesn't seem you know what you are talking about. the cksum is dead
simple, if we had bugs in calculating or verifying it, we really had a
LOT of other problems. There is no undetected error rate, nothing
really changes there.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: NAT reliability in light of recent checksum changes

2014-01-15 Thread Stuart Henderson
On 2014-01-14, Richard Procter richard.n.proc...@gmail.com wrote:
 Hi all, 

 I'm using OpenBSD 5.3 to provide an Alix-based home firewall. Thank
 you all for the commitment to elegant, well-documented software which
 isn't pernicious to the mental health of its users.

 I've a question about the new checksum changes[0], being interested 
 in such things and having listened to Henning's presentation and 
 poked around in the archives a little. My understanding is that 
 checksums are now always recalculated when a header is altered, 
 never updated.[1]

 Is that right and if so has this affected NAT reliability? 
 
 Recalculation here would compromise reliable end-to-end transport 
 as the payload checksum no longer covers the entire network path, 
 and so break a basic transport layer design principle.[2][3]

That is exactly what slides 30-33 talk about. PF now checks
the incoming packets before it rewrites the checksum, so it can
reject them if they are broken.

 [1] e.g.
26:45 slide 27, 'use protocol checksum offloading better'
http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00027.html 
30:51 slide 30, 'consequences in pf'
http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00030.html
https://www.youtube.com/watch?v=AymV11igbLY 
'The surprising complexity of checksums in TCP/IP'

 [2] V. Cerf, R. Kahn, IEEE Trans on Comms, Vol Com-22, No 5 May 1974
 Page 3 in original emphasis. 

 The remainder of the packet consists of text for delivery to the
 destination and a trailing check sum used for end-to-end software
 verification. The GATEWAY does /not/ modify the text and merely
 forwards the check sum along without computing or recomputing it.

 [3] Page 3. http://www.ietf.org/rfc/rfc793.txt

 The TCP must recover from data that is damaged, lost, duplicated, or
 delivered out of order by the internet communication system. [...]
 Damage is handled by adding a checksum to each segment transmitted,
 checking it at the receiver, and discarding damaged segments. 



NAT reliability in light of recent checksum changes

2014-01-14 Thread Richard Procter
Hi all, 

I'm using OpenBSD 5.3 to provide an Alix-based home firewall. Thank
you all for the commitment to elegant, well-documented software which
isn't pernicious to the mental health of its users.

I've a question about the new checksum changes[0], being interested 
in such things and having listened to Henning's presentation and 
poked around in the archives a little. My understanding is that 
checksums are now always recalculated when a header is altered, 
never updated.[1]

Is that right and if so has this affected NAT reliability? 
Recalculation here would compromise reliable end-to-end transport 
as the payload checksum no longer covers the entire network path, 
and so break a basic transport layer design principle.[2][3]

best, 
Richard.

[0] http://www.openbsd.org/54.html Reworked checksum handling for
network protocols.

[1] e.g.
   26:45 slide 27, 'use protocol checksum offloading better'
   http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00027.html 
   30:51 slide 30, 'consequences in pf'
   http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00030.html
   https://www.youtube.com/watch?v=AymV11igbLY 
   'The surprising complexity of checksums in TCP/IP'

[2] V. Cerf, R. Kahn, IEEE Trans on Comms, Vol Com-22, No 5 May 1974
Page 3 in original emphasis. 

 The remainder of the packet consists of text for delivery to the
 destination and a trailing check sum used for end-to-end software
 verification. The GATEWAY does /not/ modify the text and merely
 forwards the check sum along without computing or recomputing it.

[3] Page 3. http://www.ietf.org/rfc/rfc793.txt

 The TCP must recover from data that is damaged, lost, duplicated, or
 delivered out of order by the internet communication system. [...]
 Damage is handled by adding a checksum to each segment transmitted,
 checking it at the receiver, and discarding damaged segments.