Re: NAT reliability in light of recent checksum changes
On 7/03/2014, at 2:15 PM, Richard Procter wrote:

  I've some ideas about solutions [for modifying checksums more cleanly]
  but will leave those for another email.

Shifting this old thread to tech@: I've posted a patch that reinstates the pf algorithm of OpenBSD 5.4 for preserving payload checksums end-to-end, rewritten without the ugly and error-prone (but speedy!) code and aiming to have no significant impact on performance.

best, Richard.
Re: NAT reliability in light of recent checksum changes
On 27/02/2014, at 11:04 AM, Theo de Raadt wrote:

  There was a method of converting an in-bound checksum, due to NAT
  conversion, into a new out-bound checksum. A process is required, it's
  how NAT works. A new method of conversion is being used. It is
  mathematically equivalent to the old method.

First, I agree with Theo that modifying a checksum is mathematically equivalent to regenerating it; both give the same result on ideal hardware. Of course, we use checksums because our hardware isn't ideal, so let's look at how the two approaches differ when a router fault occurs.

Take Stuart Henderson's example:

  Consider this scenario, which has happened in real life.
  - NIC supports checksum offloading, verified checksum is OK.
  - PCI transfers are broken (in my case it affected multiple machines
    of a certain type, so most likely a motherboard bug), causing some
    corruption in the payload, but the machine won't detect them because
    it doesn't look at checksums itself, just trusts the NIC's rx csum
    good flag.
  In this situation, packets which have been NATted that are corrupt now
  get a new checksum that is valid; so the final endpoint cannot detect
  the breakage.

That is, when the router offloads and regenerates, the router's egress NIC will hide any card, stack, bus or memory fault a verified packet suffered in passing through the router, because it regenerates a new checksum from the now-corrupt data. Looking at the code, the relevant functions are pf.c:pf_check_proto_cksum(), which trusts the ingress NIC's checksum good flag, and pf.c:pf_cksum(), which zeros the existing checksum on that basis and flags it to be regenerated by the egress NIC[1].

By contrast, checksum modification is far more reliable. In order to hide payload corruption the update code[1] would have to modify the checksum to exactly account for it. But that would have to happen by accident --- by a fault that in effect computes the necessary change --- as the update code never considers the payload[0].
It's not impossible but, on the other hand, checksum regeneration guarantees to hide faults in the regenerating router. We conclude that in the typical offloading case, regenerated checksums, unlike modified ones, cannot detect faults in the regenerating routers. Whether this difference is significant is a matter of judgment and a separate issue. I've some ideas about solutions but will leave those for another email.

best, Richard.

PS. I find the following terminology helpful: checksums calculated from the origin data are 'original'; checksums calculated from a copy are 'regenerated'. Checksums may also be 'modified' to account for altered data in such a way as to preserve originality for any unaltered data[0]. A checksum is 'end-to-end' if it is delivered original with respect to the payload. A modified checksum may be end-to-end, but a regenerated checksum never is, as it is not original.

[0] Strikingly, RFC 1631 (1994) and RFC 3022 (2001), the NAT RFCs, fail to say end-to-end preservation is a property of their checksum modification algorithm. I presume it just didn't seem worth mentioning as, lacking hardware offload back then, one wouldn't regenerate in software on performance grounds alone. It is alluded to only in RFC 1071 (1988), Computing the Internet Checksum, which states that a checksum remains end-to-end when modified 'since it was not fully recomputed'. Although that's still true if NAT modifies it, NAT makes the meaning of 'end-to-end' more complex; I think my terminology above helps there.

[1] I'll quote OpenBSD code here for completeness, contrasting modification (OpenBSD 5.3) with regeneration (OpenBSD 5.4).

OpenBSD 5.3 NAT modified the checksum as follows:

--- pf.c 1.818 (OPENBSD_5_3) ---
Assuming an AF_INET to AF_INET TCP connection.

On initial rule match:
pf_test_rule()
3862:  pf_translate()
3881:    pf_change_ap() [ src addr/port ]
1671:      PF_ACPY [ = pf_addrcpy() ]
1689:      pf_cksum_fixup(...)
           [ pseudo code is:
               sum = fixup(sum, addr16[1])
               sum = fixup(sum, addr16[0])
               sum = fixup(sum, port) ]
1662:        l = cksum + old - new   --- checksum modified
             [ then presumably account for ones-complement carries ]
3887:    pf_change_ap() etc [ dst addr/port ]

On subsequent state matching:
pf_test()
6788:  pf_test_state_tcp() [ for TCP ]
4566:    pf_change_ap() etc [ for src addr/port ]
4574:    pf_change_ap() etc [ for dst addr/port ]
---

OpenBSD 5.4 NAT regenerates checksums as follows:

--- pf.c 1.863 (post OPENBSD_5_4) ---
Assuming an AF_INET to AF_INET TCP connection.

On initial rule match:
pf_test_rule()
3445:  pf_translate()
3707:    pf_change_ap()
1677:      PF_ACPY [ = pf_addrcpy() ]
3461:  pf_cksum()
6775:    pd->hdr.tcp->th_sum = 0;                  --- checksum zeroed
         m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT; --- flagged for
             recalculation (if orig checksum good)

On subsequent state matching:
Re: NAT reliability in light of recent checksum changes
On 24/02/2014, at 9:33 PM, Henning Brauer wrote:

  * Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:
    On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
      * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
        This fundamentally weakens its usefulness, though: a correct
        checksum now implies only that the payload likely matches what
        the last NAT router happened to have in its memory
      huh? we receive a packet with correct cksum - NAT - packet goes
      out with correct cksum. we receive a packet with broken cksum -
      NAT - we leave the cksum alone, i. e. leave it broken.
    Christian said it better than me: routers may corrupt data and
    regenerating the checksum will hide it.
  if that happened we had much bigger problems than NAT.

By bigger problems do you mean obvious router stability issues? Suppose someone argued that as we'd have obvious stability issues if unprotected memory was unreliable, ECC memory is unnecessary. That argument is logically equivalent to what seems to be yours, that as we'd see obvious issues if routers were corrupting data, end-to-end checksums are unnecessary, but I don't buy it.

We know that routers corrupt data. Right now my home firewall shows 30 TCP segments dropped for bad checksums. As checks at least as strong are used by every sane link-layer, this virtually implies the dropped packets suffered router or end-point faults.

Again, it's not just me saying it:

  ...checksums are used by higher layers to ensure that data was not
  corrupted in intermediate routers or by the sending or receiving host.
  The fact that checksums are typically the secondary level of
  protection has often led to suggestions that checksums are
  superfluous. Hard won experience, however, has shown that checksums
  are necessary. Software errors (such as buffer mismanagement) and even
  hardware errors (such as network adapters with poor DMA hardware that
  sometimes fail to fully DMA data) are surprisingly common [let alone
  memory faults! RP] and checksums have been very useful in protecting
  against such errors.[0]

best, Richard.

[0] Craig Partridge, Jim Hughes, and Jonathan Stone. 1995. Performance of checksums and CRCs over real data. SIGCOMM Comput. Commun. Rev. 25, 4 (October 1995), 68-76. DOI=10.1145/217391.217413 http://doi.acm.org/10.1145/217391.217413, page 1.
Re: NAT reliability in light of recent checksum changes
On 24/02/2014, at 9:33 PM, Henning Brauer wrote:

  * Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:
    On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
      * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
        This fundamentally weakens its usefulness, though: a correct
        checksum now implies only that the payload likely matches what
        the last NAT router happened to have in its memory
      huh? we receive a packet with correct cksum - NAT - packet goes
      out with correct cksum. we receive a packet with broken cksum -
      NAT - we leave the cksum alone, i. e. leave it broken.
    Christian said it better than me: routers may corrupt data and
    regenerating the checksum will hide it.
  if that happened we had much bigger problems than NAT.

  By bigger problems do you mean obvious router stability issues?
  Suppose someone argued that as we'd have obvious stability issues if
  unprotected memory was unreliable, ECC memory is unnecessary. That
  argument is logically equivalent to what seems to be yours, that as
  we'd see obvious issues if routers were corrupting data, end-to-end
  checksums are unnecessary, but I don't buy it.

What is your solution?

  We know that routers corrupt data. Right now my home firewall shows 30
  TCP segments dropped for bad checksums. As checks at least as strong
  are used by every sane link-layer this virtually implies the dropped
  packets suffered router or end-point faults.

Yes. And what is your solution?

  Again, it's not just me saying it:
    ...checksums are used by higher layers to ensure that data was not
    corrupted in intermediate routers or by the sending or receiving
    host. The fact that checksums are typically the secondary level of
    protection has often led to suggestions that checksums are
    superfluous. Hard won experience, however, has shown that checksums
    are necessary. Software errors (such as buffer mismanagement) and
    even hardware errors (such as network adapters with poor DMA
    hardware that sometimes fail to fully DMA data) are surprisingly
    common [let alone memory faults! RP] and checksums have been very
    useful in protecting against such errors.[0]

I'll ask again, since you keep just trashing other people's code. I'm getting ready to declare you a kook, because I suspect you're going to suggest we change ethernet header and IP headers or prohibit NAT.
Re: NAT reliability in light of recent checksum changes
  Again, it's not just me saying it:
    ...checksums are used by higher layers to ensure that data was not
    corrupted in intermediate routers or by the sending or receiving
    host. The fact that checksums are typically the secondary level of
    protection has often led to suggestions that checksums are
    superfluous. Hard won experience, however, has shown that checksums
    are necessary. Software errors (such as buffer mismanagement) and
    even hardware errors (such as network adapters with poor DMA
    hardware that sometimes fail to fully DMA data) are surprisingly
    common [let alone memory faults! RP] and checksums have been very
    useful in protecting against such errors.[0]

Richard, your use of this quote is tantamount to declaring that Henning has disabled or otherwise gutted checksums. He has not disabled checksums.

There was a method of converting an in-bound checksum, due to NAT conversion, into a new out-bound checksum. A process is required, it's how NAT works. A new method of conversion is being used. It is mathematically equivalent to the old method.

The quote above is about disabling checksums. Checksums have not been disabled, in any way. New checksums are not being invented out of anyone's ass for old packets.

I believe you are posting to cast aspersions on the pf efforts. Your repeated attempts to cast false aspersions only reflect back on you.
Re: NAT reliability in light of recent checksum changes
On 27/02/2014, at 11:04 AM, Theo de Raadt wrote:

  I believe you are posting to cast aspersions on the pf efforts.

Theo, I'll insist then that I think pf is a superior piece of code which I benefit from every day, and that Henning's efforts to simplify it are so very welcome in a world addicted to complexity. My beef is solely with the technique of regenerating checksums, not the people working on the code. Criticising a design choice with argument and evidence is not the same as attacking the designer's integrity or competence, and if I seem to be playing the man and not the ball, it is not my intent and I apologise.

As to your other points, I will hopefully address them in another email I have been drafting and should finish over the next few days.

best, Richard.
Re: NAT reliability in light of recent checksum changes
* Richard Procter richard.n.proc...@gmail.com [2014-01-25 20:41]:

  On 22/01/2014, at 7:19 PM, Henning Brauer wrote:
    * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
      This fundamentally weakens its usefulness, though: a correct
      checksum now implies only that the payload likely matches what the
      last NAT router happened to have in its memory
    huh? we receive a packet with correct cksum - NAT - packet goes out
    with correct cksum. we receive a packet with broken cksum - NAT - we
    leave the cksum alone, i. e. leave it broken.
  Christian said it better than me: routers may corrupt data and
  regenerating the checksum will hide it.

if that happened we had much bigger problems than NAT.

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/
Re: NAT reliability in light of recent checksum changes
* Geoff Steckel g...@oat.com [2014-01-28 03:20]:

  It would be good if when data protected by a checksum is modified, the
  current checksum is validated and some appropriate?

guess what: that is exactly what happens.
Re: NAT reliability in light of recent checksum changes
On 28/01/2014, at 4:19 AM, Simon Perreault wrote:

  Le 2014-01-25 14:40, Richard Procter a écrit :
    I'm not saying the calculation is bad. I'm saying it's being
    calculated from the wrong copy of the data and by the wrong device.
    And it's not just me saying it: I'm quoting the guys who designed
    TCP.
  Those guys didn't envision NAT. If you want end-to-end checksum
  purity, don't do NAT.

Let's look at the options. The world needs more addresses than IPv4 provides, and NAT gives them to us. There's IPv6, which has about a hundred million addresses for every bacterium estimated to live on the planet[0], but it's not looking to replace IPv4 any time soon. So NAT is here to stay for a good while longer.

Perhaps I can at least stop using NAT on my own network. In my case I can't, but let's assume I do. This eliminates one source of error. But my TCP streams may still have now-undetected one-bit errors (at least) if there may be routers out there regenerating checksums. As long as there are, good checksums no longer mean as much by themselves, and if I want at least some assurance the network did its job, I still need some other way (e.g. checking the network path contains no such routers, either by inspection or statistically, or reimplementing an end-to-end checksum at a higher layer, etc). Regenerated checksums affect me whether or not I use NAT myself.

Another option is to always update the checksum, as versions prior to 5.4 did. It's reasonable to ask, well, is that any more reliable than recomputing it as 5.4 does? That is, can the old update code hide payload corruption, too? In order to hide payload corruption the update code would have to modify the checksum to exactly account for it. But that would have to happen by accident, as it never considers the payload. It's not impossible, but, on the other hand, checksum regeneration guarantees to hide any bad data. So updates are more reliable.

A lot more reliable, in fact, as you'd require precisely those memory errors necessary to in effect compute the correct update, or some freak fault in the ALU that did the same thing, or some combination of both. And as that has nothing to do with the update code, it is in principle possible for non-NAT connections, too. For the hardware, updates are just an extra load/modify/store, and so the chances of a checksum update hiding a corrupted payload are in practical terms equivalent to those of normal forwarding.

So your statement holds only if checksums are being regenerated. In general, NAT needn't compromise end-to-end TCP payload checksum integrity, and in versions prior to 5.4, it didn't.

best, Richard.

[0] Prokaryotes: The unseen majority. Proc Natl Acad Sci U S A. 1998 June 9; 95(12): 6578-6583. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC33863/
    2^128 IPv6 addresses ~ 10^38
    10^38 addresses / ~10^30 bacterial cells = ~10^8 addresses per cell

[1] RFC 1071, Computing the Internet Checksum, p. 21: "If anything, [this end-to-end property] is the most powerful feature of the TCP checksum!" Page 15 also touches on the end-to-end-preserving properties of checksum update.
Re: NAT reliability in light of recent checksum changes
Le 2014-01-27 21:21, Geoff Steckel a écrit : It would be good if when data protected by a checksum is modified, the current checksum is validated and some appropriate? action is done (drop? produce invalid new checksum?) when proceeding. This is exactly what's being done. Don't you listen when Henning speaks? Simon
Re: NAT reliability in light of recent checksum changes
Le 2014-01-28 03:39, Richard Procter a écrit :

  In order to hide payload corruption the update code would have to
  modify the checksum to exactly account for it. But that would have to
  happen by accident, as it never considers the payload. It's not
  impossible, but, on the other hand, checksum regeneration guarantees
  to hide any bad data. So updates are more reliable.

This analysis is bullshit. You need to take into account the fact that checksums are verified before regenerating them. That is, you need to compare a) verifying + regenerating vs b) updating. If there's an undetectable error, you're going to propagate it no matter whether you do a) or b).

Simon
Re: NAT reliability in light of recent checksum changes
On 2014-01-28, Simon Perreault sperrea...@openbsd.org wrote:

  Le 2014-01-28 03:39, Richard Procter a écrit :
    In order to hide payload corruption the update code would have to
    modify the checksum to exactly account for it. But that would have
    to happen by accident, as it never considers the payload. It's not
    impossible, but, on the other hand, checksum regeneration guarantees
    to hide any bad data. So updates are more reliable.
  This analysis is bullshit. You need to take into account the fact that
  checksums are verified before regenerating them. That is, you need to
  compare a) verifying + regenerating vs b) updating. If there's an
  undetectable error, you're going to propagate it no matter whether you
  do a) or b).

Checksums are, in many cases, only verified *on the NIC*. Consider this scenario, which has happened in real life.

- NIC supports checksum offloading, verified checksum is OK.
- PCI transfers are broken (in my case it affected multiple machines of a certain type, so most likely a motherboard bug), causing some corruption in the payload, but the machine won't detect them because it doesn't look at checksums itself, just trusts the NIC's rx csum good flag.

In this situation, packets which have been NATted that are corrupt now get a new checksum that is valid; so the final endpoint cannot detect the breakage. I'm not sure if this is common enough to be worth worrying about here, but the analysis is not bullshit.
Re: NAT reliability in light of recent checksum changes
Em 28-01-2014 15:45, Stuart Henderson escreveu:

  On 2014-01-28, Simon Perreault sperrea...@openbsd.org wrote:
    Le 2014-01-28 03:39, Richard Procter a écrit :
      In order to hide payload corruption the update code would have to
      modify the checksum to exactly account for it. But that would have
      to happen by accident, as it never considers the payload. It's not
      impossible, but, on the other hand, checksum regeneration
      guarantees to hide any bad data. So updates are more reliable.
    This analysis is bullshit. You need to take into account the fact
    that checksums are verified before regenerating them. That is, you
    need to compare a) verifying + regenerating vs b) updating. If
    there's an undetectable error, you're going to propagate it no
    matter whether you do a) or b).
  Checksums are, in many cases, only verified *on the NIC*. Consider
  this scenario, which has happened in real life.
  - NIC supports checksum offloading, verified checksum is OK.
  - PCI transfers are broken (in my case it affected multiple machines
    of a certain type, so most likely a motherboard bug), causing some
    corruption in the payload, but the machine won't detect them because
    it doesn't look at checksums itself, just trusts the NIC's rx csum
    good flag.
  In this situation, packets which have been NATted that are corrupt now
  get a new checksum that is valid; so the final endpoint cannot detect
  the breakage. I'm not sure if this is common enough to be worth
  worrying about here, but the analysis is not bullshit.

Stuart,

It is more common than you might think. I had some gigabit motherboards, some models of which would always corrupt packets when using the onboard NIC. I believe that in these cases there isn't much the OS can do. Unfortunately, it's always the application's job to detect whether it is receiving good or bad data.

Cheers,
--
Giancarlo Razzolini
GPG: 4096R/77B981BC
Re: NAT reliability in light of recent checksum changes
Le 2014-01-28 12:45, Stuart Henderson a écrit :

    This analysis is bullshit. You need to take into account the fact
    that checksums are verified before regenerating them. That is, you
    need to compare a) verifying + regenerating vs b) updating. If
    there's an undetectable error, you're going to propagate it no
    matter whether you do a) or b).
  Checksums are, in many cases, only verified *on the NIC*. Consider
  this scenario, which has happened in real life.
  - NIC supports checksum offloading, verified checksum is OK.
  - PCI transfers are broken (in my case it affected multiple machines
    of a certain type, so most likely a motherboard bug), causing some
    corruption in the payload, but the machine won't detect them because
    it doesn't look at checksums itself, just trusts the NIC's rx csum
    good flag.
  In this situation, packets which have been NATted that are corrupt now
  get a new checksum that is valid; so the final endpoint cannot detect
  the breakage. I'm not sure if this is common enough to be worth
  worrying about here, but the analysis is not bullshit.

You're right. I was in the rough, sorry, and thanks for the explanation. I don't think this scenario is worth worrying about though.

Simon
Re: NAT reliability in light of recent checksum changes
On Mon, Jan 27, 2014 at 8:19 AM, Simon Perreault simon.perrea...@viagenie.ca wrote:

  Le 2014-01-25 14:40, Richard Procter a écrit :
    I'm not saying the calculation is bad. I'm saying it's being
    calculated from the wrong copy of the data and by the wrong device.
    And it's not just me saying it: I'm quoting the guys who designed
    TCP.
  Those guys didn't envision NAT. If you want end-to-end checksum
  purity, don't do NAT.

Relying on TCP checksums is risky - they are too weak. I live at the end of a wireless link that starts at around 7K feet elevation, goes over a 12K foot ridge, lands on my neighbor's roof at 10K feet, and then bounces across the street to my house. At one point I was having lots of issues with data corruption - updates failing, even images on web pages going technicolor half way through the download. The ISP ultimately determined there was a bad transmitter and replaced it.

The corruption was so severe that it was overwhelming the TCP checksums, to the point that as far as TCP was concerned it was delivering good data (just not the same data twice :-). Until they fixed the issue I was able to run a proxy over ssh, which gave me slower but reliable network service.

-N
Re: NAT reliability in light of recent checksum changes
Le 2014-01-25 14:40, Richard Procter a écrit : I'm not saying the calculation is bad. I'm saying it's being calculated from the wrong copy of the data and by the wrong device. And it's not just me saying it: I'm quoting the guys who designed TCP. Those guys didn't envision NAT. If you want end-to-end checksum purity, don't do NAT. Simon
Re: NAT reliability in light of recent checksum changes
Em 27-01-2014 14:30, Nick Bender escreveu:

  On Mon, Jan 27, 2014 at 8:19 AM, Simon Perreault
  simon.perrea...@viagenie.ca wrote:
    Le 2014-01-25 14:40, Richard Procter a écrit :
      I'm not saying the calculation is bad. I'm saying it's being
      calculated from the wrong copy of the data and by the wrong
      device. And it's not just me saying it: I'm quoting the guys who
      designed TCP.
    Those guys didn't envision NAT. If you want end-to-end checksum
    purity, don't do NAT.
  Relying on TCP checksums is risky - they are too weak. I live at the
  end of a wireless link that starts at around 7K feet elevation, goes
  over a 12K foot ridge, lands on my neighbor's roof at 10K feet, and
  then bounces across the street to my house. At one point I was having
  lots of issues with data corruption - updates failing, even images on
  web pages going technicolor half way through the download. The ISP
  ultimately determined there was a bad transmitter and replaced it. The
  corruption was so severe that it was overwhelming the TCP checksums,
  to the point that as far as TCP was concerned it was delivering good
  data (just not the same data twice :-). Until they fixed the issue I
  was able to run a proxy over ssh, which gave me slower but reliable
  network service.
  -N

I had the same issue in a different scenario. I traveled to a place where the internet connection was so slow and so unreliable that almost all https handshakes would never complete, and yet almost 60% of the checksums were ok. That's why I always have a VPN server lying around to route my traffic to. In my experience, on very unreliable connections a UDP vpn, such as openvpn, saves the day.

NAT should (and will) have a very slow and painful death. But, then again, IPv4 has been about to die for more than a decade, and it's still here. I guess the death will be very, very, very slow.

Cheers,
Re: NAT reliability in light of recent checksum changes
FWIW, you don't have to be out in the sticks (the backwoods?) to have a network problem:

http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html

However, as I understand it, in this case the TCP checksumming worked and protected the application from the corrupted data.

Cheers,
Robb.
Re: NAT reliability in light of recent checksum changes
Em 27-01-2014 19:05, Why 42? The lists account. escreveu:

  FWIW, you don't have to be out in the sticks (the backwoods?) to have
  a network problem:
  http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html
  However, as I understand it, in this case the TCP checksumming worked
  and protected the application from the corrupted data.
  Cheers, Robb.

I wasn't exactly in the woods, but I had a 600Kbps unreliable ADSL connection that would send the packets. But the latency and corruption were so severe that TLS handshakes would take too long, and even when they completed, the connection wouldn't sustain itself. Anyway, the UDP vpn improved things quite a bit. This was due, well, to UDP of course, and to the dynamic compression reducing the amount of data sent to the wire.

In the case you pointed to, the TCP checksumming was doing its job, successfully protecting the application. This kind of thing, where bits randomly flip, shows that computer science is anything but an EXACT science. That's one of the reasons why the machines will (hopefully) always need humans.

Cheers,
Re: NAT reliability in light of recent checksum changes
On 01/27/2014 08:07 PM, Giancarlo Razzolini wrote:

  Em 27-01-2014 19:05, Why 42? The lists account. escreveu:
    FWIW, you don't have to be out in the sticks (the backwoods?) to
    have a network problem:
    http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html
    However, as I understand it, in this case the TCP checksumming
    worked and protected the application from the corrupted data.
    Cheers, Robb.
  I wasn't exactly in the woods, but I had a 600Kbps unreliable ADSL
  connection that would send the packets. But the latency and corruption
  were so severe that TLS handshakes would take too long, and even when
  they completed, the connection wouldn't sustain itself. Anyway, the
  UDP vpn improved things quite a bit. This due, well, to UDP of course,
  and to the dynamic compression, reducing the amount of data sent to
  the wire. The case you pointed, the TCP checksumming was doing its
  job, successfully protecting the application. This kind of things,
  where bits randomly flip, proves that computer science can be anything
  but an EXACT science. That's one of the reasons why the machines will
  (hopefully) always need humans.

To add to the preceding... One client of mine used a CVS repository via coast-to-coast NFS. Somewhere in the deeps, the UDP checksum was set to 0 (no checksum). Somewhere else, one bit in each packet was corrupted. If the UDP checksum had been present we would have seen the bad data a lot sooner; as it was, we had to go back at least a month, sometimes more, to find good data, and then recreate all the edits. This scenario shows the danger of silently passing corrupt packets.

It would be good if, when data protected by a checksum is modified, the current checksum is validated and some appropriate action is taken (drop? produce an invalid new checksum?) before proceeding.

Geoff Steckel
Re: NAT reliability in light of recent checksum changes
On 22/01/2014, at 7:19 PM, Henning Brauer wrote:

  * Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:
    This fundamentally weakens its usefulness, though: a correct
    checksum now implies only that the payload likely matches what the
    last NAT router happened to have in its memory
  huh? we receive a packet with correct cksum - NAT - packet goes out
  with correct cksum. we receive a packet with broken cksum - NAT - we
  leave the cksum alone, i. e. leave it broken.

Christian said it better than me: routers may corrupt data and regenerating the checksum will hide it. That's more than a theoretical concern. The article I referenced is a detailed study of real-world traces, co-authored by a member of the Stanford distributed systems group, that concludes "Probably the strongest message of this study is that the networking hardware is often trashing the packets which are entrusted to it"[0].

More generally, TCP checksums provide for an acceptable error rate that is independent of the reliability of the underlying network[*] by allowing us to verify its workings. But it's no longer possible to verify network operation if it may be regenerating TCP checksums, as these may hide network faults. That's a fundamental change from the scheme Cerf and Kahn emphasized in their design notes for what became known as TCP:

  The remainder of the packet consists of text for delivery to the
  destination and a trailing check sum used for end-to-end software
  verification. The GATEWAY does /not/ modify the text and merely
  forwards the check sum along without computing or recomputing it.[1]

  It doesn't seem you know what you are talking about. the cksum is dead
  simple, if we had bugs in calculating or verifying it, we really had a
  LOT of other problems.

I'm not saying the calculation is bad. I'm saying it's being calculated from the wrong copy of the data and by the wrong device. And it's not just me saying it: I'm quoting the guys who designed TCP.

  There is no undetected error rate, nothing really changes there.

I disagree. Every TCP stream containing arbitrary data may have undetected errors, as checksums cannot detect all the errors networks may make (being shorter than the data they cover). The engineer's task is to make network errors reliably negligible in practice. As network-regenerated checksums may hide any amount of arbitrary data corruption, I believe it's correct to say the network error rate undetected by TCP is then unknown and unbounded.

best, Richard.

[*] Under reasonable assumptions of the error modes most likely in practice. And some applications require lower error rates than TCP checksums can provide.

[0] Jonathan Stone and Craig Partridge. 2000. When the CRC and TCP checksum disagree. In Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '00). ACM, New York, NY, USA, 309-319. DOI=10.1145/347059.347561 http://doi.acm.org/10.1145/347059.347561 http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf

[1] A Protocol for Packet Network Intercommunication. V. Cerf, R. Kahn, IEEE Trans. on Comms, Vol COM-22, No 5, May 1974, page 3; emphasis in original.
Re: NAT reliability in light of recent checksum changes
From: Richard Procter richard.n.proc...@gmail.com
Date: Sun, 26 Jan 2014 08:40:44 +1300
To: misc@openbsd.org

On 22/01/2014, at 7:19 PM, Henning Brauer wrote:

* Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]: This fundamentally weakens its usefulness, though: a correct checksum now implies only that the payload likely matches what the last NAT router happened to have in its memory

huh? we receive a packet with correct cksum - NAT - packet goes out with correct cksum.
we receive a packet with broken cksum - NAT - we leave the cksum alone, i. e. leave it broken.

Christian said it better than me: routers may corrupt data, and regenerating the checksum will hide it.

That's more than a theoretical concern. The article I referenced is a detailed study of real-world traces, co-authored by a member of the Stanford distributed systems group, which concludes: "Probably the strongest message of this study is that the networking hardware is often trashing the packets which are entrusted to it."[0]

More generally, TCP checksums provide for an acceptable error rate that is independent of the reliability of the underlying network[*] by allowing us to verify its workings. But it's no longer possible to verify network operation if the network may be regenerating TCP checksums, as these may hide network faults. That's a fundamental change from the scheme Cerf and Kahn emphasized in their design notes for what became TCP: "The remainder of the packet consists of text for delivery to the destination and a trailing check sum used for end-to-end software verification. The GATEWAY does /not/ modify the text and merely forwards the check sum along without computing or recomputing it."[1]

It doesn't seem you know what you are talking about. the cksum is dead simple, if we had bugs in calculating or verifying it, we really had a LOT of other problems.

I'm not saying the calculation is bad. I'm saying it's being calculated from the wrong copy of the data and by the wrong device. And it's not just me saying it: I'm quoting the guys who designed TCP.

There is no undetected error rate, nothing really changes there.

I disagree. Every TCP stream carrying arbitrary data may have undetected errors, as checksums cannot detect all the errors networks may make (being shorter than the data they cover). The engineer's task is to make network errors reliably negligible in practice.
As network-regenerated checksums may hide any amount of arbitrary data corruption, I believe it's correct to say the network error rate undetected by TCP is then unknown and unbounded.

best, Richard.

[*] Under reasonable assumptions about the error modes most likely in practice. And some applications require lower error rates than TCP checksums can provide.

[0] Jonathan Stone and Craig Partridge. 2000. "When the CRC and TCP checksum disagree." In Proceedings of SIGCOMM '00. ACM, New York, NY, USA, 309-319. DOI=10.1145/347059.347561 http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf

[1] V. Cerf, R. Kahn, "A Protocol for Packet Network Intercommunication," IEEE Transactions on Communications, Vol COM-22, No 5, May 1974. Page 3, emphasis in original.
Re: NAT reliability in light of recent checksum changes
Henning Brauer lists-open...@bsws.de wrote:

This fundamentally weakens its usefulness, though: a correct checksum now implies only that the payload likely matches what the last NAT router happened to have in its memory, whereas the receiver wants to know whether what it got is what was originally transmitted.

we receive a packet with correct cksum - NAT - packet goes out with correct cksum. we receive a packet with broken cksum - NAT - we leave the cksum alone, i. e. leave it broken.

The point Richard may be trying to make is that a packet may be corrupted in memory on the NAT gateway (e.g. RAM error, buggy code writing into a random location), and that regenerating the checksum hides such corruption.

-- 
Christian "naddy" Weisgerber    na...@mips.inka.de
Re: NAT reliability in light of recent checksum changes
On 2014-01-15, Stuart Henderson s...@spacehopper.org wrote:

On 2014-01-14, Richard Procter richard.n.proc...@gmail.com wrote: I've a question about the new checksum changes. [...] My understanding is that checksums are now always recalculated when a header is altered, never updated. Is that right, and if so has this affected NAT reliability? Recalculation here would compromise reliable end-to-end transport, as the payload checksum no longer covers the entire network path, and so break a basic transport-layer design principle.

That is exactly what slides 30-33 talk about. PF now checks the incoming packets before it rewrites the checksum, so it can reject them if they are broken.

Right -- so NAT now replaces the existing transport checksum with one newly computed from the payload [0]. This fundamentally weakens its usefulness, though: a correct checksum now implies only that the payload likely matches what the last NAT router happened to have in its memory, whereas the receiver wants to know whether what it got is what was originally transmitted. In the worst case of NAT on every intermediate node, the transport checksum is effectively reduced to an adjunct of the link-layer checksum. This means transport-layer payload integrity no longer rests on the quality of the checksum algorithm alone but now depends too on the reliability of the path the packet took through the network.

I think it's great to see someone working hard to simplify crucial code, but in light of the above I believe pf should always update the checksum, as it did in versions prior to 5.4, as the alternative fundamentally undermines TCP by making the undetected error rate of its streams unknown and unbounded. One might argue networks these days are reliable; I think it better to avoid the need to make that argument. In any case, the work I've found on the question is not reassuring [1].

best, Richard.
[0] pf.c 1.863

On initial rule match:
  pf_test_rule()
    3445: pf_translate()
      3707: pf_change_ap()
        1677: PF_ACPY [= pf_addrcpy()]
    3461: pf_cksum()
      6775: pd->hdr.tcp->th_sum = 0;
            m->m_pkthdr.csum_flags |= M_TCP_CSUM_OUT (if orig checksum good)

On subsequent state matching:
  pf_test_state()
    ~4445: pf_change_ap() etc
    4471: pf_cksum() etc

[1] "Probably the strongest message of this study is that the networking hardware is often trashing the packets which are entrusted to it." Jonathan Stone and Craig Partridge. 2000. "When the CRC and TCP checksum disagree." In Proceedings of SIGCOMM '00. ACM, New York, NY, USA, 309-319. DOI=10.1145/347059.347561 http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf
Re: NAT reliability in light of recent checksum changes
* Richard Procter richard.n.proc...@gmail.com [2014-01-22 06:44]:

That is exactly what slides 30-33 talk about. PF now checks the incoming packets before it rewrites the checksum, so it can reject them if they are broken.

Right -- so NAT now replaces the existing transport checksum with one newly computed from the payload [0].

correct - IF the original cksum was right.

This fundamentally weakens its usefulness, though: a correct checksum now implies only that the payload likely matches what the last NAT router happened to have in its memory, whereas the receiver wants to know whether what it got is what was originally transmitted. In the worst case of NAT on every intermediate node the transport checksum is effectively reduced to an adjunct of the link layer checksum.

huh? we receive a packet with correct cksum - NAT - packet goes out with correct cksum. we receive a packet with broken cksum - NAT - we leave the cksum alone, i. e. leave it broken.

I think it's great to see someone working hard to simplify crucial code but in light of the above I believe pf should always update the checksum, as it did in versions prior to 5.4, as the alternative fundamentally undermines TCP by making the undetected error rate of its streams unknown and unbounded. One might argue networks these days are reliable; I think it better to avoid the need to make the argument. In any case the work I've found on that question is not reassuring [1].

It doesn't seem you know what you are talking about. the cksum is dead simple, if we had bugs in calculating or verifying it, we really had a LOT of other problems. There is no undetected error rate, nothing really changes there.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS Services. Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/
Re: NAT reliability in light of recent checksum changes
On 2014-01-14, Richard Procter richard.n.proc...@gmail.com wrote:

Hi all, I'm using OpenBSD 5.3 to provide an Alix-based home firewall. Thank you all for the commitment to elegant, well-documented software which isn't pernicious to the mental health of its users.

I've a question about the new checksum changes[0], being interested in such things and having listened to Henning's presentation and poked around in the archives a little. My understanding is that checksums are now always recalculated when a header is altered, never updated.[1] Is that right, and if so has this affected NAT reliability? Recalculation here would compromise reliable end-to-end transport, as the payload checksum no longer covers the entire network path, and so break a basic transport-layer design principle.[2][3]

That is exactly what slides 30-33 talk about. PF now checks the incoming packets before it rewrites the checksum, so it can reject them if they are broken.

[1] e.g. 26:45, slide 27, 'use protocol checksum offloading better' http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00027.html; 30:51, slide 30, 'consequences in pf' http://quigon.bsws.de/papers/2013/EuroBSDcon/mgp00030.html; 'The surprising complexity of checksums in TCP/IP' https://www.youtube.com/watch?v=AymV11igbLY

[2] V. Cerf, R. Kahn, "A Protocol for Packet Network Intercommunication," IEEE Transactions on Communications, Vol COM-22, No 5, May 1974. Page 3, emphasis in original: "The remainder of the packet consists of text for delivery to the destination and a trailing check sum used for end-to-end software verification. The GATEWAY does /not/ modify the text and merely forwards the check sum along without computing or recomputing it."

[3] Page 3, http://www.ietf.org/rfc/rfc793.txt: "The TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. [...] Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments."