Francois PIETTE wrote:
>> (3) In the code running after a failed download I'm removing the last
>> portion of the received data, just in case it's corrupted. I noticed
>> this behavior in a freeware download manager I used to use some time
>> ago. But now I'm asking: is this really necessary? HTTP traffic travels
>> over TCP and TCP is supposed to be a checksummed, reliable protocol. Is
>> the probability of receiving corrupt data high enough to make such
>> radical surgery useful? If so, is tail-trimming the best solution, or
>> should I implement some other kind of checksumming to make sure no
>> portion of the code is actually corrupt (after all, corruption might
>> occur anywhere in the document, not just in the last received portion).
>>
>
> You are right, I don't see any reason to throw away part of the received
> data. By TCP protocol specification, only valid data is delivered to the
> application. If you get it, it is correct data.
>

I did a bit of googling on TCP, IP and link layer checksumming and found some interesting results. It seems that TCP provides a 16-bit additive checksum to protect its payload, and apparently that's not very strong. The link layers provide a lot stronger checksums: Ethernet uses a 32-bit CRC, PPP uses a 16-bit CRC, and ATM uses an 8-bit CRC that only protects the cell header.
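To illustrate why an additive checksum is weaker than a CRC, here is a small Python sketch. It is only an approximation of what TCP does: the function `internet_checksum` is a hypothetical helper that computes a 16-bit one's-complement sum in the style of RFC 1071 over a plain byte string, whereas the real TCP checksum also covers the TCP header and a pseudo-header. The sample data is made up. Because addition is commutative, exchanging two 16-bit words leaves the sum unchanged, while CRC-32 (here via `zlib.crc32`) notices the change.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum, RFC 1071 style (simplified, payload only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = b"ABCDEFGH"
swapped = b"CDABEFGH"  # same 16-bit words, first two exchanged

# The additive checksum cannot tell the two apart; CRC-32 can.
same_sum = internet_checksum(original) == internet_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)
print("additive checksums equal:", same_sum)
print("CRC-32 values equal:", same_crc)
```

Reordered words are just one class of error the sum misses; two bit errors that cancel each other out in the sum would slip through the same way.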
The weakest link from HTTP server to client is probably the link from client to ISP. The worst-case scenario would be a dial-up connection using PPP only. A better but still bad scenario would be an ADSL connection running PPP-over-ATM (because ATM does not seem to checksum its payload). Most of my clients would fit in one of those two groups!

Considering those facts, throwing away part of the already received data might be a good thing IF the connection to the HTTP server was lost because the client lost its connection to the ISP: chances are the connection to the ISP was lost because the line became noisy. If the connection to the HTTP server was lost because of network congestion, then throwing away part of the received data would be pointless, since the chances of data corruption are equal all over the received data; they're not higher towards the end of the file.

Conclusion: I think data corruption might be a problem in some cases. Notice how Linux distributions publish MD5 hashes for their downloads, so they can be checked on the receiving end? I decided to implement MD5-based file checking for my downloads (my application only downloads stuff from my own site, so I've got everything under my control). I've done this because I know I've got quite a few clients on very bad dial-up lines. If I didn't have those clients I would have done no checking at all.

--
Cosmin Prund

--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
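For illustration, the MD5-based file check described in the message could look roughly like the Python sketch below. This is not the author's implementation (which is not shown in the post): the function names `md5_of_file` and `download_is_intact` are hypothetical, and the sketch assumes the server publishes the expected MD5 digest as a hex string that the client has already fetched separately.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the MD5 digest of a file as lowercase hex, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path: str, expected_md5_hex: str) -> bool:
    """Compare the file's digest with the one published by the server (assumed hex)."""
    return md5_of_file(path) == expected_md5_hex.strip().lower()
```

A client would call `download_is_intact(local_path, published_hash)` once the transfer completes and fetch the file again if it returns `False`. MD5 is adequate here for catching accidental corruption on a noisy line; it is not meant as protection against deliberate tampering.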