Francois PIETTE wrote:
>> (3) In the code running after a failed download I'm removing the last
>> portion of the received data, just in case it's corrupted. I noticed
>> this behavior in a freeware download manager I used to use some time
>> ago. But now I'm asking: is this really necessary? HTTP traffic travels
>> over TCP and TCP is supposed to be a checksummed, reliable protocol. Is
>> the probability of receiving corrupt data high enough to make such
>> radical surgery useful? If so, is tail-trimming the best solution, or
>> should I implement some other kind of checksumming to make sure no
>> portion of the code is actually corrupt (after all, corruption might
>> occur anywhere in the document, not just in the last received portion).
>>     
>
> You are right, I don't see any reason to throw away part of the received 
> data. By TCP protocol specification, only valid data is delivered to the 
> application. If you get it, it is correct data.
>   
I did a bit of googling on TCP, IP and link layer checksumming and 
found some interesting results. It seems that TCP provides an additive 
checksum to protect its payload, and apparently that's not very strong. 
Link layers provide much stronger checksums: Ethernet uses 32-bit 
CRC checksums, PPP uses 16-bit CRC checksums and ATM uses an 8-bit CRC 
checksum that only protects the header.
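
To illustrate why an additive checksum is weaker than a CRC, here is a 
minimal Python sketch. It assumes a simplified TCP-style checksum (a 
16-bit one's-complement sum over the data only, ignoring the 
pseudo-header real TCP also covers) and uses zlib.crc32 as a stand-in 
for a link layer CRC. The function name and sample data are made up 
for the example.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """Simplified 16-bit one's-complement sum, in the style TCP uses."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"ABCDEFGH"
swapped = b"CDABEFGH"  # the first two 16-bit words exchanged

# Addition does not care about order, so the additive checksum matches.
same_sum = internet_checksum(original) == internet_checksum(swapped)
# A CRC depends on bit positions, so it catches the reordering.
same_crc = zlib.crc32(original) == zlib.crc32(swapped)
```

Here same_sum comes out True and same_crc comes out False: the additive 
checksum cannot tell the two buffers apart, while the CRC can.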

The weakest link from HTTP server to CLIENT is probably the link from 
CLIENT to ISP. The worst case scenario would be a dial-up connection 
using PPP-only. A better but still bad scenario would be an ADSL 
connection running PPP-over-ATM (because ATM does not seem to checksum 
its payload). Most of my clients would fit in one of those two groups!

Considering those facts, throwing away part of the already received data 
might be a good thing IF the connection to the HTTP server was lost 
because the client lost its connection to the ISP: chances are the 
connection to the ISP was lost because the line became noisy. If the 
connection to the HTTP server was lost because of network congestion, 
then throwing away part of the received data would be pointless, since 
the chance of corruption is the same everywhere in the received data 
and is no higher towards the end of the file.
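
For reference, the tail-trimming approach discussed above could look 
roughly like this in Python. This is only a sketch: the function names 
and the 4096-byte default are arbitrary assumptions for the example, 
not what any particular download manager does, and it presumes the 
server honours HTTP Range requests when the download is resumed.

```python
import os

def trim_tail(path, tail_bytes=4096):
    """Drop the last tail_bytes of a partial download.

    Returns the new file size, which is the offset to resume from.
    tail_bytes=4096 is an arbitrary value chosen for illustration.
    """
    size = os.path.getsize(path)
    keep = max(0, size - tail_bytes)  # never go below an empty file
    with open(path, "r+b") as f:
        f.truncate(keep)
    return keep

def range_header(offset):
    """Build the HTTP request header that asks for bytes from offset on."""
    return {"Range": f"bytes={offset}-"}
```

A resumed request would then send range_header(trim_tail(path)) and 
append whatever the server returns to the truncated file.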

Conclusion: I think data corruption might be a problem in some cases. 
Notice how all Linux distributions include MD5 hashes for all downloads, 
so they can be checked on the receiving end? I decided to implement 
MD5-based file checking for my downloads (my application only downloads 
stuff from my own site, so I've got everything under my control). I've 
done this because I know I've got quite a few clients on very bad 
dial-up lines. If I didn't have those clients I would have done no 
checking at all.
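
A minimal sketch of that kind of check, written in Python with the 
standard hashlib module. The function names are made up for the 
example, and it assumes the expected hash is published by the server 
as a hex string next to the file.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def download_is_intact(path, expected_md5):
    """Compare the file's MD5 with the hex digest published by the server."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If download_is_intact returns False, the file is discarded and 
downloaded again. MD5 is fine here for catching accidental corruption; 
it should not be relied on against deliberate tampering.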


--
Cosmin Prund