Please cc: (copy) your response(s) to my email, [EMAIL PROTECTED], as I am not subscribed to the list. Thank you.
I've repeatedly tried to download the Wikipedia database dumps (one of 8 gigabytes, the other of 57.5 gigabytes) from a server that supports resuming. The downloads consistently break around the 4 gigabyte mark. The ceiling isn't on my end: I'm running on, and downloading to, NTFS. I've also done test downloads from the same server (download.wikimedia.org), which work fine, and repeated tests of my own bandwidth and network (somewhat slower than it should be, with congestion at times and sporadic dropouts, but since wget supports resuming, that shouldn't be an issue). That rules out those factors. Granted, the download might slow down or be broken off, but why can't it resume past the 4 gigabyte mark?

I used a file-splitting program to break the partially downloaded database file into smaller parts of differing sizes, starting from a 6 gigabyte file. (The 6 gigabyte file resulted from a "lucky patch" when the connection stayed up after resuming a 4 gigabyte file, but that isn't acceptable for my purposes.) Here are my results:

- 6 GB split into 2 GB segments: first 2 GB segment resumed successfully.
- 6 GB split into 3 GB segments: first 3 GB segment resumed successfully.
- 6 GB split into 4.5 GB segments (segment 2 partial): will not resume.
- 6 GB split into 4.1 GB segments (segment 2 partial): will not resume.
- 6 GB split into 3.9 GB segments (segment 2 partial): resumed successfully.

Of course, the original 6 gigabyte partial file couldn't be resumed either. As you are aware, NTFS, while certainly not the Rolls-Royce of filesystems, supports files of multiple exabytes, so a 4 gigabyte ceiling would only apply on a FAT32-formatted partition. Such limits are rare in up-to-date operating systems.
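For whatever it's worth as a diagnostic: 4 GiB is exactly the capacity of a 32-bit file-size counter, so the pattern above (segments under roughly 4 GB resume, segments at or over it don't) is what you'd expect if some component in the chain (for instance, a wget build compiled without large-file support, or an intermediary) truncates file sizes or offsets to 32 bits. This is my speculation, not a confirmed diagnosis; a minimal sketch of the wraparound in Python:

```python
# Show what a 32-bit size field reports for files near the 4 GiB boundary.
GIB = 1024 ** 3  # one gibibyte in bytes

def as_uint32(nbytes: int) -> int:
    """Value a 32-bit unsigned size counter would hold for nbytes."""
    return nbytes & 0xFFFFFFFF  # keep only the low 32 bits

print(as_uint32(2 * GIB))         # 2147483648 -- still representable
print(as_uint32(4 * GIB))         # 0 -- wraps around: looks like an empty file
print(as_uint32(int(4.5 * GIB)))  # 536870912 -- a 4.5 GiB file looks like ~0.5 GiB
```

If something like this is happening, a resume at or past 4 GiB would compute a nonsensical (wrapped) local size and fail, which matches the 3.9 GB success / 4.1 GB failure split observed above.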
I've considered whether the data stream is being corrupted, but wget (to my knowledge) doesn't do error checking on the file contents; it just compares the remote and local file sizes and downloads the remainder if the local file is smaller. And even if the file were being corrupted, the file-splitting program (which adds no headers) should have ameliorated the problem by now by excising the corrupt part, unless either:

1. the corruption is happening at the same point each time; or
2. the server, or something interposed between myself and the server, is blocking the download whenever a resume of the database file is detected at or beyond the 4 gigabyte mark.

I've also tried different download managers: TrueDownloader (an open-source download manager), which is rejected by the server; and GetRight, a good commercial program, which is throttled to 19 KB/s, making even the smaller download take well over 120 hours. That is too slow, especially without knowing whether the file is any good to begin with.

Wikipedia doesn't have tech support, and a search that should encompass their forums turns up nothing about this error/problem, but they do suggest using wget for this particular application, so I would infer that the problem is at least related to wget itself. I am using wgetgui (as I mentioned in my previous post to the mailing list) and yes, all the options are checked correctly. I've double-checked, triple-checked, and quadruple-checked everything. And then I checked again.

The database size itself is irrelevant: it could be 100 gigabytes, and that would present no difficulty from the standpoint of bandwidth. However, the reason we have programs such as wget is to deal with redundancy and resumability for large file downloads. I also see that you've been working on large-file issues in wget since 2002, as well as security issues. But the internet has network protocols to deal with this. What is happening?
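To make the size-comparison behaviour described above concrete: a resuming client stats the local partial file and asks the server for only the missing tail via an HTTP Range header. Here is a minimal sketch in Python (the URL in the usage note is a placeholder, and this assumes a server that honors Range requests with a 206 Partial Content reply); note that a client limited to 32-bit offsets cannot even express a starting byte at or beyond 4 GiB in such a request:

```python
import os
import urllib.request

def resume_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a request for only the bytes we don't yet have locally,
    mirroring the size-compare-then-fetch-remainder resume strategy."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    req = urllib.request.Request(url)
    if have:
        # "bytes=N-" asks the server to send from byte offset N to the end.
        req.add_header("Range", f"bytes={have}-")
    return req
```

Usage would look like `resume_request("http://example.org/dump.xml.bz2", "dump.xml.bz2")`; if the local file already held 5 GiB, the request would carry `Range: bytes=5368709120-`, an offset no 32-bit counter can represent.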
Why can't I get the data? Have the network transport protocols failed? Has wget failed? The data is supposed to go from point A to point B--what is stopping that? It doesn't make sense. If I'm running up against a wall, I want to see that wall. If something is failing, I want to know what is failing so I can fix it. Do you have an intermediary server that I can FTP off of to get the wikipedia databases? What about CuteFTP?
