Attempt to download http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
Hello,

I was trying to download a file with a somewhat unusual name (see subject/below). Works well with Safari, doesn't work at all with wget, though. I tried it this way:

$ wget http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3

--14:37:07--  http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
           => `dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3'
Resolving www.razor1911.com... 88.80.5.18
Connecting to www.razor1911.com|88.80.5.18|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6,598,784 (6.3M) [audio/mpeg]

dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3: Invalid argument

Cannot write to `dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3' (Invalid argument).

The file can be found here: http://www.razor1911.com/dubmood/ (the link reads "Svenska akademien - Många Bäckar Små").

I'm on a PPC Mac (mini), Mac OS 10.4.8.

$ wget --version
GNU Wget 1.10.2

(obtained via darwinports)

Elias Pipping
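A likely explanation, sketched under the assumption that the file name (not the transfer) is the problem: %E5 and %E4 decode to the Latin-1 bytes for å and ä, which are not valid UTF-8, and an HFS+ volume can refuse such a name with "Invalid argument". A quick way to look at the bytes involved, assuming a bash-style printf and hexdump:

# Hedged sketch: the decoded name contains raw Latin-1 bytes (0xE5, 0xE4),
# which is not valid UTF-8, so an HFS+ file system may reject the name.
printf 'm\xe5nga_b\xe4ckar_sm\xe5.mp3' | hexdump -C
# Creating such a name directly may show the same failure mode:
touch "/tmp/$(printf 'm\xe5nga_b\xe4ckar_sm\xe5.mp3')" || echo "file system rejected the name"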
RE: Attempt to download http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
This is most likely an OS filename character-encoding issue. When in doubt about file names, just rename the file (the -O option; look it up in the manual). This invocation works just fine:

~% wget -O dubmood.mp3 http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3

--06:12:30--  http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
           => `dubmood.mp3'
Resolving www.razor1911.com... 88.80.5.18
Connecting to www.razor1911.com|88.80.5.18|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6,598,784 (6.3M) [audio/mpeg]

100%[====================================>] 6,598,784    160.70K/s    ETA 00:00

06:13:14 (150.55 KB/s) - `dubmood.mp3' saved [6598784/6598784]

-----Original Message-----
From: Elias Pipping [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 24, 2006 5:52 AM
To: [EMAIL PROTECTED]
Subject: Attempt to download http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
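If you would rather keep something close to the original name instead of a fixed one, a rough approach is to percent-decode the last path component and convert it from Latin-1 to UTF-8 before handing it to -O. The perl/iconv pipeline below is only a sketch of that idea, not something wget does for you:

# Hedged sketch: build a UTF-8 file name from the Latin-1, percent-encoded URL,
# then let wget write to that name via -O.
url='http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3'
name=$(basename "$url" \
  | perl -pe 's/%([0-9A-Fa-f]{2})/chr(hex($1))/ge' \
  | iconv -f ISO-8859-1 -t UTF-8)    # decode %XX escapes, then Latin-1 -> UTF-8
wget -O "$name" "$url"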
re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?
Please cc: (copy) your response(s) to my email, [EMAIL PROTECTED], as I am not subscribed to the list. Thank you.

I've repeatedly tried to download the Wikipedia database (one file is 8 gigabytes, the other 57.5 gigabytes) from a server which supports resumability. The downloads consistently break around the 4 gigabyte mark.

The ceiling isn't on my end: I'm running and downloading to NTFS. I've also done test downloads from the same wiki server (download.wikimedia.org), which work fine, and repeated tests of my own bandwidth and network (somewhat slower than it should be, with congestion at times and sporadic dropouts, but since wget supports resumability, that shouldn't be an issue), which rules out those factors. Granted, the download might slow down or be broken off, but why can't it resume past the 4 gigabyte mark?

I've used a file-splitting program to break the partially downloaded database file into smaller parts of differing size. I started from a 6 gigabyte file (the 6 gigabyte file resulted from a lucky patch when the connection stayed unbroken after resuming a 4 gigabyte file, but that isn't acceptable for my purposes). Here are my results:

- 6 gigabytes broken into 2 gigabyte segments: first 2 gigabyte segment resumed successfully.
- 6 gigabytes broken into 3 gigabyte segments: first 3 gigabytes resumed successfully.
- 6 gigabytes broken into 4.5 gigabyte segment(s) (segment 2 partial): will not resume.
- 6 gigabytes broken into 4.1 gigabyte segment(s) (segment 2 partial): will not resume.
- 6 gigabytes broken into 3.9 gigabyte segment(s) (segment 2 partial): resumed successfully.

Of course, the original 6 gigabyte partial file couldn't be resumed.

As you are aware, NTFS, while certainly not the Rolls-Royce of file systems, supports files of multiple exabytes, so a 4 gigabyte ceiling would only apply on a FAT32 (Win32-era) partition. Such limits are rare in up-to-date operating systems.

I've considered whether the data stream is being corrupted, but wget (to my knowledge) doesn't do error checking on the file itself; it just compares remote and local file sizes and downloads the remainder if the file is smaller on the client side. And even if the file were being corrupted, the file-splitting program (which is not adding headers) should have ameliorated the problem by now by excising the corrupt part, unless either: 1. the corruption is happening at the same point each time; or 2. the server, or something interposed between myself and the server, is blocking the download when resumption of the database file is detected at or beyond the 4 gigabyte mark.

I've also tried different download managers: TrueDownloader (an open-source download manager), which is rejected by the server; and GetRight, a good commercial program, but it is throttled at 19 KB/s, making the smaller download well over 120 hours. That is too slow, especially not knowing if the file is any good to begin with.

Wikipedia doesn't have tech support, and I haven't seen anything about this error/problem in a search that should encompass their forums. They do suggest the use of wget for this particular application, though, so I would infer that the problem is at least related to wget itself.

I am using wgetgui (as I mentioned in my previous post to the mailing list) and yes, all the options are checked correctly; I've double-checked, triple-checked, and quadruple-checked everything. And then I checked again.

The database size is irrelevant: it could be 100 gigabytes, and that would present no difficulty from the standpoint of bandwidth.
However, the reason we have programs such as wget is to deal with redundancy and resumability issues in large file downloads. I also see that you've been working on large-file and security issues in wget since 2002. But the internet has network protocols to deal with this, so what is happening? Why can't I get the data? Have the network transport protocols failed? Has wget failed? The data is supposed to go from point A to point B; what is stopping that? It doesn't make sense. If I'm running up against a wall, I want to see that wall. If something is failing, I want to know what is failing so I can fix it. Do you have an intermediary server that I can FTP off of to get the wikipedia databases? What about CuteFTP?
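For reference, a minimal sketch of how such a dump would normally be fetched and resumed, assuming a wget build with large-file support (1.10 or newer); the dump URL is a placeholder, since the real path on download.wikimedia.org will differ:

# Hedged sketch: resume a >4 GB download with a large-file-capable wget.
wget --version | head -1       # should report 1.10 or newer; 1.9 tops out near 4 GB
wget -c http://download.wikimedia.org/EXAMPLE/pages-articles.xml.bz2
# -c (--continue) asks the server for a byte range starting at the local file's
# current size, so an interrupted transfer picks up where it left off.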
Re: re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?
From: Jonathan Bazemore:

> I've repeatedly tried [...]

If it's still true that you're using wget 1.9, you can probably try until doomsday with little chance of success. Wget 1.9 does not support large files. Wget 1.10.2 does support large files.

> Try the current version of wget, 1.10.2, which offers large-file
> support on many systems, possibly including your unspecified one.

Still my advice. In the future, it might help if you would supply some useful information, like the wget version you're using and the system type you're using it on. Also, actual commands used and actual output which results would be more useful than vague descriptions like "consistently breaking" and "will not resume".

> I've used a file splitting program to break the partially downloaded
> database file into smaller parts of differing size. Here are my
> results: [...]

So, what, you're messing with the partially downloaded file, and you expect wget to figure out what to do? Good luck.

> [...] wget (to my knowledge) doesn't do error checking in the file
> itself, it just checks remote and local file sizes and does a
> difference comparison, downloading the remainder if the file size is
> smaller on the client side.

Only if it can cope with a number as big as the size of the file. Wget 1.9 uses 32-bit integers for file sizes, and that's not enough bits for numbers over 4 GB. And if you start breaking up the partially downloaded file, what's it supposed to use for the size of the data already downloaded?

> Wikipedia doesn't have tech support, [...]

Perhaps because they'd get too many questions like this one too many times.

Steven M. Schweda                  [EMAIL PROTECTED]
382 South Warwick Street           (+1) 651-699-9818
Saint Paul  MN  55105-2547
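To make the 32-bit arithmetic concrete, a small illustration follows; the shell arithmetic stands in for wget 1.9's internal byte counter, which is a simplification:

# Hedged illustration: a 32-bit counter wraps at 2^32 bytes (4 GiB).
size=6442450944                   # a 6 GiB local file
echo $(( size % 4294967296 ))     # what 32 bits can still represent: 2147483648 (2 GiB)
# A pre-1.10 wget comparing local size to remote size is therefore working from
# a number that no longer reflects how much has actually been downloaded.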