Attempt to download http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3

2006-12-24 Thread Elias Pipping

Hello,

I was trying to download a file with a somewhat unusual name (see
subject/below). It works fine with Safari, but it doesn't work at all
with wget.


I tried it this way:

	$ wget http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
	--14:37:07--  http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
	           => `dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3'
	Resolving www.razor1911.com... 88.80.5.18
	Connecting to www.razor1911.com|88.80.5.18|:80... connected.
	HTTP request sent, awaiting response... 200 OK
	Length: 6,598,784 (6.3M) [audio/mpeg]
	dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3: Invalid argument

	Cannot write to `dubmood__zabutom_-_svenska_akademien_-_m?nga_b?ckar_sm?.mp3' (Invalid argument).


The file can be found here: http://www.razor1911.com/dubmood/ (the
link reads "Svenska akademien - Många Bäckar Små").


I'm on a PPC Mac (mini), running Mac OS X 10.4.8.

	$ wget --version
	GNU Wget 1.10.2

(obtained via DarwinPorts)

Elias Pipping

RE: Attempt to download http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3

2006-12-24 Thread George Pavlov
This is most likely an OS filename character-encoding issue. When in doubt
about file names, just rename the file (the -O option; look it up in the manual).
This invocation works just fine:

~% wget -O dubmood.mp3 http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
--06:12:30--  http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3
           => `dubmood.mp3'
Resolving www.razor1911.com... 88.80.5.18
Connecting to www.razor1911.com|88.80.5.18|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6,598,784 (6.3M) [audio/mpeg]

100%[====================================>] 6,598,784    160.70K/s    ETA 00:00

06:13:14 (150.55 KB/s) - `dubmood.mp3' saved [6598784/6598784]
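
For what it's worth, the failure is probably happening because the URL encodes
the name in ISO-8859-1 (%E5 = å, %E4 = ä), and the Mac's HFS+ file system only
accepts valid UTF-8 file names, so writing the file fails with "Invalid
argument". If you want to keep the original name rather than a dummy one, a
rough sketch would be to decode the escapes and convert the name to UTF-8
yourself before passing it to -O (this assumes iconv is available and that your
shell's printf understands octal escapes; \345 is å and \344 is ä):

    # convert the ISO-8859-1 file name to UTF-8, then let wget write to it
    name=$(printf 'dubmood__zabutom_-_svenska_akademien_-_m\345nga_b\344ckar_sm\345.mp3' \
           | iconv -f ISO-8859-1 -t UTF-8)
    wget -O "$name" \
      'http://www.razor1911.com/dubmood/chiphop/dubmood__zabutom_-_svenska_akademien_-_m%E5nga_b%E4ckar_sm%E5.mp3'

I haven't tested that on OS X, so treat it as a starting point rather than a
guaranteed fix.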






re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?

2006-12-24 Thread Jonathan Bazemore
Please cc: (copy) your response(s) to my email,
[EMAIL PROTECTED], as I am not subscribed to the
list, thank you.

I've repeatedly tried to download the Wikipedia
database dumps, one of which is 8 gigabytes and the
other 57.5 gigabytes, from a server that supports
resuming.

The downloads consistently break around the
4-gigabyte mark.  The ceiling isn't on my end: I'm
running on, and downloading to, NTFS.  I've also done
test downloads from the same wiki server
(download.wikimedia.org), which work fine, and
repeated tests of my own bandwidth and network
(somewhat slower than it should be, with congestion at
times and sporadic dropouts, but since wget supports
resuming, that shouldn't be an issue), which rules
out those factors.  Granted, the download might slow
down or be broken off, but why can't it resume past
the 4-gigabyte mark?

I've used a file splitting program to break the
partially downloaded database file into smaller parts
of differing size.  Here are my results:


Starting point: a 6-gigabyte partial file.  (The
6-gigabyte file resulted from a lucky stretch when the
connection stayed unbroken after resuming a 4-gigabyte
file, but that isn't acceptable for my purposes.)

6 GB split into 2 GB segments: the first 2 GB segment
resumed successfully.

6 GB split into 3 GB segments: the first 3 GB resumed
successfully.

6 GB split into 4.5 GB segments (segment 2 partial):
will not resume.

6 GB split into 4.1 GB segments (segment 2 partial):
will not resume.

6 GB split into 3.9 GB segments (segment 2 partial):
resumed successfully.

Of course, the original 6-gigabyte partial file
couldn't be resumed.

As you are aware, NTFS, while certainly not the
Rolls-Royce of file systems, supports file sizes of
multiple exabytes, so a 4-gigabyte ceiling would only
apply on a FAT32-formatted partition.  Such limits are
rare in up-to-date file systems.

I've considered whether the data stream is being
corrupted, but wget (to my knowledge) doesn't do error
checking on the file contents; it just compares the
remote and local file sizes and downloads the
remainder if the local file is smaller.  And even if
the file were being corrupted, the file-splitting
program (which does not add headers) should have
worked around the problem by now (by excising the
corrupt part), unless either: 1. the corruption is
happening at the same point each time; or 2. the
server, or something interposed between me and the
server, is blocking the download when it detects a
resume of the database file at or beyond the
4-gigabyte mark.
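
To make that concrete, my understanding of the resume
step is roughly as follows (the dump path below is
just a placeholder; I normally drive this through
wgetgui rather than the command line):

    # resume: wget compares the local size against the server's
    # Content-Length and asks for only the missing tail via an
    # HTTP Range request (-c).  -S prints the server's reply; a
    # cooperating server answers "206 Partial Content".
    wget -c -S http://download.wikimedia.org/PLACEHOLDER-dump-file.xml.bz2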

I've also tried different download managers:
TrueDownloader (an open-source download manager),
which is rejected by the server, and GetRight, a good
commercial program, but it is throttled at 19 KB/s,
which would make the smaller download take well over
120 hours; that's too slow, especially without knowing
whether the file is any good to begin with.

Wikipedia doesn't have tech support, and I haven't
seen anything about this error/problem in a search
that should encompass their forums.  They do, however,
suggest using wget for this particular task, so I
would infer that the problem is at least related to
wget itself.

I am using wgetgui (as I mentioned in my previous post
to the mailing list) and yes, all the options are
checked correctly; I've double-, triple-, and
quadruple-checked everything.  And then I checked
again.
 
The database size is irrelevant: it could be 100
gigabytes and that would present no difficulty from
the standpoint of bandwidth.  However, the reason we
have programs like wget is to deal with redundancy
and resumability issues in large file downloads.  I
also see that you've been working on large-file issues
with wget since 2002, along with security issues.  But
the internet has network protocols to deal with this,
so what is happening?

Why can't I get the data?  Have the network transport
protocols failed?  Has wget failed?  The data is
supposed to go from point A to point B; what is
stopping that?  It doesn't make sense.

If I'm running up against a wall, I want to see that
wall.  If something is failing, I want to know what is
failing so I can fix it.

Do you have an intermediary server that I can FTP
from to get the Wikipedia databases?  What about
CuteFTP?




Re: re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?

2006-12-24 Thread Steven M. Schweda
From: Jonathan Bazemore:

> I've repeatedly tried [...]

   If it's still true that you're using wget 1.9, you can probably try
until doomsday with little chance of success.  Wget 1.9 does not support
large files.  Wget 1.10.2 does support large files.

> Try the current version of wget, 1.10.2, which offers large-file
> support on many systems, possibly including your unspecified one.

   Still my advice.

   In the future, it might help if you would supply some useful
information, like the wget version you're using and the system type
you're using it on.  Also, the actual commands used and the actual
output would be more useful than vague descriptions like "consistently
breaking" and "will not resume".
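
   Something along these lines (the dump URL is a placeholder; substitute
the real one) would capture the version, the exact command, and the full
output in one log:

      wget --version | head -n 3
      wget -c -S http://download.wikimedia.org/PLACEHOLDER-dump-file 2>&1 | tee wget-report.log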

> I've used a file splitting program to break the
> partially downloaded database file into smaller parts
> of differing size.  Here are my results: [...]

   So, what, you're messing with the partially downloaded file, and you
expect wget to figure out what to do?  Good luck.

> [...] wget (to my knowledge) doesn't do error checking
> in the file itself, it just checks remote and local
> file sizes and does a difference comparison,
> downloading the remainder if the file size is smaller
> on the client side.

   Only if it can cope with a number as big as the size of the file. 
Wget 1.9 uses 32-bit integers for file size, and that's not enough bits
for numbers over 4G.  And if you start breaking up the partially
downloaded file, what's it supposed to use for the size of the data
already downloaded?
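
   As a rough illustration (shell arithmetic in a 64-bit shell, not wget's
actual code): your 6 GB partial file, seen through an unsigned 32-bit size
field, looks like 2 GB, which is consistent with resumes working below 4 GB
and failing above it:

      $ printf '%d\n' $(( 6442450944 & 0xFFFFFFFF ))   # 6 GiB masked to 32 bits
      2147483648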

> Wikipedia doesn't have tech support, [...]

   Perhaps because they'd get too many questions like this one too many
times.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547