Re: %20 and spaces in a URL

2004-05-21 Thread Hrvoje Niksic
Fred Holmes [EMAIL PROTECTED] writes:

> But I want WGET to convert %20 to space (I think).

Why would you want that?  A URL with a literal space is illegal, at
least for HTTP -- Wget would have to convert the space to %20 to be
able to send the URL to the HTTP server anyway.

OTOH, if you're talking about *file* names, %20 should already be
converted to space.
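
The two conversions described above can be sketched in Python with the standard library's `urllib.parse`. This is only an illustration of percent-encoding, not Wget's own code; the path `/docs/my file.pdf` and the helper names are made up for the example.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path for the request line.

    "/" stays a separator and "%" is left alone, so a path that already
    contains %20 is not encoded a second time.
    """
    return quote(path, safe="/%")


def local_name(segment: str) -> str:
    """Decode %XX escapes in a path segment to get a file name."""
    return unquote(segment)


# Hypothetical path, used only for illustration.
print(encode_path("/docs/my file.pdf"))    # /docs/my%20file.pdf
print(encode_path("/docs/my%20file.pdf"))  # /docs/my%20file.pdf
print(local_name("my%20file.pdf"))         # my file.pdf
```

Either spelling of the path produces the same request, which is why a space and a %20 should behave identically once the URL reaches the program.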



Re: %20 and spaces in a URL

2004-05-21 Thread Fred Holmes
But I want WGET to convert %20 to space (I think).  I'm using 1.9.1.  I haven't
checked to see if there is a newer stable version.  I'm using the Windows binary.  I'd
love to go Linux, but the startup transient is too much.  (And the computer they
furnish me at work runs Windows.)

I regularly use WGET to download a file that is referenced in my work, so that the 
file is just downloaded and not opened by my browser into its viewing application.  
For large files the download seems to go much faster as well.

Thanks,

Fred Holmes

At 09:01 PM 5/20/2004, Hrvoje Niksic wrote:
> Fred Holmes [EMAIL PROTECTED] writes:
>
>> If I have a URL that has %20 in place of spaces, and I use the URL
>> directly as the argument of WGET, it seems that the file is always
>> not found.  I've discovered that if I replace each %20 with a
>> space, and put quotation marks around the entire URL, it works.
>
> That's weird.  Wget converts space to %20, so I don't see any
> difference between using space and %20.  Are you sure the URLs are
> otherwise the same?  What version of Wget are you using?



Re: %20 and spaces in a URL

2004-05-21 Thread Fred Holmes
At 04:55 AM 5/21/2004, Hrvoje Niksic wrote:
> OTOH, if you're talking about *file* names, %20 should already be
> converted to space.

Yes, these URLs are for files, e.g., .pdf and .doc and .zip files.  When I get to work 
today, if I can make a few minutes of time, I'll post an explicit example (or discover 
my error, I hope).

I don't mean literally change to spaces, just parse the %20 correctly so that the file
is in fact found and downloaded.  I'm downloading single files, referenced on Google,
using WGET instead of the browser.

Thanks for your help.

Fred Holmes 



%20 and spaces in a URL -- #2

2004-05-21 Thread Fred Holmes
Here is an example of an instance where a filename containing
%20 fails, but replacing the %20 with spaces, and enclosing in
quotes works.  At the end I find that just putting the original
URL (with %20) in quotation marks makes it work.  There is 
something else unusual about this URL.

The first command validates the version.
The second command is the desired URL/file and fails with %20
The third command repeats the second command with the -d switch.
The fourth command has %20 replaced with space and works.

Other files with %20 on different hosts/servers behave similarly.

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -V
GNU Wget 1.9.1

Copyright (C) 2003 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic [EMAIL PROTECTED].

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET http://hqinet001.hqmc.u
smc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviati
on%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf
--10:13:38--  http://hqinet001.hqmc.usmc.mil/p
   => `p'
Resolving hqinet001.hqmc.usmc.mil... 192.156.19.119
Connecting to hqinet001.hqmc.usmc.mil[192.156.19.119]:80... connected.
HTTP request sent, awaiting response... 404 Object Not Found
10:13:38 ERROR 404: Object Not Found.

'r' is not recognized as an internal or external command,
operable program or batch file.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -d http://hqinet001.hqm
c.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Avi
ation%20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf
DEBUG output created by Wget 1.9.1 on Windows.

set_sleep_mode(): mode 0x8001, rc 0x8000
--10:14:10--  http://hqinet001.hqmc.usmc.mil/p
   => `p'
Resolving hqinet001.hqmc.usmc.mil... seconds 0.00, 192.156.19.119
Caching hqinet001.hqmc.usmc.mil = 192.156.19.119
Connecting to hqinet001.hqmc.usmc.mil[192.156.19.119]:80... seconds 0.00, connec
ted.
Created socket 720.
Releasing 00894758 (new refcount 1).
---request begin---
GET /p HTTP/1.0
User-Agent: Wget/1.9.1
Host: hqinet001.hqmc.usmc.mil
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... HTTP/1.1 404 Object Not Found
Server: Microsoft-IIS/4.0
Date: Fri, 21 May 2004 14:19:44 GMT
Content-Length: 461
Content-Type: text/html


Closing fd 720
10:14:10 ERROR 404: Object Not Found.

'r' is not recognized as an internal or external command,
operable program or batch file.

C:\Documents and Settings\fholmes\Desktop\WGET-TEST>WGET -d http://hqinet001.hq
mc.usmc.mil/pr/concepts/2004/PDF/CP 04 Chap 4 pdfs/CP04 CHAP 4 Aviation Combat
Element - pp186_SINGLE INTEGRATED AIR PICTURE.pdf
DEBUG output created by Wget 1.9.1 on Windows.

set_sleep_mode(): mode 0x8001, rc 0x8000
--10:15:54--  http://hqinet001.hqmc.usmc.mil/pr/concepts/2004/PDF/CP%2004%20Cha
p%204%20pdfs/CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20pp186_SINGLE%20
INTEGRATED%20AIR%20PICTURE.pdf
   => `CP04 CHAP 4 Aviation Combat Element - pp186_SINGLE INTEGRATED AIR
 PICTURE.pdf'
Resolving hqinet001.hqmc.usmc.mil... seconds 0.00, 192.156.19.119
Caching hqinet001.hqmc.usmc.mil = 192.156.19.119
Connecting to hqinet001.hqmc.usmc.mil[192.156.19.119]:80... seconds 0.00, connec
ted.
Created socket 720.
Releasing 00895088 (new refcount 1).
---request begin---
GET /pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/CP04%20CHAP%204%20Aviation%
20Combat%20Element%20-%20pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf HTTP/1.0
User-Agent: Wget/1.9.1
Host: hqinet001.hqmc.usmc.mil
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... HTTP/1.1 200 OK
Server: Microsoft-IIS/4.0
Connection: keep-alive
Date: Fri, 21 May 2004 14:21:28 GMT
Content-Type: application/pdf
Accept-Ranges: bytes
Last-Modified: Wed, 10 Mar 2004 14:25:50 GMT
ETag: dee42996ab6c41:3994
Content-Length: 10971


Found hqinet001.hqmc.usmc.mil in host_name_addresses_map (00895088)
Registered fd 720 for persistent reuse.
Length: 10,971 [application/pdf]

100%[====================================>] 10,971        59.19K/s

10:15:54 (59.19 KB/s) - `CP04 CHAP 4 Aviation Combat Element - pp186_SINGLE INTE
GRATED AIR PICTURE.pdf' saved [10971/10971]


C:\Documents and Settings\fholmes\Desktop\WGET-TEST>
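
A quick check, sketched here in Python and not part of the original exchange, suggests that the spaces and the %20 escapes are not what separates the failing runs from the working one. Percent-encoding the path typed in the fourth command gives exactly the path in the successful GET request, while the failing runs show Wget receiving a URL that stops at `/p`. The `'r' is not recognized` message comes from the Windows command interpreter rather than from Wget, so the rest of the unquoted line was apparently handled by the shell as a separate command. The variable names below are illustrative.

```python
from urllib.parse import quote

HOST = "http://hqinet001.hqmc.usmc.mil"

# Path as typed in the fourth command (literal spaces).
typed_path = (
    "/pr/concepts/2004/PDF/CP 04 Chap 4 pdfs/"
    "CP04 CHAP 4 Aviation Combat Element - "
    "pp186_SINGLE INTEGRATED AIR PICTURE.pdf"
)

# Path from the successful "GET ..." line in the debug output.
requested_path = (
    "/pr/concepts/2004/PDF/CP%2004%20Chap%204%20pdfs/"
    "CP04%20CHAP%204%20Aviation%20Combat%20Element%20-%20"
    "pp186_SINGLE%20INTEGRATED%20AIR%20PICTURE.pdf"
)

# URL that Wget reported in the two failing runs.
failed_url = HOST + "/p"
full_url = HOST + requested_path

# True if encoding the spaces yields the request that succeeded.
same_request = quote(typed_path) == requested_path

# True if the failing URL is a proper prefix of the intended one.
truncated = full_url.startswith(failed_url) and failed_url != full_url

print(same_request)
print(truncated)
```

Both values come out true for the strings copied from the transcript: the space form and the %20 form describe the same request, and the failing runs never handed Wget the whole URL.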

--

And now I have discovered that just putting quotation marks around the entire
URL makes it work, but the file is not found on the first try and is then
found when WGET automatically makes a second try with slightly different
syntax (?).  See below.

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 

%20 and spaces in a URL -- #3

2004-05-21 Thread Fred Holmes
Well, it's not simply the %20 that is the problem.  Here's a simple, straightforward 
URL that has %20's in it and it downloads just fine.  My apologies for the bum steer.

Fred Holmes


Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\Documents and Settings\fholmes\Desktop>WGET http://www.dau.mil/pubs/glossary/
11th%20Glossary%202003.pdf
--15:54:09--  http://www.dau.mil/pubs/glossary/11th%20Glossary%202003.pdf
   => `11th Glossary 2003.pdf'
Resolving www.dau.mil... 128.190.170.224
Connecting to www.dau.mil[128.190.170.224]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 782,785 [application/pdf]

100%[====================================>] 782,785       79.13K/s    ETA 00:00

15:54:17 (97.62 KB/s) - `11th Glossary 2003.pdf' saved [782785/782785]


C:\Documents and Settings\fholmes\Desktop>
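
The line naming `11th Glossary 2003.pdf` in the output above shows the local file name Wget chose. A rough Python equivalent, assuming the name is simply the last path segment with its %XX escapes decoded, might look like the sketch below. The function name is made up, and Wget's real naming rules cover more cases than this.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_file_name(url: str) -> str:
    """Return the last path segment of url with %XX escapes decoded."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


url = "http://www.dau.mil/pubs/glossary/11th%20Glossary%202003.pdf"
print(local_file_name(url))  # 11th Glossary 2003.pdf
```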