wget 1.11 alpha1 - content disposition filename

2006-06-17 Thread Jochen Roderburg

Hi,

I was happy to see that a long-missed feature is now implemented in this alpha,
namely the interpretation of the filename in the Content-Disposition header.
Only recently I had hacked together a little script to do this, when I
wanted to download a large number of files that used this header ;-)

I had a few cases, however, that did not come out as expected, but I think the
error this time is in the sending web application and not in wget.
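If the sending application is at fault, the usual remedy on its side is to send the filename as a quoted-string, so that characters such as `;` inside the name cannot be mistaken for a parameter separator. Here is a minimal Python sketch under that assumption. `content_disposition` is a hypothetical helper, not part of any server framework. It escapes only backslash and double quote, which keeps the quoting unambiguous. Control characters and non-ASCII names are out of scope.

```python
def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value with the filename as a quoted-string.

    Hypothetical helper: escapes backslash and double quote so the quoted-string
    stays unambiguous; ';' and '&' are then safe inside the quotes.
    Control characters and non-ASCII names are not handled here.
    """
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'{disposition}; filename="{escaped}"'


print("Content-Disposition: " + content_disposition("B&W.txt"))
```

Escaping the backslash first matters. Otherwise the backslashes added for quotes would themselves be doubled.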

E.g., a file that was supposed to be named B&W.txt came with the header:
Content-Disposition: attachment; filename=B&amp;W.txt;
All programs I tried (the new wget, several browsers, and my own script ;-)
seemed to stop parsing at the first semicolon and produced the filename B&amp.
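A parser that treats `;` as the parameter separator ends an unquoted value at its first `;`. That cuts short any unquoted name containing a semicolon, for instance an HTML entity such as `&amp;` if a server escaped the name that way. A quoted-string value, by contrast, may legitimately contain `;`.

Below is a minimal Python sketch of such a parser. `split_params` and `disposition_filename` are hypothetical names, not wget's code. The sketch ignores RFC 2231 `filename*=` encoded names.

```python
from typing import List, Optional


def split_params(value: str) -> List[str]:
    """Split a header value on ';' characters that are not inside double quotes."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:                      # character following a backslash inside quotes
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":   # start of a quoted-pair; drop the backslash
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]       # drop empties, e.g. from a trailing ';'


def disposition_filename(value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    for param in split_params(value)[1:]:    # element 0 is the type, e.g. "attachment"
        name, sep, raw = param.partition("=")
        if sep and name.strip().lower() == "filename":
            raw = raw.strip()
            if len(raw) >= 2 and raw[0] == raw[-1] == '"':
                raw = raw[1:-1]              # strip the surrounding quotes
            return raw
    return None


for header in ('attachment; filename=B&W.txt;',
               'attachment; filename=B&amp;W.txt;',
               'attachment; filename="a;b.txt"'):
    print(repr(header), "->", repr(disposition_filename(header)))
```

The second header shows the truncation at the first unquoted `;`. The third shows that quoting lets a semicolon survive.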

Any thoughts ??

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10    Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



wget 1.11 alpha1 - bug with timestamping option

2006-06-17 Thread Jochen Roderburg

Hi,

I have tried out the wget alpha under Linux and found that the timestamping
option (which I usually have enabled) does not work correctly.

The first thing I noticed was that on *every* download I got the line
   Remote file is newer, retrieving.
in the output, even when there was no local file.
That looked like a purely cosmetic issue, but further tests showed that more
things were going wrong.


First test run, with wgetrc disabled for the test and no local file present beforehand:

wget.111 -d -N http://www.uni-koeln.de

Setting --timestamping (timestamping) to 1
DEBUG output created by Wget 1.11-alpha-1 on linux-gnu.

--20:46:41--  http://www.uni-koeln.de/
Resolving www.uni-koeln.de... 134.95.19.39
Caching www.uni-koeln.de => 134.95.19.39
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x08086440 (new refcount 1).

---request begin---
HEAD / HTTP/1.0
User-Agent: Wget/1.11-alpha-1
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Sat, 17 Jun 2006 18:46:41 GMT
Server: Apache/2.0.52
Last-Modified: Wed, 14 Jun 2006 06:47:06 GMT
Accept-Ranges: bytes
Content-Type: text/html; charset=iso-8859-1
Connection: close

---response end---
200 OK
hs->local_file is: index.html (not existing)
TEXTHTML is on.
Length: unspecified [text/html]
Closed fd 3
Remote file is newer, retrieving.

--20:46:41--  http://www.uni-koeln.de/
Found www.uni-koeln.de in host_name_addresses_map (0x8086440)
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x08086440 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.11-alpha-1
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Sat, 17 Jun 2006 18:46:42 GMT
Server: Apache/2.0.52
Last-Modified: Wed, 14 Jun 2006 06:47:06 GMT
Accept-Ranges: bytes
Content-Type: text/html; charset=iso-8859-1
Connection: close

---response end---
200 OK
hs->local_file is: index.html (not existing)
TEXTHTML is on.
Length: unspecified [text/html]
Saving to: `index.html'

[   <=>   ] 20,703  27.0K/s   in 0.7s

Closed fd 3
20:46:43 (27.0 KB/s) - `index.html' saved [20703]
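The time stamp that the -N comparison works with is the server's Last-Modified header, which appears in both responses above. Here is a minimal Python sketch of turning that HTTP-date into a Unix time stamp that can be compared with a local file's modification time. It uses the standard library's `email.utils.parsedate_to_datetime` and is an illustration only, not how wget itself parses dates.

```python
from email.utils import parsedate_to_datetime

# Last-Modified value copied from the HEAD response in the log above.
last_modified = "Wed, 14 Jun 2006 06:47:06 GMT"

remote = parsedate_to_datetime(last_modified)   # timezone-aware datetime ("GMT" -> UTC)
remote_epoch = int(remote.timestamp())          # seconds since the Unix epoch
print(remote.isoformat(), remote_epoch)
```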


The old version simply does an HTTP GET in this case and writes the local file.

Here I see that it first does an HTTP HEAD, *then* says:
   local_file is: index.html (not existing)
which is correct.
Then it says:
   Remote file is newer, retrieving.
which is questionable, as there is no local file yet to compare against.
Then it does an HTTP GET and saves the file, which is correct.
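The behaviour expected here can be summarised as a small decision function. This is a hedged Python sketch, not wget's code, and `should_retrieve` is a hypothetical name. The size rule follows the wget manual's statement that a size mismatch forces a download regardless of time stamps. The messages marked hypothetical are invented wording. The point it illustrates is that "Remote file is newer" only makes sense once a local file exists.

```python
from typing import Optional, Tuple


def should_retrieve(local_mtime: Optional[float], local_size: Optional[int],
                    remote_mtime: float, remote_size: Optional[int]
                    ) -> Tuple[bool, Optional[str]]:
    """Decide whether -N should download; return (download?, message to print)."""
    if local_mtime is None:
        # No local file: nothing to compare against, so download without a "newer" message.
        return True, None
    if remote_size is not None and local_size is not None and remote_size != local_size:
        # Hypothetical message wording.
        return True, "Sizes do not match, retrieving."
    if remote_mtime > local_mtime:
        return True, "Remote file is newer, retrieving."
    # Hypothetical message wording.
    return False, "Local file is not older than remote file, not retrieving."


# First run from the report: no index.html yet, and the server sent no Content-Length.
print(should_retrieve(None, None, 1150267626.0, None))
```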


When I repeat the same request (now with the local file present), I get:


wget.111 -d -N http://www.uni-koeln.de

Setting --timestamping (timestamping) to 1
DEBUG output created by Wget 1.11-alpha-1 on linux-gnu.

--20:49:01--  http://www.uni-koeln.de/
Resolving www.uni-koeln.de... 134.95.19.39
Caching www.uni-koeln.de => 134.95.19.39
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x0

Documentation (manpage) "bug"

2006-06-17 Thread Linda Walsh

FYI:

In the manpage, where it describes "--no-proxy", it
says:
--no-proxy
  Don't use proxies, even if the appropriate *_proxy environment
  variable is defined.

  For more information about the use of proxies with Wget,
   ^
  -Q quota

Note: the sentence beginning "For more information about the use of
proxies" breaks off before saying anything, and the text continues directly with "-Q quota".