Re: rapidshare download problem

2008-07-21 Thread Jochen Roderburg

Quoting Doruk Fisek [EMAIL PROTECTED]:


Thu, 17 Jul 2008 15:07:18 -0700, Micah Cowan [EMAIL PROTECTED] :


Then, please provide the logs from both wget 1.10.2 and wget 1.11.4
(with --auth-no-challenge), with the --debug flag.

I attached the logs you requested.

wget 1.10.2 didn't recognize the --auth-no-challenge parameter, so I
only used it in 1.11.4



Looks like the --auth-no-challenge option does not work correctly with
the http://username:[EMAIL PROTECTED]/ syntax.


When you put username/password in separate parameters it should work:
--http-user=username --http-passwd=password http://rs60tl.rapidshare.com/
It *does* work for me in this form with other servers  ;-)
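
For illustration, the two forms compared here would look roughly like this
(hypothetical host and credentials, not tested against rapidshare itself):

wget --auth-no-challenge 'http://username:[email protected]/files/archive.rar'
wget --auth-no-challenge --http-user=username --http-passwd=password \
     'http://host.example.com/files/archive.rar'

The first is the URL-embedded form that fails in your log, the second is the
separate-parameter form that works for me with other servers.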

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



This message was sent using IMP, the Internet Messaging Program.



Re: Release: GNU Wget 1.11

2008-01-27 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:


 It gives me great pleasure to announce the release of

   GNU Wget 1.11

 It's been over two years since the last release, 1.10.2, but we've made
 it. (Thanks mainly to the efforts of Wget's previous maintainer, Mauro
 Tortonesi - thanks Mauro!)


Hurrah, hurrah,  congratulations  ;-)

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: wget 1.11

2008-01-21 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

  Just curious: what is now holding back a release of the 1.11 version?

 *sigh*, just waiting on the disclaimer from my employer. I actually
 really expect to get that finished up this week, though. I'm told that
 the company lawyer has already approved it, but they need to get that
 approval back in writing, so it should be _really_ soon.

Ah, that is now a type of reason I had not thought of  ;-)


  And I have already again a number of my strange cases which I did not
 want to
  report during the last minutes before the anticipated release  ;-)

 Oh noes!... well, better sock 'em to me. Better to know about them than
 to remain ignorant, at any rate. The worst that can happen is I decide
 to punt 'em until a 1.11 patch release, or later: and if they're really
 _really_ awful (which seems relatively unlikely at this point, but...),
 then I'll want to fix them ASAP before the release.

Hmm, these are really strange cases that occur with a few really strange hosts only.
Content-disposition (again ;-) and redirection are involved. Often wget somehow
loses the Content-Length information and keeps on downloading ad infinitum.
I have not yet tried to research it in detail; for now I help myself with
--no-content-disposition and -O to set the file name, which always works.
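
For the record, the workaround invocation looks roughly like this (hypothetical
host and file name):

wget --no-content-disposition -O somefile.bin 'http://strangehost.example.com/download?id=42'

i.e. content-disposition switched off and the output name fixed with -O.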

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


wget 1.11

2008-01-20 Thread Jochen Roderburg

Hi Micah,

Just curious: what is now holding back a release of the 1.11 version?

I have not seen any changes in the source repository for a month.

And I have already again a number of my strange cases which I did not want to
report during the last minutes before the anticipated release  ;-)

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


Re: how do I download a/this URL that redirects at the server side?

2007-12-25 Thread Jochen Roderburg
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:

  The windows port, wget interlog one, returned
 ..
 Connecting to www.theregister.co.uk:80... connected!
 HTTP request sent, awaiting response... 404 Not Found
 02:32:09 ERROR 404: Not Found.

 I guess the windows port doesn`t deal with 301 error or something


As always, full output with the -d option would help here.
What do you mean, e.g., with "wget interlog one"?

My Windows version works fine with that URL:

C:\ wget -d http://www.theregister.co.uk/content/4/23517.html
DEBUG output created by Wget 1.10.2 on Windows.

--18:49:24--  http://www.theregister.co.uk/content/4/23517.html
   = `23517.html'
Resolving www.theregister.co.uk... seconds 0.00, 212.100.234.54
Caching www.theregister.co.uk = 212.100.234.54
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connecte
d.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).

---request begin---
GET /content/4/23517.html HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 301 Moved Permanently
Date: Tue, 25 Dec 2007 17:49:32 GMT
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_f
bi/
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Content-Length: 378
Connection: close
Content-Type: text/html; charset=iso-8859-1

---response end---
301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_f
bi/ [following]
Closed fd 760
--18:49:25--  http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_
by_fbi/
   = `index.html'
Found www.theregister.co.uk in host_name_addresses_map (008B4BA0)
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connecte
d.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).

---request begin---
GET /2001/12/31/winxp_hole_misrepresented_by_fbi/ HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Tue, 25 Dec 2007 17:49:32 GMT
Server: Apache/2.0.54 (Debian GNU/Linux)
Accept-Ranges: bytes
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html

---response end---
200 OK
Length: unspecified [text/html]

[ = ] 27.556--.--K/s

Closed fd 760
18:49:25 (353.44 KB/s) - `index.html' saved [27556]


Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


Re: how do I download a/this URL that redirects at the server side?

2007-12-25 Thread Jochen Roderburg
Quoting Jochen Roderburg [EMAIL PROTECTED]:

 What do you mean e.g. with wget interlog one  ?

Hmm, I googled for wget interlog and found a very old Windows version 1.5.3
from 1999 there, which indeed gets a 404 error from your host.

I think the server does not like the request header
Host: www.theregister.co.uk:80
(with port number) which is sent by this version.
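
For comparison, what my 1.10.2 sends (see the debug output in my previous mail)
versus what I would expect that old version to send:

Host: www.theregister.co.uk        (wget 1.10.2)
Host: www.theregister.co.uk:80     (wget 1.5.3)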

You can get a current Windows version on
http://www.christopherlewis.com/WGet/default.htm

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


Re: Using wget through FTP proxy server

2007-10-25 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 There are two bugs here, actually: one is lack of support for
 authentication with FTP proxies (feature request), the other is that
 it's not clearly documented that proxy-user/proxy-password only apply to
 HTTP proxies.

 Can anyone tell me offhand, does Wget currently support the use of HTTP
 proxies for fetching FTP resources (GET ftp://foo/ HTTP/1.1)? It looks
 to me like the answer is no; if so, that should be filed as well.

This has been in wget for a long time and is working well.

These so-called FTP proxies are not explicitly supported.
There also seem to exist various schemes for how the authentication information for
the proxy and for the FTP server behind it is transferred (some combinations of USER,
PASS and AUTH); some of them can even be handled with wget and a suitably crafted URL.
I think that after login the FTP client does a normal FTP session with the FTP proxy.
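
Just to illustrate the case Micah asked about with a minimal sketch (hypothetical
proxy host):

# in ~/.wgetrc, or equivalently as the ftp_proxy environment variable
ftp_proxy = http://proxy.example.com:3128/

wget ftp://ftp.example.com/pub/somefile.tar.gz

wget then sends a proxied GET for the ftp:// URL to the HTTP proxy instead of
speaking FTP itself, and the proxy does the FTP work.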

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-10-14 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Micah Cowan wrote:
  Jochen Roderburg wrote:
  Unfortunately, however, a new regression crept in:
  In the case timestamping=on, content-disposition=off, no local file
 present it
  does now no HEAD (correctly), but two (!!) GETS and transfers the file two
  times.
 
  Ha! Okay, gotta get that one fixed...

 That should now be fixed.

 It's hard to be confident I'm not introducing more issues, with the
 state of http.c being what it is. So please beat on it! :)

This time it survived the beating  ;-)
Seems that we are finally converging. The double GET is gone, and my other test
cases still work as expected, including the -c variants.

 One issue I'm still aware of is that, if -c and -e
 contentdisposition=yes are specified for a file already fully
 downloaded, HEAD will be sent for the contentdisposition, and yet a GET
 will still be sent to fetch the remainder of the -c (resulting in a 416
 Requested Range Not Satisfiable). Ideally, Wget should be smart enough
 to see from the HEAD that the Content-Length already matches the file's
 size, even though -c no longer requires a HEAD (again). We _got_ one, we
 should put it to good use.

 However, I'm not worried about addressing this before 1.11 releases;
 it's a minor complaint, and with content-disposition's current
 implementation, users are already going to be expecting an extra HEAD
 round-trip in the general case; what's a few extra?

Agreed. I can confirm this behaviour, too. And I would also consider this a
minor issue, at least the result is correct.

I have also not made many tests where content-disposition is really used for the
filename. The few real-life cases that I have at hand do not send any
special headers like timestamps and file lengths with it. At least the local
filename is set correctly and is correctly renamed if it already exists.

Best regards and thanks again for the repair of all the issues that I found,

Jochen Roderburg



Re: Myriad merges

2007-10-07 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Jochen Roderburg wrote:
   Quoting Micah Cowan [EMAIL PROTECTED]:
 
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA256
 
  Jochen Roderburg wrote:
  Yes, this one is still open, and the other one that wget -c always
 starts
  at 0
  again.
  Do you mean the (local 0) thing? That should have been fixed in
  674cc935f7c8 [subversion r2382]. Can you re-check?
 
  No, that is ok now.
  I saw my little patch for this included as of this weekend ;-)
 
  The one I mean is: wget -c continuation is not done in the HEADless
 cases
 
  http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html   ff.

 This should be fixed now, along with the timestamping issues.


And now the test results of this weekend  ;-)

First the good news, the recent problems *are* fixed now, namely:

In the case with default options (timestamping=off, content-disposition=off) we
have now:

The timestamps on the downloaded files are set correctly.
Continued HTTP transfer (wget -c) is done correctly.

Unfortunately, however, a new regression crept in:
in the case timestamping=on, content-disposition=off, no local file present, it
now (correctly) sends no HEAD, but issues two (!!) GETs and transfers the file
twice.

All other combinations of these options and conditions are OK.
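
For completeness, the regression case is as simple as this (hypothetical URL, no
local copy present, content-disposition left at its default off):

wget -N http://www.example.com/index.html

which here issues no HEAD but two GET requests.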

Best regards,
J.Roderburg



Re: Myriad merges

2007-10-01 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Jochen Roderburg wrote:
  And now, for a change, a case, that works now (better)  ;-)
 
  This is an example where a HEAD request gets a 500 Error response.
 
  Wget default options again, but contentdisposition=yes to force a HEAD.
 
 
  wget.111-svn-0709 --debug -e contentdisposition = yes
  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
 
  Setting contentdisposition (contentdisposition) to yes
  DEBUG output created by Wget 1.10+devel on linux-gnu.
 
  --15:26:54--
 http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
  Resolving www.eudora.com... 199.106.114.30
  Caching www.eudora.com = 199.106.114.30
  Connecting to www.eudora.com|199.106.114.30|:80... connected.
  Created socket 3.
  Releasing 0x080888d8 (new refcount 1).
 
  ---request begin---
  HEAD /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
  User-Agent: Wget/1.10+devel
  Accept: */*
  Host: www.eudora.com
  Connection: Keep-Alive
 
  ---request end---
  HTTP request sent, awaiting response...
  ---response begin---
  HTTP/1.1 500 Server Error
  Server: Netscape-Enterprise/6.0
  Date: Mon, 03 Sep 2007 13:26:54 GMT
  Content-length: 305
  Content-type: text/html
  Connection: keep-alive
 
  ---response end---
  500 Server Error
  Registered socket 3 for persistent reuse.
  --15:26:56--  (try: 2)
  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
  Disabling further reuse of socket 3.
  Closed fd 3
  Found www.eudora.com in host_name_addresses_map (0x80888d8)
  Connecting to www.eudora.com|199.106.114.30|:80... connected.
  Created socket 3.
  Releasing 0x080888d8 (new refcount 1).
 
  ---request begin---
  GET /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
  User-Agent: Wget/1.10+devel
  Accept: */*
  Host: www.eudora.com
  Connection: Keep-Alive
 
  ---request end---
  HTTP request sent, awaiting response...
  ---response begin---
  HTTP/1.1 302 Moved Temporarily
  Server: Netscape-Enterprise/6.0
  Date: Mon, 03 Sep 2007 13:26:55 GMT
  Location:
 http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
  Content-length: 0
  Connection: keep-alive
 
  ---response end---
  302 Moved Temporarily
  Registered socket 3 for persistent reuse.
  Location:
 http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
  [following]
  Skipping 0 bytes of body: [] done.
  --15:26:56--
  http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
  Reusing existing connection to www.eudora.com:80.
  Reusing fd 3.
 
  ---request begin---
  HEAD /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
  User-Agent: Wget/1.10+devel
  Accept: */*
  Host: www.eudora.com
  Connection: Keep-Alive
 
  ---request end---
  HTTP request sent, awaiting response...
  ---response begin---
  HTTP/1.1 200 OK
  Server: Netscape-Enterprise/6.0
  Date: Mon, 03 Sep 2007 13:26:56 GMT
  Content-type: application/octet-stream
  Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
  Content-length: 17416184
  Accept-ranges: bytes
  Connection: keep-alive
 
  ---response end---
  200 OK
  Length: 17416184 (17M) [application/octet-stream]
  --15:26:56--
  http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
  Reusing existing connection to www.eudora.com:80.
  Reusing fd 3.
 
  ---request begin---
  GET /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
  User-Agent: Wget/1.10+devel
  Accept: */*
  Host: www.eudora.com
  Connection: Keep-Alive
 
  ---request end---
  HTTP request sent, awaiting response...
  ---response begin---
  HTTP/1.1 200 OK
  Server: Netscape-Enterprise/6.0
  Date: Mon, 03 Sep 2007 13:26:56 GMT
  Content-type: application/octet-stream
  Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
  Content-length: 17416184
  Accept-ranges: bytes
  Connection: keep-alive
 
  ---response end---
  200 OK
  Length: 17416184 (17M) [application/octet-stream]
  Saving to: `Eudora_7.1.0.9.exe'
 
  100%[=] 17,416,184
 397K/s
  in 44s
 
  15:27:40 (386 KB/s) - `Eudora_7.1.0.9.exe' saved [17416184/17416184]
 
 
  ls -l Eudora_7.1.0.9.exe
  -rw-r- 1 a0045 RRZK 17416184 05.10.2006 20:45 Eudora_7.1.0.9.exe
 
 
  This seems also to use the only available source for the timestamp, the
 response
  to the GET request.

 Sorry to reproduce that in full, but I thought it might be helpful to
 see the full transcript again, since you sent this a while ago.

 I was going back through this thread to refresh my memory on some
 things. I noticed, and wanted to point out, that actually, the GET
 request was _not_ the only available source for the timestamp; HEAD was
 answered with a 500, but only the first one. The HEAD issued after the
 redirect gives a timestamp.

Yes indeed, you are right, I overlooked the second HEAD after the redirect ;-)

My main message here was of course that the changes regarding the 500 error
response to the HEAD request work now.

Re: wget -c wrong progress bar

2007-10-01 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Jochen Roderburg wrote:

 (abbreviated:)

  51% [=  ] 157,962
  5.37K/s   in 3m 45s

  Further examination of such cases showed that a Byte-Range transfer was
  requested by wget, but was not done by the server. On most normal cases
 this
  cannot happen, because wget first examines the headers and does these
 requests
  only when they are advertized by the server. The situation, where this does
  happen nevertheless, is very complicated and rare and envolves local and
 remote
  proxies again.

 Thanks for tracking this down, Jochen.

 Probably not something to worry too much about for 1.11, then; I've
 created the bug report and targeted it for 1.12.


Yes, I agree, it is not a severe error that needs immediate repair  ;-)
As I wrote, the situation where it occurs is very rare and it is more a
cosmetic issue than a real problem. I just happened to see it a few times
recently as one particular server began to develop very slow and bad
connections.

Best regards,
J.Roderburg



Re: Myriad merges

2007-10-01 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 The problem you pointed out that causes the failure to properly
 timestamp when HEADs aren't issued seems, to my reading, to be simply
 regressable for the fix. Mauro's fixes don't look as if they depend upon
 that line being there, but I'm waiting for him to have a chance to look
 over it before I commit to that as the fix (both he and I have been busy
 lately).

Yes, this one is still open, and the other one that wget -c always starts at 0
again.

On the other hand, with the combination of options that I usually use in my
daily wget practice (timestamping and content-disposition on) everything works
fine now  ;-)

 I've also got trying to deal with content-disposition issues for when
 HEAD fails, on my todo list.

I have not done real-life tests with content-disposition cases, but I also have
the feeling that not all combinations with other options (like timestamping and
continuation) work with these yet. These may be minor issues again, as usually
content-disposition is used when the contents are generated dynamically
and there are no static timestamps and file lengths at all.

Best regards,
J. Roderburg




Re: Myriad merges

2007-10-01 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Jochen Roderburg wrote:
  Yes, this one is still open, and the other one that wget -c always starts
 at 0
  again.

 Do you mean the (local 0) thing? That should have been fixed in
 674cc935f7c8 [subversion r2382]. Can you re-check?

No, that is ok now.
I saw my little patch for this included as of this weekend ;-)

The one I mean is: wget -c continuation is not done in the HEADless cases.

http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html   ff.

Regards,
J.Roderburg



wget -c wrong progress bar

2007-09-23 Thread Jochen Roderburg

Hi,

Now I have finally found an example again of the wrong progress bar output
when a continued download is started again from the beginning. It isn't exactly
the case I had seen before with internal restarts; actually it is even simpler
and the wrong output is more obvious.

First the protocol:

wget -c http://www.somehost.com/somefile.jpg

--18:58:58--  http://www.somehost.com/somefile.jpg
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 157962 (154K) [image/jpeg]
The sizes do not match (local 150780) -- retrieving.

--18:58:59--  http://www.somehost.com/somefile.jpg
Connecting to localhost|127.0.0.1|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 157962 (154K) [image/jpeg]
Saving to: `somefile.jpg'

51% [=  ] 157,962
5.37K/s   in 3m 45s

19:02:53 (702 B/s) - `somefile.jpg' saved [157962/157962]

One can see that it had only about 7K left to transfer but started at the
beginning again. The file is transferred correctly, but the output looks
weird; it is neither the correct output for a continued transfer nor the
correct output for a complete transfer.

Further examination of such cases showed that a byte-range transfer was
requested by wget, but was not honoured by the server. In most normal cases this
cannot happen, because wget first examines the headers and makes these requests
only when they are advertised by the server. The situation where this does
happen nevertheless is very complicated and rare and involves local and remote
proxies again. The server here seems to do some load balancing with several
hosts (multiple IP numbers per DNS name), and when a local proxy is in between
as well, every HTTP request can land on a different host with possibly
different behaviour, so we can get another case of a discrepancy between the
result of a HEAD and a later GET.
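
Schematically, what happens in these cases is the following (reconstructed, not
copied from the real debug log):

---request begin---
GET http://www.somehost.com/somefile.jpg HTTP/1.0
Range: bytes=150780-
---request end---
---response begin---
HTTP/1.1 200 OK
Content-Length: 157962
---response end---

wget asks only for the remaining bytes, but the answering host ignores the Range
header and sends the full file with a plain 200, and the progress output gets
confused.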

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: wget -c problem with current svn version

2007-09-15 Thread Jochen Roderburg
Quoting Jochen Roderburg [EMAIL PROTECTED]:


 Continued download (wget -c) is not done in the current svn version with
 default
 options (where no HEAD is used). The download starts instead at byte 0 again.
 When other options require a HEAD, it works ok again.

Another astonishing test result:

With wget -c -O file URL, continuation works fine on the -O file (!!!); it even
sets a timestamp on that file. I think this raises several questions   ;-)

First, I think -c should also join the family of options which are not
compatible with -O (where we already have -r, -p, -N).

Second, it could give a hint as to where the problem with -c lies.

Actually I can see in the code what happens, but I do not understand the
intended logic and cannot correct it; that is now really for Mauro and Micah.
There is a variable got_name in http.c which seems to be used for different
purposes. One usage is as an indicator that -O is used, and the other has something
to do with the -c logic. I also see a conflict between older changes by Mauro
and the latest changes by Micah in this area.

Interesting code snippets:

http.c, line 2143 ff.

  /* Decide whether or not to restart.  */
  if (opt.always_rest
      && got_name
      && stat (hstat.local_file, &st) == 0
      && S_ISREG (st.st_mode))
    /* When -c is used, continue from on-disk size.  (Can't use
       hstat.len even if count>1 because we don't want a failed
       first attempt to clobber existing data.)  */
    hstat.restval = st.st_size;

http.c, line 2634 ff.

  if (send_head_first)
{
  got_name = true;
  restart_loop = true;
}

in an older version this was

  if (opt.always_rest)
{
  got_name = true;
  restart_loop = true;
}


Regards, J.Roderburg




Re: Myriad merges

2007-09-13 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

  And the only other code I found which parses the remote date is in the part
  which handles the logic around the timestamping option. In older versions
 this
  was a conditional block starting with  if (!got_head) ...  , now it starts
 with
 if (send_head_first && !got_head) ...   Could this mean that this code is
 now
  only executed when a HEAD response is examined ??

 Hm... that change came from the Content-Disposition fixes. I'll investigate.


OK, but I hope I am still allowed to help a little with the investigation  ;-)

I made a few more tests and some debugging now, and I am now convinced that this
if (send_head_first ...) condition is definitely the immediate cause of the new
problem that the remote timestamp is not picked up on GET-only requests.

This change is relatively new, it had not been in the next-to-last svn version
that I compiled a month ago. Certainly there must have been a reason for this
but one sure side effect is that this if-block of code is not executed any
longer for the HEAD-less case. Btw, continued downloads (wget -c) are also
broken now in this case (probably for the same reason).

I meanwhile also believe that the primary issue we are trying to repair (the
first remote time-stamp found is used for the local file and not the last one
found) has always been there. I only really noticed it a year ago, when the
content-disposition stuff was included and more HEAD requests were made. I
remember that it had always been more difficult to get a newer file downloaded
through the proxy-cache when a local file was present, but as these cases were
rare, I had never tried to investigate this before  ;-)

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-09-13 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

  Btw, continued downloads (wget -c) are also
  broken now in this case (probably for the same reason).

 Really? I've been using this Wget version for a bit, and haven't noticed
 this problem. Could you give an invocation that produces this problem?


I'll make a new thread for this problem, as it meanwhile looks like a different
case again   ;-)

J.Roderburg



Re: Myriad merges

2007-09-07 Thread Jochen Roderburg
Zitat von Micah Cowan [EMAIL PROTECTED]:

  Quoting Jochen Roderburg [EMAIL PROTECTED]:
 
  So it looks now to me, that the new error (local timestamp not set to
 remote)
  only occurs in the cases when no HEAD is used.
 
  This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to
 me,
  especially the time_came_from_head bit:
 
/* Reparse time header, in case it's changed. */
if (time_came_from_head
      && hstat.remote_time && hstat.remote_time[0])
  {
newtmr = http_atotm (hstat.remote_time);
if (newtmr != -1)
  tmr = newtmr;
  }

 The intent behind this code is to ensure that we parse the Last-Modified
 date again, even if we already parsed Last-Modified, if the last one we
 parsed came from the HEAD.

Hmm, yes, but that is not what it does  ;-)

I mean, it does not parse the date again *even if* it was already parsed, but
*only if* it was already parsed. So in particular it does *not* parse it if there
was no HEAD at all before.

And the only other code I found which parses the remote date is in the part
which handles the logic around the timestamping option. In older versions this
was a conditional block starting with  if (!got_head) ... ; now it starts with
 if (send_head_first && !got_head) ...   Could this mean that this code is now
only executed when a HEAD response is examined ??

Anyway, I think everything is OK again if you just eliminate this
time_came_from_head logic completely. The above piece of code then simply sets
the local timestamp to the last remote timestamp that was seen and does not
care from which HEAD or GET request it actually came.

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-09-06 Thread Jochen Roderburg
Quoting Jochen Roderburg [EMAIL PROTECTED]:

 So it looks now to me, that the new error (local timestamp not set to remote)
 only occurs in the cases when no HEAD is used.

This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me,
especially the time_came_from_head bit:

  /* Reparse time header, in case it's changed. */
  if (time_came_from_head
      && hstat.remote_time && hstat.remote_time[0])
{
  newtmr = http_atotm (hstat.remote_time);
  if (newtmr != -1)
tmr = newtmr;
}

Other than that, I have used the current svn version for a few more days now with
all my work, and I would say all the issues that had bothered me in the recent
development cycles are corrected now.
I'll see, however, whether I can make a few more systematic tests with some
combinations of the relevant options which I usually do not use in my practice.

What I have newly seen are some cosmetic issues in the program output when HTTP
restarts happen. Such restarts are normally rare these days, but I have some
sites far away where suddenly bad connections and timeouts have reappeared. One
looks pretty simple; I think I can prepare a patch myself on the weekend when I
have access to my Linux development system at home again. I'll report details
in a separate mail later, when I have examples for the cases.

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-09-03 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 Hm, that should not be. It should definitely set the timestamp if it
 gets downloaded... I'll investigate.

 OOC, was there a specific resource you tested against (just in case I
 have difficulty reproducing)?


Not a very specific one, just used our university homepage for this test ;-)

Here a full protocol:


ls -l index.html

ls: cannot access index.html: No such file or directory


HEAD http://www.uni-koeln.de/index.html

200 OK
Connection: close
Date: Mon, 03 Sep 2007 11:44:59 GMT
Accept-Ranges: bytes
Server: Apache/2.0.59
Content-Language: de
Content-Type: text/html
Last-Modified: Mon, 03 Sep 2007 11:04:09 GMT
Client-Date: Mon, 03 Sep 2007 11:44:59 GMT
Client-Response-Num: 1


wget.111-svn-0709 --debug http://www.uni-koeln.de/index.html

DEBUG output created by Wget 1.10+devel on linux-gnu.

--13:45:12--  http://www.uni-koeln.de/index.html
Resolving www.uni-koeln.de... 134.95.19.39
Caching www.uni-koeln.de = 134.95.19.39
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x08088820 (new refcount 1).

---request begin---
GET /index.html HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Mon, 03 Sep 2007 11:45:12 GMT
Server: Apache/2.0.59
Last-Modified: Mon, 03 Sep 2007 11:04:09 GMT
Accept-Ranges: bytes
Content-Type: text/html
Content-Language: de
Connection: close

---response end---
200 OK
Length: unspecified [text/html]
Saving to: `index.html'

[ =  ] 9,131   --.-K/s  
in 0s

Closed fd 3
13:45:12 (207 MB/s) - `index.html' saved [9131]

ls -l index.html

-rw-r- 1 a0045 RRZK 9131 03.09.2007 13:45 index.html

date

Mon Sep  3 13:45:24 CEST 2007


Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Myriad merges

2007-09-03 Thread Jochen Roderburg

And now, for a change, a case that now works (better)  ;-)

This is an example where a HEAD request gets a 500 Error response.

Wget default options again, but contentdisposition=yes to force a HEAD.


wget.111-svn-0709 --debug -e contentdisposition = yes
http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109

Setting contentdisposition (contentdisposition) to yes
DEBUG output created by Wget 1.10+devel on linux-gnu.

--15:26:54--  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
Resolving www.eudora.com... 199.106.114.30
Caching www.eudora.com = 199.106.114.30
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x080888d8 (new refcount 1).

---request begin---
HEAD /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Server Error
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:54 GMT
Content-length: 305
Content-type: text/html
Connection: keep-alive

---response end---
500 Server Error
Registered socket 3 for persistent reuse.
--15:26:56--  (try: 2) 
http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
Disabling further reuse of socket 3.
Closed fd 3
Found www.eudora.com in host_name_addresses_map (0x80888d8)
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x080888d8 (new refcount 1).

---request begin---
GET /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Moved Temporarily
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:55 GMT
Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Content-length: 0
Connection: keep-alive

---response end---
302 Moved Temporarily
Registered socket 3 for persistent reuse.
Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
[following]
Skipping 0 bytes of body: [] done.
--15:26:56-- 
http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Reusing existing connection to www.eudora.com:80.
Reusing fd 3.

---request begin---
HEAD /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:56 GMT
Content-type: application/octet-stream
Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
Content-length: 17416184
Accept-ranges: bytes
Connection: keep-alive

---response end---
200 OK
Length: 17416184 (17M) [application/octet-stream]
--15:26:56-- 
http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Reusing existing connection to www.eudora.com:80.
Reusing fd 3.

---request begin---
GET /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:56 GMT
Content-type: application/octet-stream
Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
Content-length: 17416184
Accept-ranges: bytes
Connection: keep-alive

---response end---
200 OK
Length: 17416184 (17M) [application/octet-stream]
Saving to: `Eudora_7.1.0.9.exe'

100%[=] 17,416,184   397K/s  
in 44s

15:27:40 (386 KB/s) - `Eudora_7.1.0.9.exe' saved [17416184/17416184]


ls -l Eudora_7.1.0.9.exe
-rw-r- 1 a0045 RRZK 17416184 05.10.2006 20:45 Eudora_7.1.0.9.exe


This seems also to use the only available source for the timestamp, the response
to the GET request.

Best regards, J.Roderburg



Re: Myriad merges

2007-09-03 Thread Jochen Roderburg

And now, finally, the ultimate real-life test with proxy-cache, timestamping and
contentdisposition, where HEAD and GET have different timestamps.

And this is perfectly correct now !

So it now looks to me that the new error (local timestamp not set to remote)
only occurs in the cases where no HEAD is used.

Best regards,  J.Roderburg


HEAD -p http://wwwcache.uni-koeln.de:8080
http://download.lavasoft.com/public/core.zip

200 OK
Date: Thu, 30 Aug 2007 09:31:35 GMT
Accept-Ranges: bytes
Age: 361684
ETag: 3014d-233e3c-cbcb000
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Content-Length: 2309692
Content-Type: application/zip
Last-Modified: Mon, 27 Aug 2007 13:08:16 GMT
Client-Date: Mon, 03 Sep 2007 13:59:39 GMT
Client-Response-Num: 1
Proxy-Connection: close
X-Cache: HIT from wwwcache.uni-koeln.de


HEAD http://download.lavasoft.com/public/core.zip

200 OK
Connection: close
Date: Mon, 03 Sep 2007 14:00:48 GMT
Accept-Ranges: bytes
ETag: 3016f-275cfc-f35fc640
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Content-Length: 2579708
Content-Type: application/zip
Last-Modified: Mon, 03 Sep 2007 08:28:01 GMT
Client-Date: Mon, 03 Sep 2007 14:00:48 GMT
Client-Response-Num: 1


wget.111-svn-0709 --debug http://download.lavasoft.com/public/core.zip

DEBUG output created by Wget 1.10+devel on linux-gnu.

--16:04:16--  http://download.lavasoft.com/public/core.zip
Resolving wwwcache.uni-koeln.de... 134.95.19.61
Caching wwwcache.uni-koeln.de = 134.95.19.61
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x080889b8 (new refcount 1).

---request begin---
HEAD http://download.lavasoft.com/public/core.zip HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: download.lavasoft.com

---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Thu, 30 Aug 2007 09:31:35 GMT
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Last-Modified: Mon, 27 Aug 2007 13:08:16 GMT
ETag: 3014d-233e3c-cbcb000
Accept-Ranges: bytes
Content-Length: 2309692
Content-Type: application/zip
Age: 361961
X-Cache: HIT from wwwcache.uni-koeln.de
Proxy-Connection: close

---response end---
200 OK
Length: 2309692 (2.2M) [application/zip]
Closed fd 3
--16:04:16--  http://download.lavasoft.com/public/core.zip
Found wwwcache.uni-koeln.de in host_name_addresses_map (0x80889b8)
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x080889b8 (new refcount 1).

---request begin---
GET http://download.lavasoft.com/public/core.zip HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: download.lavasoft.com

---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Mon, 03 Sep 2007 14:04:16 GMT
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Last-Modified: Mon, 03 Sep 2007 08:28:01 GMT
ETag: 2370002-275cfc-f35fc640
Accept-Ranges: bytes
Content-Length: 2579708
Content-Type: application/zip
X-Cache: MISS from wwwcache.uni-koeln.de
Proxy-Connection: close

---response end---
200 OK
Length: 2579708 (2.5M) [application/zip]
Saving to: `core.zip'

100%[=] 2,579,708155K/s  
in 15s

Closed fd 3
16:04:31 (169 KB/s) - `core.zip' saved [2579708/2579708]


ls -l core.zip

-rw-r- 1 a0045 RRZK 2579708 03.09.2007 10:28 core.zip



Re: Myriad merges

2007-09-02 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 I've just merged a bunch of things into the current trunk, including
 Mauro's latest changes related to when HEAD is sent (concerning which he
 recently sent an email). Please feel free to beat on it, and report any
 bugs here!

Ah, finally something to test again  ;-)

The ChangeLogs look interesting, all the issues I had seem to be repaired.

I have only done a few first tests so far, because the basic test already had a
problem: with default options the local timestamps are not set at all.

I still made one series of tests regarding the HEAD/GET logic.

Options:  no spider, no -O, no content-disposition:

no timestamping, no local file    no HEAD    but: local timestamp not set to remote
no timestamping,    local file    no HEAD    but: local timestamp not set to remote
   timestamping, no local file    no HEAD    but: local timestamp not set to remote
   timestamping,    local file       HEAD         local timestamp set to remote

In these cases the HEAD is now used again only for the case where it is
necessary, but the timestamp ..
One could think that it is now taken only from the HEAD and not from GET.

I'll see what happens in the case where the two are different; this cannot
easily be constructed, I must wait till such a case just comes along  ;-)

Best Regards,
Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany




Re: HEAD request logic summary

2007-08-18 Thread Jochen Roderburg
Quoting Mauro Tortonesi [EMAIL PROTECTED]:


 here is a table resuming the behaviour of current wget version (soon to be
 1.11) and wget 1.10.2 regarding HTTP HEAD requests. i hope the table will be
 useful to determine whether the currently implemented logic is correct.


Sorry, this huge decision table looks far too complicated for me  ;-)

I think, there are only two cases where an extra HEAD request is necessary in
order to decide if the file should really be downloaded:

1) no -O, contentdisposition=no, timestamping=yes, local file existing
2) no -O, contentdisposition=yes, timestamping=yes
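
Expressed as invocations, these two cases would look roughly like this
(hypothetical URLs; in case 1 a local copy of the file already exists):

wget -N http://www.example.com/somefile.zip                                (case 1)
wget -N -e contentdisposition=yes 'http://www.example.com/get.cgi?id=42'   (case 2)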

Regards, J.Roderburg



Re: Some test results with current svn version

2007-08-13 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Jochen Roderburg wrote:
  I have tried out again the current wget version from svn to see the
 progress on
  various discussed problems.
 
  Someone recently reported an inability to specify the prefix for libssl.
 
  Hmm, yes, I reported this a year ago  ;-)
 
  That works now again as expected. When I specify
 --with-libssl-prefix=/usr/local
  I get the correct libs in the Makefile(s):

 This has just recently been fixed; it had to do with the fact that we
 were using sh if where we should have been using autoconf AS_IF.
 This has unfortunate interactions with autoconf's mechanisms for
 automated dependency resolution. Sorry I didn't reply to my previous
 message to say so, but I wasn't sure anyone had paid attention to it ;)

And my mail was meant as a confirmation that this fix works  ;-)
(I had seen your previous message about this and saw in the code that something
had been done regarding this issue.)


  I see, however, no difference yet regarding Content-Disposition, despite
 the
  explanations in ChangeLogs and recent mails that there is now an option for
 it
  which is off as default.

 Mauro has just finished some code related to this, so you can try it out
 when that has gone into the trunk. :)


OK, I'll have another look next weekend.

Best Regards,
Jochen Roderburg



Some test results with current svn version

2007-08-12 Thread Jochen Roderburg

I have tried out again the current wget version from svn to see the progress on
various discussed problems.

 Someone recently reported an inability to specify the prefix for libssl.

Hmm, yes, I reported this a year ago  ;-)

That works now again as expected. When I specify --with-libssl-prefix=/usr/local
I get the correct libs in the Makefile(s):

LIBS = -lintl -ldl -lrt  /usr/local/lib/libssl.so /usr/local/lib/libcrypto.so
-Wl,-rpath -Wl,/usr/local/lib


I understand that the HEAD/GET issues are still under discussion and testing; the
current state that I see now is:

no timestamping, no local file    no HEAD
no timestamping,    local file    no HEAD
   timestamping, no local file       HEAD
   timestamping,    local file       HEAD

(and the file-transfer as such works again).

I see, however, no difference yet regarding Content-Disposition, despite the
explanations in ChangeLogs and recent mails that there is now an option for it
which is off by default.

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany





Re: Manual and --help difference

2007-08-02 Thread Jochen Roderburg

Josh Williams schrieb:

On 8/2/07, dmitry over [EMAIL PROTECTED] wrote:

Hi,

In `man wget`  is see text
---[ cut ]---
 --http-user=user
   --http-password=password
[..]
but in `wget --help` is see

--http-user=USER  set http user to USER.
--http-passwd=PASSset http password to PASS.

check --http-passwd and --http-password and fix it please.


What version of wget are you using? I don't see this problem in 1.10.2
_or_ in the trunk.


These parameter names were changed in version 1.10. The --help outputs
from all 1.1x versions that I have show the new names, but all these
wget versions still also accept the old names.
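
So, for example, both of these should be accepted by the 1.1x versions, even
though only the second spelling appears in --help and in the manual
(hypothetical host and credentials):

wget --http-user=joe --http-passwd=secret   http://www.example.com/
wget --http-user=joe --http-password=secret http://www.example.com/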


From the NEWS file:

** Wget now supports the --ftp-user and --ftp-password command
switches to set username and password for FTP, and the --user and
--password command switches to set username and password for both FTP
and HTTP.  The --http-passwd and --proxy-passwd command switches have
been renamed to --http-password and --proxy-password respectively, and
the related http_passwd and proxy_passwd .wgetrc commands to
http_password and proxy_password respectively.  The login and passwd
.wgetrc commands have been deprecated.

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany


Re: wget.dotsrc.org going away

2007-07-09 Thread Jochen Roderburg
Quoting Micah Cowan [EMAIL PROTECTED]:


 Must be just in the README. Anywhere else that anyone knows of, speak up! :)


In my memory it used to be the other way around  ;-)

http://wget.sunsite.dk/ was the primary wget website for many years and nobody
cared about the gnu.org page. And many external references pointed to it.

See also Mauro's last words about it:
http://www.mail-archive.com/wget@sunsite.dk/msg09567.html

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: wget.dotsrc.org going away

2007-07-09 Thread Jochen Roderburg
 Quoting Micah Cowan [EMAIL PROTECTED]:

 
  Must be just in the README. Anywhere else that anyone knows of, speak up!
 :)
 

wget.sunsite.dk is also mentioned in the wget FAQ

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany







Fwd: Re: fix: don't send HEAD if -O is given

2007-07-08 Thread Jochen Roderburg

Separately resent to the list because of mistyped address  ;-)

----- Forwarded message from Jochen Roderburg [EMAIL PROTECTED] -----
Date: Sun, 08 Jul 2007 14:11:24 +0200
From: Jochen Roderburg [EMAIL PROTECTED]
Reply-To: Jochen Roderburg [EMAIL PROTECTED]
Subject: Re: fix: don't send HEAD if -O is given
To: Mauro Tortonesi [EMAIL PROTECTED]

Quoting Mauro Tortonesi [EMAIL PROTECTED]:

 i've just committed to the trunk the patch included in attachment, which
 fixes bug #20323:

 https://savannah.gnu.org/bugs/?20323

 reported by Jochen Roderburg:

 http://www.mail-archive.com/wget@sunsite.dk/msg09312.html

 here is the ChangeLog:

 2007-07-04  Mauro Tortonesi  [EMAIL PROTECTED]

* http.c: Skip HEAD request and start immediately with GET if -O is
given.

 --
 Mauro Tortonesi [EMAIL PROTECTED]



Hello Mauro  Micah,

Sorry to report that this patch does not fix the bug, but instead creates new,
worse bugs.

I was already a little puzzled by the title of the fix, because my original
error report had nothing specifically to do with the -O case. I very rarely
use this option, but I know from numerous discussions here on the list that it
does something different from what most people naively expect.


To sum my case up again about the usage of HEAD in the normal mode (no -O):

wget up to 1.10.2

no timestamping, no local file    no HEAD
no timestamping,    local file    no HEAD
   timestamping, no local file    no HEAD
   timestamping,    local file       HEAD

wget 1.11 (svn 04/2007)

no timestamping, no local file    no HEAD
no timestamping,    local file    no HEAD
   timestamping, no local file       HEAD
   timestamping,    local file       HEAD

So the little difference was that this version did an (IMHO) unnecessary HEAD
request in the case of timestamping with no local file present. Not a big problem
as such, but a side effect was that it created more cases for the other
timestamp bug that I reported (the timestamp for the local file is taken from the
HEAD request and not from the GET request).

Now after the new patch we have:

wget 1.11 (svn 07/2007)

no timestamping, no local file    HEAD, no file transferred
no timestamping,    local file    HEAD, no file transferred
   timestamping, no local file    HEAD, no file transferred
   timestamping,    local file    HEAD, no file transferred

Now it really always does the HEAD request, and the file transfer is totally
broken (someone else has already reported that).

In this state I did not make any further tests with the -O variants.

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany

----- End of forwarded message -----





Fwd: Re: fix: don't send HEAD if -O is given

2007-07-08 Thread Jochen Roderburg

Separately resent to the list because of mistyped address  ;-)

----- Forwarded message from Jochen Roderburg [EMAIL PROTECTED] -----
Date: Sun, 08 Jul 2007 16:22:59 +0200
From: Jochen Roderburg [EMAIL PROTECTED]
Reply-To: Jochen Roderburg [EMAIL PROTECTED]
Subject: Re: fix: don't send HEAD if -O is given
To: Jochen Roderburg [EMAIL PROTECTED]

Quoting Jochen Roderburg [EMAIL PROTECTED]:

 So the little difference was that this version did a (IMHO) unnecessary HEAD
 request in the case timestamping and no local file present. Not a big problem
 as such, but a side effect was that it created more cases for the other
 timestamp bug that I reported (timestamp for the local file is taken from the
 HEAD request and not from the GET request).

I think I remember now the motivation for this unnecessary HEAD request.
It *is* necessary as part of the support for getting the filename from the
Content-Disposition header. To know which local file to compare against, you have
to get the headers from the server.
So I think the erroneous patch can simply be retracted. I don't know if there
was any other additional issue with the -O option. In older versions there
was the case that, with -O, wget still looked at local files with the same name
as the remote file, but I think this does not happen anymore because it
no longer allows -O and timestamping together.


The remaining real bug is that the timestamp for the local file is taken from
the HEAD request and not from the GET request. Details for that are in the list
archive in http://www.mail-archive.com/wget@sunsite.dk/msg09303.html


Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany




----- End of forwarded message -----





Fwd: Re: fix: don't send HEAD if -O is given

2007-07-08 Thread Jochen Roderburg

Still trying to keep the list informed about the progress in this case ;-)

----- Forwarded message from Mauro Tortonesi [EMAIL PROTECTED] -----
Date: Sun, 8 Jul 2007 17:29:15 +0200
From: Mauro Tortonesi [EMAIL PROTECTED]
Reply-To: Mauro Tortonesi [EMAIL PROTECTED]
Subject: Re: fix: don't send HEAD if -O is given
To: Jochen Roderburg [EMAIL PROTECTED]

On Sun, 08 Jul 2007 16:22:59 +0200
Jochen Roderburg [EMAIL PROTECTED] wrote:

 Quoting Jochen Roderburg [EMAIL PROTECTED]:

  So the little difference was that this version did a (IMHO) unnecessary HEAD
  request in the case timestamping and no local file present. Not a big
problem
  as such, but a side effect was that it created more cases for the other
  timestamp bug that I reported (timestamp for the local file is taken from
the
  HEAD request and not from the GET request).

 I think I remember now the motivation for this unnecessary HEAD request.
 It *is* necessary as part of the support for getting the filename from the
 Content-Disposition header. To know with which local file to compare you
have
 to get the headers from the server.

yes, support for Content-Disposition HTTP header was the reason for which i had
to change wget's behaviour to send a HEAD HTTP request before the actual file
retrieval. so, one can't claim that the preliminary HEAD request is
unnecessary. however, it can (and should in fact be) skipped if -O or
--no-content-disposition are given.


 So I think the erroneous patch can simply be retracted. I don't know if there
 was any other additional issue with the -O option. In older versions there
 was the case, that with -O wget still looked at local files of the same name
 than the remote file, but I think this does not happen anymore because it
 already does no longer allow -O and timestamping together.

no, as i mentioned above, we should avoid sending HEAD if -O or
--no-content-disposition are given. therefore, we can't simply get rid of the
changes introduced by the buggy patch i submitted, but we need to fix and keep
them.

anyway, thank you very much for your bugreport. i just fixed the problem with my
buggy patch in my local repository. i will clean up the changes and commit them
tomorrow at most.


 The remaining real bug is that the timestamp for the local file is taken
from
 the HEAD request and not from the GET request. Details for that are in the
list
 archive in http://www.mail-archive.com/wget@sunsite.dk/msg09303.html

i am working on this issue as well.


-- 
Mauro Tortonesi [EMAIL PROTECTED]

----- End of forwarded message -----





Re: fix: don't send HEAD if -O is given

2007-07-08 Thread Jochen Roderburg
Quoting Mauro Tortonesi [EMAIL PROTECTED]:

 On Sun, 08 Jul 2007 16:22:59 +0200
 Jochen Roderburg [EMAIL PROTECTED] wrote:

  I think I remember now the motivation for this unnecessary HEAD request.
  It *is* necessary as part of the support for getting the filename from the
  Content-Disposition header. To know with which local file to compare you
 have
  to get the headers from the server.

 yes, support for Content-Disposition HTTP header was the reason for which i
 had to change wget's behaviour to send a HEAD HTTP request before the actual
 file retrieval. so, one can't claim that the preliminary HEAD request is
 unnecessary. however, it can (and should in fact be) skipped if -O or
 --no-content-disposition are given.

Yes, that is what I also wanted to express: I thought at first it was unnecessary,
but now I understand it is only so in certain more specialized cases. Thanks for
the additional info; slowly I begin to get the whole picture, and see why my old
report was answered by you with a patch which at first glance looked like it
solved a different problem  ;-)

  So I think the erroneous patch can simply be retracted. I don't know if
 there
  was any other additional issue with the -O option. In older versions
 there
  was the case, that with -O wget still looked at local files of the same
 name
  than the remote file, but I think this does not happen anymore because it
  already does no longer allow -O and timestamping together.

 no, as i mentioned above, we should avoid sending HEAD if -O or
 --no-content-disposition are given. therefore, we can't simply get rid of the
 changes introduced by the buggy patch i submitted, but we need to fix and
 keep them.

 anyway, thank you very much for your bugreport. i just fixed the problem with
 my buggy patch in my local repository. i will clean up the changes and commit
 them tomorrow at most.

I'll keep an eye on it.
Somehow I now think it wasn't so bad at all that the first attempt with the
patch did not succeed. If it had done what it should, I think I would now have
understood even less what it had to do with my case  ;-)

  The remaining real bug is that the timestamp for the local file is taken
 from
  the HEAD request and not from the GET request. Details for that are in the
 list
  archive in http://www.mail-archive.com/wget@sunsite.dk/msg09303.html

 i am working on this issue as well.

Fine, this one really concerns me more. It has always been a very important
feature for me that wget sets the timestamp on the downloaded files to the
timestamp of the server files, and it is really confusing when this doesn't
work as expected in some cases.

Best regards,
J.Roderburg





Re: New wget maintainer

2007-06-30 Thread Jochen Roderburg
Quoting Mauro Tortonesi [EMAIL PROTECTED]:


 IMVHO, the code in the trunk is ready to be released.

Sure ??  ;-)

It would be good if at least the bugs which newly appeared during the development
of version 1.11 were eliminated.

I, e.g., reported a few issues after the last public beta in 08/2006 which
so far are still there.

Ref.:

http://www.mail-archive.com/wget@sunsite.dk/msg09302.html
http://www.mail-archive.com/wget@sunsite.dk/msg09303.html
http://www.mail-archive.com/wget@sunsite.dk/msg09312.html

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10         Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany







Re: ftp through a proxy server

2007-06-11 Thread Jochen Roderburg
Quoting Phillip Griffith [EMAIL PROTECTED]:

 I think I was wrong about the downloaded file.  It probably contains
 the dialog between wget and my proxy server, not between wget and the
 remote FTP server.

Yes, looks the same to me.

 I think it's a SOCKS5 proxy.  For the Solaris FTP client, I supply
 [EMAIL PROTECTED] proxy-user at the user prompt.

I don't know much about Socks, but this does not look like Socks to me.
If it were, wget would need special support for it, which it does not have.

It should be no problem to specify such a double username to wget, either in
the URL or with the ftp-user parameter (command line or wgetrc). But then you
will also need two passwords; how are these entered in your ftp client?
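
For example, something along these lines might work (an untested sketch; the
user name, password and host names are made up, and the @ inside the user name
has to be written as %40 in a URL):

  wget "ftp://ftpuser%40real-ftp-host.example:secret@proxyhost.example/path/file"

or, with the pieces in .wgetrc instead of the URL:

  ftp_user = ftpuser@real-ftp-host.example
  ftp_password = secret

That only covers one password, though, so the question about the second one
remains.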

Perhaps you can mail a complete dialog with your ftp client, if possible in a
debug mode where the internal commands are shown.

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: ftp through a proxy server

2007-06-08 Thread Jochen Roderburg
Zitat von Phillip Griffith [EMAIL PROTECTED]:

 The problem, briefly, is that wget is sending HTTP GET commands to an
 FTP server when there's a proxy server in the middle.

 My .wgetrc file contains the following entries.  I've redacted all the
 values, except for the port number on the FTP proxy, which seems to be
 important:

 ftp_proxy = ftpgate.yoyodyne.com:21
 proxy_user =
 proxy_passwd =
 ftp_user =
 ftp_passwd =

 (There is some confusion on my end over the spelling of proxy_passwd
 and ftp_passwd, since my first instinct was to spell them
 proxy_password and ftp_password.)

 Here we go with invoking wget.  Again, I have redacted things like
 hostnames, user names, and passwords:

 wget --debug ftp://some-vendor.com/some-directory/some.pdf
 DEBUG output created by Wget 1.10.2 on solaris2.9.

 --12:20:20--  ftp://some-vendor.com/some-directory/some.pdf
= `some.pdf.1'
 Resolving ftpgate.yoyodyne.com... 192.168.2.4
 Caching ftpgate.yoyodyne.com = 192.168.2.4
 Connecting to ftpgate.yoyodyne.com|192.168.2.4|:21... connected.
 Created socket 4.
 Releasing 0x0005c7d0 (new refcount 1).

 ---request begin---
 GET ftp://some-vendor.com/some-directory/some.pdf HTTP/1.0
 User-Agent: Wget/1.10.2
 Accept: */*
 Proxy-Authorization: Basic Yadda yadda [redacted] =
 Host: some-vendor.com

 ---request end---
 Proxy request sent, awaiting response...
 ---response begin---
 ---response end---
 200 No headers, assuming HTTP/0.9
 Length: unspecified

 [ = ] 498   --.--K/s
  ^

 At this point the download stalls and I hit ^C.
 Here are the contents of the downloaded file, some.pdf.1:

 220 Secure Gateway FTP server ready.
 500 Syntax error, command unrecognized: 'GET
 ftp://some-vendor.com/some-directory/some.pdf HTTP/1.0'
 500 Syntax error, command unrecognized: 'User-Agent: Wget/1.10.2'
 500 Syntax error, command unrecognized: 'Accept: */*'
 500 Syntax error, command unrecognized: 'Proxy-Authorization: Basic
 Yadda yadda [redacted] ='
 500 Syntax error, command unrecognized: 'Host: some-vendor.com'
 500 Syntax error, command unrecognized: ''

 I can retrieve files just fine through this proxy server using the
 Solaris FTP client, which makes me think the problem is with wget.  Or
 else I'm making some rookie mistake.


Thanks for providing more details. I think we are getting nearer now ;-)
This proxy/gateway does not look at all like the http-proxy I described in an
earlier message and which wget and browsers use when you tell them about a
proxy for ftp.

How do you specify the real ftp host when you use the Solaris ftp client?
If it is something simple like logging in with [EMAIL PROTECTED] then it should be
possible with wget as well (without mentioning a proxy to wget). If it is some
special internal mechanism, however, then I fear it will not be possible with
wget (because wget only knows the http-proxy type).
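
If it really is the simple user@host scheme, something like this might already
do it (untested; the password is made up, the @ inside the user name has to be
escaped as %40 in the URL, and the ftp_proxy line would have to be removed from
.wgetrc for this):

  wget "ftp://youruser%40some-vendor.com:secret@ftpgate.yoyodyne.com:21/some-directory/some.pdf"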

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: ftp through a proxy server

2007-06-07 Thread Jochen Roderburg
Zitat von Phillip Griffith [EMAIL PROTECTED]:

 I have a problem talking to an FTP server through my proxy server at
 the office.  I'm getting through the proxy server OK, and I'm sure I'm
 talking to the FTP server on the other end, but wget insists on
 sending an HTTP GET instead of an FTP GET.

 The reason I know this is because the download hangs, and the
 downloaded file consists of the dialog between wget and the FTP
 server.  First the FTP server announces itself, followed by the HTTP
 GET request from wget, and then the error messages from the FTP
 server.

 I'm invoking wget with an ftp:// URL.  How do I persuade wget to use
 FTP commands instead of HTTP?  Or is the proxy server I need to
 persuade?

 This seems to be a problem only with a proxy server in the middle.  A
 direct connection from home (through a NAT firewall, that is) is no
 problem.


Well, what you describe is just how ftp through an (http-)proxy works. The
client (wget or e.g. a browser) talks HTTP with the proxy, and the proxy does
the real FTP protocol with the ftp server. From the wget end you only see the
dialog with the proxy server and not the dialog with the ftp server in the
background.
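
In other words, the proxy given as ftp_proxy has to be an HTTP proxy; the
setting usually looks something like this (host and port are of course
placeholders):

  ftp_proxy = http://proxy.example.com:3128/

in .wgetrc, or the equivalent ftp_proxy environment variable.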

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany





Re: How to preserve file (rights+date) with wget

2006-12-08 Thread Jochen Roderburg

Jean-Philippe BATTU schrieb:


I use wget to download files from an HTTP server.
I notice the downloaded file is created with the current date once the transfer
has been done.
I didn't find any option to keep the date of the original file.
Is there any way to preserve the original date of the file, like tar(1) does by
default and scp(1) -p does?


timestamping  is the magic phrase to look for ;-)
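
In its simplest form (the URL is just a placeholder):

  wget -N http://some.host.example/some/file

or put  timestamping = on  into your .wgetrc.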

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany


Re: wget: ignores Content-Disposition header

2006-09-15 Thread Jochen Roderburg

Noèl Köthe schrieb:

Hello,

I can reproduce the following with 1.10.2 and 1.11.beta1:

Wget ignores Content-Disposition header described in RFC 2616,
19.5.1 Content-Disposition.

an example URL is:

http://bugs.debian.org/cgi-bin/bugreport.cgi/%252Ftmp%252Fupdate-grub.patch?bug=168715;msg=5;att=1




Sorry, I don't see any Content-Disposition header in this example URL  ;-)

Result of a HEAD request:

200 OK
Connection: close
Date: Fri, 15 Sep 2006 12:58:14 GMT
Server: Apache/1.3.33 (Debian GNU/Linux)
Content-Type: text/html; charset=utf-8
Last-Modified: Mon, 04 Aug 2003 21:18:10 GMT
Client-Date: Fri, 15 Sep 2006 12:58:14 GMT
Client-Response-Num: 1


My own experience is that the 1.11 alpha/beta versions (where this 
feature was introduced) worked fine with the examples I encountered.


Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: REST - error for files bigger than 4GB

2006-09-06 Thread Jochen Roderburg

Petr Kras schrieb:

When a transfer is broken and has to be resumed,
it doesn't work for files greater than 4GB (not checked for 2GB)
when the break is beyond the 4GB (2GB) limit.



--13:58:54--  ftp://streamlib.pan.eu/Streams/TVDC_SS_01100.ts
   = `/opt/streams/Stream1/TVDC_SS_01100.ts'
== CWD not required.
== PORT ... done.== REST 4998699942 ... 
REST failed, starting from scratch.


== RETR TVDC_SS_01100.ts ... done.
Length: 5,632,104,188 (5.2G), 633,404,246 (604M) remaining



Looks more like your FTP server does not support the REST command which 
is needed for partial transfers.

Maybe a run with wget debug option (-d) shows more.
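
E.g. something like this, with the debug output going to a file for mailing:

  wget -d -o wget-debug.log -c ftp://streamlib.pan.eu/Streams/TVDC_SS_01100.ts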

I tried a visit to the mentioned site, but got only unknown host.
Is streamlib.pan.eu the real hostname?

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Fwd: Re: REST - error for files bigger than 4GB

2006-09-06 Thread Jochen Roderburg

Seems this wasn't sent to the list  ;-)

- Forwarded message from Petr Kras [EMAIL PROTECTED] -
    Date: Wed, 6 Sep 2006 14:57:32 +0200
    From: Petr Kras [EMAIL PROTECTED]
Reply-To: Petr Kras [EMAIL PROTECTED]
 Subject: Re: REST - error for files bigger than 4GB
      To: Jochen Roderburg [EMAIL PROTECTED]

Thanks for the answer, it helped me to find out the reason.

In Wget it looks like this server supports the REST command, see the samples
below.
For files under 4G it works fine, over 4G it doesn't.
The server itself doesn't accept values over 4GB (2^32).

--
ftp REST
?Invalid command.
ftp quote REST 4294967295
350 Restarting at 4294967295.
ftp quote REST 4294967296
501 Reply marker is invalid.
ftp
--
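
That boundary is exactly the largest value that fits into 32 bits, which can be
checked quickly, e.g. in bash:

  echo $(( 2**32 - 1 ))    # 4294967295, still accepted
  echo $(( 2**32 ))        # 4294967296, rejected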

I'm sorry but this server is located in our intranet.

Regards,
Petr Kras



Here is a listing of Wget's behaviour below and beyond the 4GB limit. I think it
is correct.

CORRECT - just under 4G

== PORT ... done.== REST 4099397567 ... done.
== RETR TVDC_SS_01103.ts ... done.
Length: 5,632,104,188 (5.2G), 1,532,706,621 (1.4G) remaining

 [ skipping 400K ]
400K ,, ,, .. .. .. 72% 427.40
KB/s
401K .. .. .. .. .. 73% 411.05
KB/s


another file
CORRECT - restoration under 4G

== PORT ... done.== REST 2818997399 ... done.
== RETR TVDC_SS_01100.ts ... done.
Length: 5,632,104,188 (5.2G), 2,813,106,789 (2.6G) remaining

 [ skipping 275K ]
275K ,, .. .. .. .. 50% 407.44
KB/s
276K .. .. .. .. .. 50% 318.09
KB/s

INCORRECT - restoration behind 4G

== PORT ... done.== REST 4998699942 ...
REST failed, starting from scratch.

== RETR TVDC_SS_01100.ts ... done.
Length: 5,632,104,188 (5.2G), 633,404,246 (604M) remaining

0K .. .. .. .. ..  0%  378.26
KB/s
1K .. .. .. .. ..  0%  312.72
KB/s



Jochen Roderburg [EMAIL PROTECTED] wrote on 06.09.2006 09:21:35:

 Petr Kras schrieb:
  When a transfer is broken and has to be resumed,
  it doesn't work for files greater than 4GB (not checked for 2GB)
  when the break is beyond the 4GB (2GB) limit.

  --13:58:54--  ftp://streamlib.pan.eu/Streams/TVDC_SS_01100.ts
 = `/opt/streams/Stream1/TVDC_SS_01100.ts'
  == CWD not required.
  == PORT ... done.== REST 4998699942 ...
  REST failed, starting from scratch.
 
  == RETR TVDC_SS_01100.ts ... done.
  Length: 5,632,104,188 (5.2G), 633,404,246 (604M) remaining
 

 Looks more like your FTP server does not support the REST command which
 is needed for partial transfers.
 Maybe a run with wget debug option (-d) shows more.

 I tried a visit to the mentioned site, but got only unknown host.
 Is streamlib.pan.eu the real hostname?

 Best regards,

 Jochen Roderburg
 ZAIK/RRZK
 University of Cologne
 Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
 D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
 Germany





- End of forwarded message -





Re: wget 1.11 beta1 another time-stamping problem

2006-08-30 Thread Jochen Roderburg
Zitat von Jochen Roderburg [EMAIL PROTECTED]:

  In the time-stamping mode wget always issued a HEAD request first when there
  was a local file, and later a GET request when, after inspecting the HEAD
  output, it found out that it should do so.

 The wget 1.11 now *always* does the HEAD request, so this problem may be a
 little related to the other just-repaired problem.

Now I even stumbled over a case where this behaviour leads to an error, namely
when the server doesn't like the HEAD request and responds with an error.
Hence my additional question: is this HEAD request intended, or is it an error?
Does it perhaps have to do with the new Content-Disposition support?

I encountered the new problem when downloading a new Eudora beta. It is
delivered via a cgi which redirects to the real file link.
A HEAD request for the original link is answered with 500 Server Error.
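
(If -O really skips the HEAD again, as planned, a work-around for such servers
would presumably be something like the following; untested, the output name is
arbitrary:

  wget -O Eudora_7.1.0.6_beta.exe "http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7106"

or --no-content-disposition once that also skips the HEAD.)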


wget.111b1 -d http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7106

DEBUG output created by Wget 1.11-beta-1 on linux-gnu.

--10:53:19--  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7106
Resolving www.eudora.com... 199.106.114.30
Caching www.eudora.com = 199.106.114.30
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x08086920 (new refcount 1).

---request begin---
HEAD /cgi-bin/export.cgi?productid=EUDORA_win_7106 HTTP/1.0
User-Agent: Wget/1.11-beta-1
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Server Error
Server: Netscape-Enterprise/6.0
Date: Wed, 30 Aug 2006 08:53:20 GMT
Content-length: 305
Content-type: text/html
Connection: keep-alive

---response end---
500 Server Error
Registered socket 3 for persistent reuse.
10:53:21 ERROR 500: Server Error.



wget.1102 -d http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7106

DEBUG output created by Wget 1.10.2 on linux-gnu.

--10:51:22--  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7106
   = `export.cgi?productid=EUDORA_win_7106'
Resolving www.eudora.com... 199.106.114.30
Caching www.eudora.com = 199.106.114.30
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x08084e60 (new refcount 1).

---request begin---
GET /cgi-bin/export.cgi?productid=EUDORA_win_7106 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Moved Temporarily
Server: Netscape-Enterprise/6.0
Date: Wed, 30 Aug 2006 08:51:21 GMT
Location:
http://www.eudora.com/download/eudora/windows/7.1/beta/Eudora_7.1.0.6_beta.exe
Content-length: 0
Connection: keep-alive

---response end---
302 Moved Temporarily
Registered socket 3 for persistent reuse.
Location:
http://www.eudora.com/download/eudora/windows/7.1/beta/Eudora_7.1.0.6_beta.exe
[
following]
Skipping 0 bytes of body: [] done.
--10:51:22-- 
http://www.eudora.com/download/eudora/windows/7.1/beta/Eudora_7.1.0.6_beta.e
xe
   = `Eudora_7.1.0.6_beta.exe'
Reusing existing connection to www.eudora.com:80.
Reusing fd 3.

---request begin---
GET /download/eudora/windows/7.1/beta/Eudora_7.1.0.6_beta.exe HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Wed, 30 Aug 2006 08:51:21 GMT
Content-type: application/octet-stream
Last-modified: Mon, 28 Aug 2006 21:29:37 GMT
Content-length: 17403352
Accept-ranges: bytes
Connection: keep-alive

---response end---
200 OK
Length: 17,403,352 (17M) [application/octet-stream]

100%[==] 17,403,352   322.84K/s   
ETA 00:00

10:52:29 (256.51 KB/s) - `Eudora_7.1.0.6_beta.exe' saved [17403352/17403352]


Regards, J.Roderburg



wget 1.11 beta1 SSL configuration problem

2006-08-27 Thread Jochen Roderburg

There seems to be a configure problem with the options that specify the directories
where the SSL installation resides.

I have the SSL that I want in /usr/local and in wget 1.10.2 the configure option
--with-libssl-prefix=/usr/local worked.

Part of configure output:

checking for libssl... yes
checking how to link with libssl... /usr/local/lib/libssl.so
/usr/local/lib/libcrypto.so -Wl,-rpath
-Wl,/usr/local/lib
configure: compiling in support for SSL

and in the Makefiles I have:

LIBS = -lintl -ldl -lrt  /usr/local/lib/libssl.so /usr/local/lib/libcrypto.so
-Wl,-rpath -Wl,/usr/local/lib

With wget-1.11-beta-1 however I get the configure output:

checking how to link with libssl... -lssl -lcrypto
configure: compiling in support for SSL via OpenSSL

and

LIBS = -lintl -ldl -lrt  -lssl -lcrypto

Somehow the specified directory /usr/local seems to be ignored  ;-)
I *have* an SSL under /usr from the base system installation, but the current and
correctly configured version that I actually use is in /usr/local.
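
A possible workaround until the prefix option is honoured again might be to pass
the paths explicitly via the standard configure variables (untested here, adjust
the paths as needed):

  LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib" \
  CPPFLAGS="-I/usr/local/include" \
  ./configure --with-libssl-prefix=/usr/local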

J.Roderburg



wget 1.11 beta1 another time-stamping problem

2006-08-27 Thread Jochen Roderburg

Unfortunately the time-stamping saga continues  ;-)

In the time-stamping mode wget always issued a HEAD request first when there was
a local file, and later a GET request when, after inspecting the HEAD output, it
found out that it should do so.

Wget 1.11 now *always* does the HEAD request, so this problem may be a little
related to the other just-repaired problem.
The error, however, is that it now uses the time data from the HEAD output as the
timestamp for the local file and not the time data from the GET request.

This could theoretically even be a problem with a direct site transfer, when the
remote file changes between the HEAD and the GET, but the practical case where it
occurred is in connection with a proxy cache. When the proxy cache has a cached
file copy which is older than the file on the original site, the HEAD delivers
the data from the cached file; but when, upon the GET, the proxy itself decides
to retrieve the newer version (or is forced to do that with the wget --no-cache
option), we get the discrepancy: we get the *newer* file downloaded, but with the
*older* time-stamp.

And a real-life example to illustrate the issue:


HEAD -p http://wwwcache.uni-koeln.de:8080
http://www.extractnow.com/extractnow.exe

200 OK
Date: Wed, 23 Aug 2006 12:15:42 GMT
Accept-Ranges: bytes
Age: 165431
ETag: 98caa15d43c4c61:4da
Server: Microsoft-IIS/6.0
Content-Length: 981504
Content-Type: application/octet-stream
Last-Modified: Sun, 20 Aug 2006 10:28:23 GMT
Client-Date: Sun, 27 Aug 2006 10:03:17 GMT
Client-Response-Num: 1
Proxy-Connection: close
X-Cache: HIT from wwwcache.uni-koeln.de
X-Powered-By: ASP.NET

HEAD http://www.extractnow.com/extractnow.exe

200 OK
Date: Sun, 27 Aug 2006 10:05:10 GMT
Accept-Ranges: bytes
ETag: 4e9432fc57c9c61:4da
Server: Microsoft-IIS/6.0
Content-Length: 983005
Content-Type: application/octet-stream
Last-Modified: Sat, 26 Aug 2006 21:38:35 GMT
Client-Date: Sun, 27 Aug 2006 10:05:09 GMT
Client-Response-Num: 1
X-Powered-By: ASP.NET


The two HEAD requests (HEAD utility from the lwp package) show that the
cache has a file version from 20 Aug 2006 and the site has a file version from
26 Aug 2006.


wget.111b1 -d http://www.extractnow.com/extractnow.exe

DEBUG output created by Wget 1.11-beta-1 on linux-gnu.

--12:06:18--  http://www.extractnow.com/extractnow.exe
Resolving wwwcache.uni-koeln.de... 134.95.19.61
Caching wwwcache.uni-koeln.de = 134.95.19.61
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x08086950 (new refcount 1).

---request begin---
HEAD http://www.extractnow.com/extractnow.exe HTTP/1.0
User-Agent: Wget/1.11-beta-1
Accept: */*
Host: www.extractnow.com

---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Content-Length: 981504
Content-Type: application/octet-stream
Last-Modified: Sun, 20 Aug 2006 10:28:23 GMT
Accept-Ranges: bytes
ETag: 98caa15d43c4c61:4da
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Wed, 23 Aug 2006 12:15:42 GMT
Age: 165612
X-Cache: HIT from wwwcache.uni-koeln.de
Proxy-Connection: close

---response end---
200 OK
Length: 981504 (958K) [application/octet-stream]
Closed fd 3
--12:06:18--  http://www.extractnow.com/extractnow.exe
Found wwwcache.uni-koeln.de in host_name_addresses_map (0x8086950)
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x08086950 (new refcount 1).

---request begin---
GET http://www.extractnow.com/extractnow.exe HTTP/1.0
User-Agent: Wget/1.11-beta-1
Accept: */*
Host: www.extractnow.com

---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Content-Length: 983005
Content-Type: application/octet-stream
Last-Modified: Sat, 26 Aug 2006 21:38:35 GMT
Accept-Ranges: bytes
ETag: 4e9432fc57c9c61:4da
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sun, 27 Aug 2006 10:06:20 GMT
X-Cache: MISS from wwwcache.uni-koeln.de
Proxy-Connection: close

---response end---
200 OK
Length: 983005 (960K) [application/octet-stream]
Saving to: `extractnow.exe'

100%[] 983,005
265K/s   in 3.6
s

Closed fd 3
12:06:22 (265 KB/s) - `extractnow.exe' saved [983005/983005]


And the result on the local disk:

...983005 20.08.2006 12:28 extractnow.exe

The filesizes show that the newer version was downloaded but it got the
time-stamp of the older one.


Btw, a quick work-around is to download it a second time: the cache now has the
newer file with the newer file date, wget requests it again because it now sees
the local file as older, and the file is retrieved directly from the cache and
gets the correct time-stamp  ;-)
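
I.e. roughly this, assuming timestamping is switched on via -N or .wgetrc:

  wget -N --no-cache http://www.extractnow.com/extractnow.exe
  wget -N http://www.extractnow.com/extractnow.exe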


Best regards,
Jochen Roderburg




Re: Large file sizes

2006-08-25 Thread Jochen Roderburg
Johan Kohler kohlerj at ukzn.ac.za writes:

 
 I'm trying to d/l a dvd image via ftp. It was going quite well yesterday
 The length is reported as negative, presumably because of the large size
 3.4 Gb.  Is it the negative size that caused the resume to fail?
 
 me at mymachine:~$ wget -c  

ftp://ftp.ukc.mirrorservice.org/sites/cdimage.ubuntu.com/cdimage/kubuntu/
releases/6.06/release/kubuntu-6.06-dvd-i386.iso
 --02:22:04--   

ftp://ftp.ukc.mirrorservice.org/sites/cdimage.ubuntu.com/cdimage/kubuntu/
releases/6.06/release/kubuntu-6.06-dvd-i386.iso
 = `kubuntu-6.06-dvd-i386.iso'
 Resolving ftp.ukc.mirrorservice.org... 212.219.56.134, 212.219.56.132,  
 212.219.56.133
 Connecting to ftp.ukc.mirrorservice.org[212.219.56.134]:21... connected.
 Logging in as anonymous ... Logged in!
 == SYST ... done.== PWD ... done.
 == TYPE I ... done.  == CWD  
 /sites/cdimage.ubuntu.com/cdimage/kubuntu/releases/6.06/release ... done.
 == SIZE kubuntu-6.06-dvd-i386.iso ... done.
 == PASV ... done.== REST -637825024 ...
 REST failed, starting from scratch.
 == RETR kubuntu-6.06-dvd-i386.iso ... done.
 Length: -637,825,024 (unauthoritative)
 ..
 

Works fine here with wget 1.10.2 on a Linux system.

Perhaps you have a wget version without large file support compiled in?
On what system are you doing this and where does the wget binary come from?
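
If it is a self-compiled or vendor-supplied binary without large file support,
rebuilding the 1.10.2 sources with a plain

  ./configure && make

should already help; as far as I know, the 1.10 configure enables large file
support automatically where the operating system supports it.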

Regards, J.Roderburg




Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]

2006-08-20 Thread Jochen Roderburg
Zitat von Jochen Roderburg [EMAIL PROTECTED]:

 Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

  Mauro, you will need to look at this one.  Part of the problem is that
  Wget decides to save to index.html.1 although -c is in use.  That is
  solved with the patch attached below.  But the other part is that
  hstat.local_file is a NULL pointer when
  stat(hstat.local_file, &st) is used to determine whether the file
  already exists in the -c case.  That seems to be a result of your
  changes to the code -- previously, hstat.local_file would get
  initialized in http_loop.

 This looks as if if could also be the cause for the problems which I reported
 some weeks ago for the timestamping mode
 (http://www.mail-archive.com/wget@sunsite.dk/msg09083.html)


Hello Mauro,

The timestamping issues I reported in above mentioned message are now also
repaired by the patch you mailed last week here.
Only the small *cosmetic* issue remains that it *always* says:
   Remote file is newer, retrieving.
even if there is no local file yet.

J.Roderburg



Re: wget 1.11 alpha1 [Fwd: Bug#378691: wget --continue doesn't workwith HTTP]

2006-08-08 Thread Jochen Roderburg
Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

 Mauro, you will need to look at this one.  Part of the problem is that
 Wget decides to save to index.html.1 although -c is in use.  That is
 solved with the patch attached below.  But the other part is that
 hstat.local_file is a NULL pointer when
  stat(hstat.local_file, &st) is used to determine whether the file
  already exists in the -c case.  That seems to be a result of your
  changes to the code -- previously, hstat.local_file would get
  initialized in http_loop.

This looks as if if could also be the cause for the problems which I reported
some weeks ago for the timestamping mode
(http://www.mail-archive.com/wget@sunsite.dk/msg09083.html)

J.Roderburg



Re: wget 1.11 alpha1 - content disposition filename

2006-07-17 Thread Jochen Roderburg
Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

 Jochen Roderburg [EMAIL PROTECTED] writes:

  E.g., a file which was supposed to have the name B&W.txt came with the header:
  Content-Disposition: attachment; filename=B&amp;W.txt;
  All programs I tried (the new wget and several browsers and my own script ;-)
  seemed to stop parsing at the first semicolon and produced the filename B&amp.

 Unfortunately, if it doesn't work in web browsers, how can it be
 expected to work in Wget?  The server-side software should be fixed.


I mainly wanted to hear from some HTTP/HTML experts that I was correct in my
assumption that the problem here is on the server side  ;-)
Thank you, Mauro and Hrvoje, for confirming that.
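
For reference, a correctly quoted header would presumably look like

  Content-Disposition: attachment; filename="B&W.txt"

and then there would be no stray semicolon for the clients to trip over.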

Regards, J.Roderburg




wget 1.11 alpha1 - bug with timestamping option

2006-06-17 Thread Jochen Roderburg
 (new refcount 1).

---request begin---
HEAD / HTTP/1.0
User-Agent: Wget/1.11-alpha-1
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Sat, 17 Jun 2006 18:49:01 GMT
Server: Apache/2.0.52
Last-Modified: Wed, 14 Jun 2006 06:47:06 GMT
Accept-Ranges: bytes
Content-Type: text/html; charset=iso-8859-1
Connection: close

---response end---
200 OK
hs-local_file is: index.html (existing)
TEXTHTML is on.
Length: unspecified [text/html]
Closed fd 3
Remote file is newer, retrieving.

--20:49:01--  http://www.uni-koeln.de/
Found www.uni-koeln.de in host_name_addresses_map (0x8086440)
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x08086440 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.11-alpha-1
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Sat, 17 Jun 2006 18:49:01 GMT
Server: Apache/2.0.52
Last-Modified: Wed, 14 Jun 2006 06:47:06 GMT
Accept-Ranges: bytes
Content-Type: text/html; charset=iso-8859-1
Connection: close

---response end---
200 OK
hs-local_file is: index.html.1 (not existing)
TEXTHTML is on.
Length: unspecified [text/html]
Saving to: `index.html.1'

[ =
] 20,703  --.-K/s   in 0.1s

Closed fd 3
20:49:02 (165 KB/s) - `index.html.1' saved [20703]


It starts similarly to the first case:
  HTTP HEAD
and says:
  local_file is: index.html (existing)
which is correct now.
Then again it says:
  Remote file is newer, retrieving.
which is wrong now, as local and remote are the same.
Then it goes ahead and downloads the file and saves it to
index.html.1, as if the timestamping option were not set at all, which is
wrong again.


Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany







Re: Problems downloading from specific site

2005-08-09 Thread Jochen Roderburg
Zitat von Reginaldo O. Andrade [EMAIL PROTECTED]:

I would like to offer you a friendly challenge. Can you download
 something from the site www.babene.ru using wget? I always receive the
 message ERROR 403: Forbidden, but using Firefox or IE, I download the
 pictures without any problem. I already tried some user-agent strings,
 but without success.

Not an uncommon problem ;-)
They check the referer, which a browser usually sends and which points to the
page you are coming from.  You can do the following with wget:

wget --referer=http://www.babene.ru/  http://www.babene.ru/.

Best regards,
J.Roderburg



Re: ftp bug in 1.10

2005-06-15 Thread Jochen Roderburg

Herold Heiko schrieb:

I have a reproducible report (thanks Igor Andreev) about a little verbose
log problem with ftp with my windows binary; is this reproducible on other
platforms, too?

wget -v ftp://garbo.uwasa.fi/pc/batchutil/buf01.zip
ftp://garbo.uwasa.fi/pc/batchutil/rbatch15.zip  

(seems to happen with any ftp download I tried though)

Last line of output is:

Downloaded:  bytes in 2 files

Note missing number of bytes.
Heiko



I see the same under Windows, but with a Linux version the output is
correct. And it happens not only with ftp, but also with multiple http requests.


Regards, J.Roderburg





Re: wget 1.10 release candidate 1

2005-06-04 Thread Jochen Roderburg
Zitat von Oliver Schulze L. [EMAIL PROTECTED]:

 Hi Mauro,
 do you know if the regex patch from Tobias was applied to this release?

 Thanks
 Oliver


The last words on this topic that I remember were here:

http://www.mail-archive.com/wget@sunsite.dk/msg07436.html

Regards,
J.Roderburg



Re: wget 1.10 release candidate 1

2005-06-04 Thread Jochen Roderburg
Zitat von Oliver Schulze L. [EMAIL PROTECTED]:

 Neither rc1 nor alpha2 has the pcre patch included.
 I think that pcre is a very useful patch, and it should be
 added to CVS and not enabled by default in the ./configure script.
 So, if you want to use pcre, just ./configure --with-pcre
 and everybody is happy.

Hmmm, you mean everybody who has pcre is happy?
Did you not read the message that I pointed you to ;-) ??
It said that the developers do not want to include a regex patch in wget until
they find a solution that is portable enough for all systems that wget is
supposed to run on.
And no, I'm not involved in this, I just wanted to point out that this has been
discussed a few times on the list already ;-)

J.Roderburg



Re: wget bug: spaces in directories mapped to %20

2005-01-17 Thread Jochen Roderburg
Zitat von Tony O'Hagan [EMAIL PROTECTED]:

 Original path:  abc def/xyz pqr.gif
 After wget mirroring:   abc%20def/xyz pqr.gif   (broken link)

 wget --version  is GNU Wget 1.8.2


This was a well-known error in the 1.8 versions of wget, which is already
corrected in the 1.9 versions.

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany



Re: Wget: cannot fetch files from password protected ftp sites

2004-10-11 Thread Jochen Roderburg
Zitat von Graham Leggett [EMAIL PROTECTED]:

 In v1.9.1 of wget, it is not possible to retrieve files from an ftp
 server that requires a username and password.

Hmm, this has always worked fine here (anonymous and non-anonymous) with

ftp://user:[EMAIL PROTECTED]/path-to-file

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: trouble with encoded filename

2004-04-07 Thread Jochen Roderburg
none none wrote:
$ wget -S --referer=http://5.6.7.8/index.htm \
  --user-agent=Mozilla \
  http://1.2.3.4/?.file
--00:00:00--  http://1.2.3.4/%E9.file
   = `?.file'
Connecting to 1.2.3.4:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 403 Forbidden
 2 Date: Wed, 07 Apr 2004 00:00:00 GMT
 3 Server: Apache/2.0.48 (Win32)
 4 Content-Length: 323
 5 Keep-Alive: timeout=15, max=100
 6 Connection: Keep-Alive
 7 Content-Type: text/html; charset=iso-8859-1
00:00:00 ERROR 403: Forbidden.
Hi,

Are you really sure that the file you want has the name ?.file with a 
question mark character??
I see from the headers that the server involved runs under Windows and 
the question mark is most certainly a character which is not possible in 
a filename under DOS/Windows.

Regards, J.Roderburg


Re: wget-ftp-Problem

2003-12-24 Thread Jochen Roderburg
Zitat von Daniel Daboul [EMAIL PROTECTED]:

 On Tue, Dec 09, 2003 at 11:30:54PM +0100, Hrvoje Niksic wrote:
  You're not making a mistake, recursive download over FTP proxies is
  currently broken.
 
 That is probably the single feature I'd want most. Is there an older
 version of wget, where it works (didn't find it in the ChangeLog)?
 

New bugs are rarely documented in ChangeLogs, unless they are implemented
deliberately ;-)

This one appeared in v1.8, older versions work as expected.

Best regards, J.Roderburg




Re: wget 1.9 - behaviour change in recursive downloads

2003-10-07 Thread Jochen Roderburg
Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

 Jochen Roderburg [EMAIL PROTECTED] writes:
 
  Zitat von Hrvoje Niksic [EMAIL PROTECTED]:
 
  It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
  downloads the HTML files only because it absolutely has to, in order
  to recurse through them.  After it finds the links in them, it deletes
  them.
 
  Hmm, so it has really been an undetected error over all the years
  ;-) ?
 
 s/undetected/unfixed/
 
 At least I've always considered it an error.  I didn't know people
 depended on it.

Well, *depend* is a rather strong expression for that ;-)
It always worked that way, I got used to it, and I never really thought about
whether it was correct or not, because I had a use for it. So I was astonished
when these files suddenly disappeared.

As I wrote already, I will mention them explicitly now. I think the worst that
will happen is that I get a few more of them than before.

Perhaps the whole thing could be mentioned in the documentation of the
accept/reject options. Currently there is only this sentence there:

 Note that these two options do not affect the downloading of HTML
 files; Wget must load all the HTMLs to know where to go at
 all--recursive retrieval would make no sense otherwise.

J. Roderburg





wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Jochen Roderburg

Hi,

I've found a situation where the new version 1.9beta behaves differently from
earlier versions. I'm not sure if this is a corrected error or a new bug; I
personally would prefer the old behaviour.

When I do a recursive download with an accept list like

  wget -r -l1 -nd -A zip http://some.host.com/index.htm

it downloads the index.htm file and all the zip files mentioned therein.
With older versions the start file index.htm itself stays there in the end.

Version 1.9 downloads the index.htm and deletes it immediately with the message 
   

  Removing index.htm since it should be rejected.

The recursion is then done correctly.

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany




Re: wget 1.9 - behaviour change in recursive downloads

2003-10-03 Thread Jochen Roderburg
Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

 It's a feature.  `-A zip' means `-A zip', not `-A zip,html'.  Wget
 downloads the HTML files only because it absolutely has to, in order
 to recurse through them.  After it finds the links in them, it deletes
 them.

Hmm, so it has really been an undetected error over all the years ;-) ?

Ok, I see; I will add html explicitly in my scripts then. I like to keep those
files because they show me the date when the last change occurred in a
directory.
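
I.e. something like:

  wget -r -l1 -nd -A zip,htm,html http://some.host.com/index.htm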

Regards, J.Roderburg






Re: More flexible URL file name generation

2003-09-19 Thread Jochen Roderburg
Hrvoje Niksic wrote:

This patch makes URL file name generation a bit more flexible and,
hopefully, better for the end-user.  
Hi Hrvoje,

I've tried out that patch under Linux. To be precise, I used the
complete source code from Heiko Herold's website, which had this patch
already incorporated.

I have used it for a few days now for all my normal downloads, and so far all
local filenames came out correctly again.
Thanks for making these changes; this makes the current version usable
for me.

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany




Re: Apology for absence

2002-07-26 Thread Jochen Roderburg

Zitat von Hrvoje Niksic [EMAIL PROTECTED]:

 Only the bare minimum of characters should be encoded.  The ones that
 come to mind are '/' (illegal), '~' (rm -r ~foo dangerous), '*' and
 '?' (used in wildcards), control characters 0-31 (controls), and chars
 128-159 (non-printable).
  

I hope you didn't really mean this.
You don't intend to encode the international characters, do you?

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10 Tel.:   +49-221/478-7024
D-50931 Koeln   E-Mail: [EMAIL PROTECTED]
Germany







Re: Wget 1.8.2 Windows Binaries

2002-05-30 Thread Jochen Roderburg

Hello Heiko,

Could you perhaps also offer a current wget version without SSL support on your
website?
For those that don't want SSL and/or don't want to fight with additional
(sometimes incompatible) DLLs ;-)

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


-
This mail sent through IMP: http://horde.org/imp/




Re: Continue getting a partially-downloaded file in Windows 98

2002-04-11 Thread Jochen Roderburg

Hrvoje Niksic wrote:
 Uncle [EMAIL PROTECTED] writes:
 I have a problem with the subject. Wget tells me Continued download failed
on this file, which conflicts with `-c'. Refusing to truncate existing
file. How do I solve this problem?
System environment: Windows98 SE, VC++ 6.0, Wingate 4.2.0 proxy.
 
 That message indicates that the HTTP server does not support continued
 downloading, which means that a new download would start from scratch.
 However, you already had a file and you specified `-c', meaning you
 want existing data intact.  That is the conflict.

That might have been the intended reason for this error message, but I
have also seen it several times with HTTP servers that *did* support
partial downloads. In my case it was on Unix platforms with wget
versions >= 1.7. I usually then went back to version 1.6, which I have
kept for this and other special cases, and this always did the continued
download without problems.

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10   Tel.:   +49-221/478-7024
D-50931 Koeln E-Mail: [EMAIL PROTECTED]
Germany




Re: Continue getting a partially-downloaded file in Windows 98

2002-04-11 Thread Jochen Roderburg

Hrvoje Niksic wrote:
 
 Can you show me a sample URL where continued download works with 1.6,
 but not with 1.8.1?

I tried to find one, and found something else instead:
The problem seems to occur only when a proxy is involved.
Therefore you can't test my example cases yourself, because
you can't use our local proxy.
I have kept debug outputs, but they don't show any interesting 
difference between the two versions before the error message.

Regards, J.Roderburg




Re: Wget 1.8.1 continuous transfer doesn't work on Compaq Tru64 UNIX 5.1A

2002-03-08 Thread Jochen Roderburg

On Mar 8,  1:11pm, Rodriguez, Julian wrote:
 And these are the error messages:

 Continued download failed on this file, which conflicts with `-c'.
 Refusing to truncate existing file `suse-axp-CD2-20010509-2.iso'.


This is a problem I have also seen often with newer wget versions.
I've seen it with http downloads, with and without a proxy.
More details: I have it with wget 1.7.x under IRIX and Linux.
The older version 1.6 always works fine in this situation.
Therefore I did not delete 1.6; also, the old http parsing in
version 1.6 is sometimes more useful than the newer one.
I do not yet use version 1.8.x, because it still has the unrepaired
problem with the special character translation
in local filenames ;-)

Best Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: unsafe characters in filenames, was:handling of non-ascii characters in URL

2001-12-09 Thread Jochen Roderburg

Hello Hrovje,

 
 But: a character being unsafe for URL doesn't mean that the same
 character must be unsafe for the file name.  Wget currently conflates
 the two, and that's a bug.  I'll try to fix that bug by adding another
 bitflag to the table, e.g. F which means reserved for file name,
 i.e. the character is unsafe, but don't touch it when encoding for
 file names.
 

Yes, I think that is the point. The 'safe encoding' of 'unsafe characters'
is necessary and correct on the wire, but should not be used for local
filenames, where other constraints apply.

I know that various DOS and Windows versions have their own set of characters
which are not allowed in filenames and cannot be used. I think this has been
discussed several times on the mailing list. Unix filesystems, on the other
hand, usually allow every character in a filename, with the only exception being
the directory delimiter '/'.

Meanwhile I have found more and more occurrences of the problem. The brackets
are relatively rare, that was just the first case I saw. Other cases are the
@-sign and the space. I can't say if this is a severe problem for everybody;
at least for me the program is not usable in the present form, and I have
stopped my testing of this version for now.

Unfortunately this is still the case with the 1.8 'release version' of
today, which I just downloaded and tried.

Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



wget 1.8beta - handling of non-ascii characters in URL

2001-12-06 Thread Jochen Roderburg

Hello wget developers,

Found a new bug in wget 1.8 beta:

wget.17 http://www.polscan.hg.pl/scans/[GmbH]_Scenery_9.csv
--21:39:00--  http://www.polscan.hg.pl/scans/%5BGmbH%5D_Scenery_9.csv
   = `[GmbH]_Scenery_9.csv'

wget.18 http://www.polscan.hg.pl/scans/[GmbH]_Scenery_9.csv
--21:40:51--  http://www.polscan.hg.pl/scans/%5BGmbH%5D_Scenery_9.csv
   = `%5BGmbH%5D_Scenery_9.csv'

And the local filename is indeed then %5BGmbH%5D_Scenery_9.csv

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



Re: wget 1.8beta - handling of non-ascii characters in URL

2001-12-06 Thread Jochen Roderburg

 
 You could be describing a feature here - on how WGET handles unsafe
 characters.  
 
 wget.18 http://www.polscan.hg.pl/scans/[GmbH]_Scenery_9.csv
 --21:40:51--  http://www.polscan.hg.pl/scans/%5BGmbH%5D_Scenery_9.csv
= `%5BGmbH%5D_Scenery_9.csv'
 

Do I understand you right? You mean it is a new feature of wget version
1.8 that the brackets are now promoted to 'unsafe characters'?

Regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany




Unexpected feature in wget 1.8 betas

2001-12-02 Thread Jochen Roderburg

Hello wget developers,

As a long-time satisfied user of wget on various platforms I tried out the
announced new betas (on a Linux system). As one test I downloaded today's new
beta2 with the just-compiled beta1, with the following results:


wget.18 ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta2.tar.gz
--08:47:50--  ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta2.tar.gz
   = `wget-1.8-beta2.tar.gz/.listing'
Resolving gnjilux.srk.fer.hr... done.
Connecting to gnjilux.srk.fer.hr[161.53.70.141]:21... connected.
Logging in as anonymous ... Logged in!
== SYST ... done.== PWD ... done.
== TYPE I ... done.  == CWD /pub/unix/util/wget/.betas ... done.
== PORT ... done.== LIST ... done.

[ =] 0   --.--K/s
 

08:47:56 (0.00 B/s) - `wget-1.8-beta2.tar.gz/.listing' saved [0]

Removed `wget-1.8-beta2.tar.gz/.listing'.
--08:47:56--  ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta2.tar.gz
   = `wget-1.8-beta2.tar.gz.1'
== CWD not required.
== PORT ... done.== RETR wget-1.8-beta2.tar.gz ... done.
Length: 1,059,802 (unauthoritative)

100%[===] 1,059,802   15.72K/s 
ETA 00:00   

08:49:03 (15.72 KB/s) - `wget-1.8-beta2.tar.gz.1' saved [1059802]


You see, it created a directory with the filename of the file to be
downloaded and saved the temporary .listing file in that directory.
This file was deleted but the directory stayed there, so that the file
itself was saved under a new name.

Moreover, if the file already existed under the correct name, it was
silently deleted in the process and replaced by that directory.

This behaviour is still present in the second beta.


Best regards and thank you once more for this really useful utility,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany