Re: Abort trap

2007-09-13 Thread Micah Cowan

Josh Williams wrote:
> On 9/13/07, Hex Star <[EMAIL PROTECTED]> wrote:
>> wget 1.9+cvs-dev
> 
> Try it in either the latest release or (preferably) the Subversion
> trunk and let us know if you still have the same problem. The version
> you're using is an old trunk version, so we can safely assume that it
> has plenty of bugs that have since been fixed.

The current development trunk would be much preferable; many bugs have
been fixed since 1.10.2 as well.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Abort trap

2007-09-13 Thread Josh Williams
On 9/13/07, Hex Star <[EMAIL PROTECTED]> wrote:
> wget 1.9+cvs-dev

Try it in either the latest release or (preferably) the Subversion
trunk and let us know if you still have the same problem. The version
you're using is an old trunk version, so we can safely assume that it
has plenty of bugs that have since been fixed.


Re: Abort trap

2007-09-13 Thread Hex Star
On 9/13/07, Micah Cowan <[EMAIL PROTECTED]> wrote:
>
>
>
> One crucial bit of information you've left out is which version of Wget
> you're running. :)
>
>

Oops, sorry about that; the version is...

wget 1.9+cvs-dev


Re: Wget automatic download from RSS feeds

2007-09-13 Thread Micah Cowan

Josh Williams wrote:
> On 9/12/07, Erik Bolstad <[EMAIL PROTECTED]> wrote:
>> Hi!
>> I'm writing a master's thesis on online news at the University of Oslo
>> and need software that can download HTML pages based on RSS feeds.
>>
>> I suspect that Wget could be modified to do this.
>>
>> - Do you know if there are any ways to get Wget to read RSS files and
>> download new files every hour or so?
>> - If not: Have you heard about software that can do this?
>>
>> I am very grateful for all help and tips.
> 
> Wget does not do this. That would be a great feature, but I don't
> believe parsing the RSS feed is Wget's job. Wget just fetches the
> files.
> 
> I recommend you look for a program that simply parses the RSS feed and
> dumps the URLs to a file for Wget to fetch. Piping.. that's what UNIX
> is all about ;-)

It might make a very interesting plugin, though, once we've added that
functionality in "Wget 2.0".

That won't be for quite some time, though.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Wget automatic download from RSS feeds

2007-09-13 Thread Josh Williams
On 9/12/07, Erik Bolstad <[EMAIL PROTECTED]> wrote:
> Hi!
> I'm writing a master's thesis on online news at the University of Oslo
> and need software that can download HTML pages based on RSS feeds.
>
> I suspect that Wget could be modified to do this.
>
> - Do you know if there are any ways to get Wget to read RSS files and
> download new files every hour or so?
> - If not: Have you heard about software that can do this?
>
> I am very grateful for all help and tips.

Wget does not do this. That would be a great feature, but I don't
believe parsing the RSS feed is Wget's job. Wget just fetches the
files.

I recommend you look for a program that simply parses the RSS feed and
dumps the URLs to a file for Wget to fetch. Piping.. that's what UNIX
is all about ;-)

I don't have any recommendations, unfortunately. If you aren't able to
find one, let me know, and I'll try to come up with one.
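
For instance, a rough pipeline along these lines could serve as a starting
point (the feed URL is only a placeholder, and the grep/sed extraction is
crude -- it assumes one <link> element per line, so a real setup would want
a proper feed parser):

  #!/bin/sh
  # Fetch the feed, pull the <link> URLs out of it, and hand them to wget.
  FEED="http://news.example.com/rss.xml"      # placeholder feed address
  wget -q -O - "$FEED" \
    | grep -o '<link>[^<]*</link>' \
    | sed -e 's/<link>//' -e 's,</link>,,' \
    | wget -N -i -
  # For "every hour or so", run the script from cron, e.g.:
  #   0 * * * * /path/to/fetch-feed.sh

The -N (timestamping) option keeps wget from re-downloading pages that have
not changed between runs.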

Josh


Wget automatic download from RSS feeds

2007-09-13 Thread Erik Bolstad
Hi!
I'm writing a master's thesis on online news at the University of Oslo
and need software that can download HTML pages based on RSS feeds.

I suspect that Wget could be modified to do this.

- Do you know if there are any ways to get Wget to read RSS files and
download new files every hour or so?
- If not: Have you heard about software that can do this?

I am very grateful for all help and tips.

Thanks a lot!
Erik Bolstad


wget -c problem with current svn version

2007-09-13 Thread Jochen Roderburg

Continued downloads (wget -c) do not work in the current svn version with default
options (where no HEAD request is sent); the download starts again at byte 0 instead.
When other options force a HEAD request, it works correctly again. Perhaps the fix is
as easy as adding the '-c' case to those options that need a HEAD request.  ;-)

Regards, J.Roderburg

Log outputs for various versions:

Version 1.10.2 sends no HEAD, but immediately a GET with a Range header:

wget.1102 --debug -c http://ftp.uni-koeln.de/files.lst.gz

Setting --continue (continue) to 1
DEBUG output created by Wget 1.10.2 on linux-gnu.

--22:48:58--  http://ftp.uni-koeln.de/files.lst.gz
   => `files.lst.gz'
Resolving ftp.uni-koeln.de... 134.95.19.35
Caching ftp.uni-koeln.de => 134.95.19.35
Connecting to ftp.uni-koeln.de|134.95.19.35|:80... connected.
Created socket 3.
Releasing 0x08084a00 (new refcount 1).

---request begin---
GET /files.lst.gz HTTP/1.0
Range: bytes=6033568-
User-Agent: Wget/1.10.2
Accept: */*
Host: ftp.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 206 Partial Content
Date: Thu, 13 Sep 2007 20:48:59 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Wed, 12 Sep 2007 04:08:33 GMT
ETag: "1b7500ba-1524e9d-60055240"
Accept-Ranges: bytes
Content-Length: 16137725
Content-Range: bytes 6033568-22171292/22171293
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Content-Encoding: x-gzip

---response end---
206 Partial Content
Registered socket 3 for persistent reuse.
Length: 22,171,293 (21M), 16,137,725 (15M) remaining [application/x-gzip]

59% [+++=>] 13,095,904    1.61M/s  ETA 00:05


The SVN version from a month ago sends a HEAD and then a GET with a Range header:

wget.111-svn-0708 --debug -c http://ftp.uni-koeln.de/files.lst.gz

Setting --continue (continue) to 1
DEBUG output created by Wget 1.10+devel on linux-gnu.

--22:52:40--  http://ftp.uni-koeln.de/files.lst.gz
Resolving ftp.uni-koeln.de... 134.95.19.35
Caching ftp.uni-koeln.de => 134.95.19.35
Connecting to ftp.uni-koeln.de|134.95.19.35|:80... connected.
Created socket 3.
Releasing 0x080884c8 (new refcount 1).

---request begin---
HEAD /files.lst.gz HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: ftp.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 13 Sep 2007 20:52:40 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Wed, 12 Sep 2007 04:08:33 GMT
ETag: "1b7500ba-1524e9d-60055240"
Accept-Ranges: bytes
Content-Length: 22171293
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Content-Encoding: x-gzip

---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 22171293 (21M) [application/x-gzip]
--22:52:40--  http://ftp.uni-koeln.de/files.lst.gz
Reusing existing connection to ftp.uni-koeln.de:80.
Reusing fd 3.

---request begin---
GET /files.lst.gz HTTP/1.0
Range: bytes=6033568-
User-Agent: Wget/1.10+devel
Accept: */*
Host: ftp.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 206 Partial Content
Date: Thu, 13 Sep 2007 20:52:40 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Wed, 12 Sep 2007 04:08:33 GMT
ETag: "1b7500ba-1524e9d-60055240"
Accept-Ranges: bytes
Content-Length: 16137725
Content-Range: bytes 6033568-22171292/22171293
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: application/x-gzip
Content-Encoding: x-gzip

---response end---
206 Partial Content
Length: 22171293 (21M), 16137725 (15M) remaining [application/x-gzip]
Saving to: `files.lst.gz'
58% [==> ] 13,030,816    1.50M/s  eta 6s


The current SVN version sends no HEAD and a GET for the whole file again:

wget.111-svn-0709 --debug -c http://ftp.uni-koeln.de/files.lst.gz
Setting --continue (continue) to 1
DEBUG output created by Wget 1.10+devel on linux-gnu.

--22:56:39--  http://ftp.uni-koeln.de/files.lst.gz
Resolving ftp.uni-koeln.de... 134.95.19.35
Caching ftp.uni-koeln.de => 134.95.19.35
Connecting to ftp.uni-koeln.de|134.95.19.35|:80... connected.
Created socket 3.
Releasing 0x080884c8 (new refcount 1).

---request begin---
GET /files.lst.gz HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: ftp.uni-koeln.de
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 13 Sep 2007 20:56:39 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Wed, 12 Sep 2007 04:08:33 GMT
ETag: "1b7500ba-1524e9d-60055240"
Accept-Ranges: bytes
Content-Length: 22171293
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Content-Encoding: x-gzip

---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: 22171293 (21M) [application/x-gzip]
Saving to: `files.lst.gz'

26% [

Re: Timeout workaround?

2007-09-13 Thread Micah Cowan

Micah Cowan wrote:
> Todd Plessel wrote:
>> Q2. If not, then could the PERL-CGI script be modified to spawn a
>> thread that writes an ack to stderr to keep the httpd from timing-out?
>> If so, can you point me to some sample code?
> 
> This would be the better solution; but I don't know how it's done. I
> think some servers will automatically send an ack if you write
> something, anything, to stderr, but I'm not sure. You'll have to check
> in your server's documentation.

You could possibly hack the server source (if you have access to it) to
set the SO_KEEPALIVE socket option.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Different exit status for 404 error?

2007-09-13 Thread Alex Owen
On 13/09/2007, Micah Cowan <[EMAIL PROTECTED]> wrote:
>
> Alex Owen wrote:
> >
> > I think it would be nice if the exit code of wget could be inspected
> > to determin if wget failed because of a 404 error or some other
> > reason.
>
> Hi Alex,
>
> We do plan to evaluate differentiation of exit statuses at some
> point in the future;

Hi Micah,
Thanks for your prompt reply!
Glad it is on the to-do list, even if at a low priority.

> it just needs a final agreement on how exit codes
> should be divided, which means discussion. Actual implementation will be
> trivial. But, in the meantime, I'm not sure I want to introduce
> different exit codes on an individual basis.

Fair comment!

For my Debian use case we would in fact want to use "busybox
wget"... however, we would want to keep that compatible with GNU Wget
for obvious reasons.
So thanks for the bug pointer... I will track it to see how the
"final agreement on how exit codes should be divided" works out, and then
help patch busybox wget to match your spec!


Regards
Alex Owen


Re: Timeout workaround?

2007-09-13 Thread Micah Cowan

Todd Plessel wrote:
> Q1. Is there a way that I can run wget that somehow avoids this
> timeout. For example, by sending an out-of-band ack to stderr every
> 30 seconds so httpd does not disconnect.
> By out-of-band, I mean it cannot be included in the result bytes
> streamed to stdout since these are specific binary data formats,
> images, etc.

I don't think that changing Wget for this would be appropriate. It's not
Wget's responsibility to ensure the server doesn't time out; it's the
server's. Of course, you're welcome to make such a change yourself
(that's what Free Software is all about!), but I can't tell you how it
might be done, and it may be system-dependent.

> Q2. If not, then could the PERL-CGI script be modified to spawn a
> thread that writes an ack to stderr to keep the httpd from timing-out?
> If so, can you point me to some sample code?

This would be the better solution, but I don't know how it's done. I
think some servers will automatically send an ack if you write
something, anything, to stderr, but I'm not sure. You'll have to check
your server's documentation.
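
If you want to experiment with that idea anyway, a sketch of it might look
something like this (written as a shell CGI for brevity -- the same
fork-and-loop pattern applies in a Perl CGI -- and whether stderr output
actually resets the server's timeout is exactly the part I'm unsure about):

  #!/bin/sh
  # Background "heartbeat": write something to stderr every 30 seconds
  # while the real work runs, and kill it when the script exits.
  ( while :; do sleep 30; echo "still processing..." >&2; done ) &
  heartbeat=$!
  trap 'kill "$heartbeat" 2>/dev/null' EXIT

  echo "Content-Type: application/octet-stream"
  echo
  generate_data    # placeholder for the long-running processing/output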

It seems to me, though, that the infrastructure should be rearchitected
a bit to avoid such extremely large waiting periods; it strikes me as
very inefficient.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Different exit status for 404 error?

2007-09-13 Thread Micah Cowan

Alex Owen wrote:
> Hello,
> 
> If I run:
> wget http://server.domain/file
> 
> How can I differentiate between a network problem that made wget fail
> and the server sending back an HTTP 404 error?
> 
> ( I have a use case described in debian bug  http://bugs.debian.org/422088 )
> 
> I think it would be nice if the exit code of wget could be inspected
> to determine if wget failed because of a 404 error or some other
> reason.

Hi Alex,

We do plan to evaluate differentiation of exit statuses at some
point in the future; however, it's not one of our very highest
priorities for the moment, and it's currently targeted for Wget 1.13
(the bug report is at https://savannah.gnu.org/bugs/index.php?20333, but
there's really not much description there). We are about to release Wget
1.11, hopefully within a month.

It is possible that this item will be targeted for an earlier release, in
Wget 1.12; mostly it just needs a final agreement on how exit codes
should be divided, which means discussion. Actual implementation will be
trivial. But, in the meantime, I'm not sure I want to introduce
different exit codes on an individual basis.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Myriad merges

2007-09-13 Thread Jochen Roderburg
Zitat von Micah Cowan <[EMAIL PROTECTED]>:

> > Btw, continued downloads (wget -c) are also
> > broken now in this case (probably for the same reason).
>
> Really? I've been using this Wget version for a bit, and haven't noticed
> this problem. Could you give an invocation that produces this problem?
>

I'll make a new thread for this problem, as by now it looks like a different
case again   ;-)

J.Roderburg



Different exit status for 404 error?

2007-09-13 Thread Alex Owen
Hello,

If I run:
wget http://server.domain/file

How can I differentiate between a network problem that made wget fail
and the server sending back an HTTP 404 error?

( I have a use case described in debian bug  http://bugs.debian.org/422088 )

I think it would be nice if the exit code of wget could be inspected
to determine if wget failed because of a 404 error or some other
reason.

May I be so bold as to propose that an exit code of 4 be used for 404 errors?
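
To make the intent concrete, a calling script could then do something like
this (the value 4 is only my proposal above, not current wget behaviour;
today wget simply exits non-zero without saying why, and $URL stands in for
the real address):

  wget -q "$URL"
  case $? in
    0) echo "download succeeded" ;;
    4) echo "server returned 404" ;;            # proposed code, not implemented
    *) echo "failed for some other reason" ;;
  esac

Until something like this exists, the closest workaround seems to be running
wget with -S/--server-response and grepping the HTTP status line out of the
messages it prints.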

Thank you for your time in considering this.

Alex Owen

PS: please CC as I am not subscribed to the list.


Re: Abort trap

2007-09-13 Thread Micah Cowan

Hex Star wrote:
> Oh, and the configuration on which wget was running is: PowerBook G4
> 1.5 GHz (PowerPC), 768 MB RAM, Mac OS X 10.4.10

One crucial bit of information you've left out is which version of Wget
you're running. :)

Sorry if it took a while to respond to your message; the mailing list
went down about five days ago... :/

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Services down last night

2007-09-13 Thread Micah Cowan

Micah Cowan wrote:
> I haven't discovered why yet, but all of addictivecode.org's internet
> services went down last night around 7:30 pm PDT (02:30 UTC).

Note that the addictivecode.org failure was completely unrelated to the
main Wget mailing list going down for about five days; it was just coincidental
(addictivecode.org apparently ran out of memory and OOM-killed almost
everything). I haven't discovered what the cause of that was yet.

I discovered yesterday that I had failed to bring up mailman on
addictivecode.org, so wget-notify (which receives SVN commits and
Savannah bug changes) was down until I realized and brought it back up.

For information on the dotsrc.org (sunsite.dk) issues, see
http://www.dotsrc.org/news/. They seem not to have announced the servers
coming back up; perhaps not all of them have. Dotsrc's issues appear to
be the result of an upgrade gone wrong last weekend. :/

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Abort trap

2007-09-13 Thread Hex Star
Oh, and the configuration on which wget was running is: PowerBook G4
1.5 GHz (PowerPC), 768 MB RAM, Mac OS X 10.4.10


Re: Abort trap

2007-09-13 Thread Hex Star
On 9/13/07, Josh Williams <[EMAIL PROTECTED]> wrote:
>
>
>
> "failed assertion" means that at some point along the line, one of the
> variables's value was not what it should have been.
>
> I'll check into it. Thanks!
>

Ok great, thanks :)


Re: Abort trap

2007-09-13 Thread Josh Williams
On 9/11/07, Hex Star <[EMAIL PROTECTED]> wrote:
> When I try to execute the command (minus quotes) "wget -P ftp.usask.ca -r
> -np -passive-ftp ftp://ftp.usask.ca/pub/mirrors/apple/"
> wget works for a bit and then terminates with the following error:
>
> xmalloc.c:186: failed assertion `ptr !=NULL'
> Abort trap
>
> What causes this error? What does this error mean? Is this due to a server
> misconfiguration? Thanks! :)
>
> P.S. I am not subscribed to this list, please cc all replies to me...thanks!
> :)

"failed assertion" means that at some point along the line, one of the
variables's value was not what it should have been.

I'll check into it. Thanks!


Abort trap

2007-09-13 Thread Hex Star
When I try to execute the command (minus quotes) "wget -P ftp.usask.ca -r
-np -passive-ftp ftp://ftp.usask.ca/pub/mirrors/apple/" wget works for a bit
and then terminates with the following error:

xmalloc.c:186: failed assertion `ptr !=NULL'
Abort trap

What causes this error? What does this error mean? Is this due to a server
misconfiguration? Thanks! :)

P.S. I am not subscribed to this list, please cc all replies to me...thanks!
:)


Timeout workaround?

2007-09-13 Thread Todd Plessel

Problem:

I'm using

wget -q -T 0 -O - 'http://some.remote.host/cgi-bin/some_script?...'

to access a PERL-CGI script on a remote
computer running Apache httpd that is configured with a 300-second
timeout.
The script sometimes takes more than 300 seconds to begin sending
data (because there is often significant data processing required
before any bytes can even begin streaming).
Consequently, httpd disconnects at exactly 300 seconds and no bytes
are received.

Q1. Is there a way that I can run wget that somehow avoids this
timeout? For example, by sending an out-of-band ack to stderr every
30 seconds so httpd does not disconnect.
By out-of-band, I mean it cannot be included in the result bytes
streamed to stdout since these are specific binary data formats,
images, etc.

Q2. If not, then could the PERL-CGI script be modified to spawn a
thread that writes an ack to stderr to keep the httpd from timing-out?
If so, can you point me to some sample code?

Thanks,

Todd




Re: forum download, cookies?

2007-09-13 Thread Josh Williams
On 9/12/07, Juhana Sadeharju <[EMAIL PROTECTED]> wrote:
>
> A forum has topics which are available only for members.
> How to use wget for downloading copy of the pages in that
> case? How to get the proper cookies and how to get wget to
> use them correctly? I use IE in PC/Windows and wget in
> a unix computer. I could use Lynx in the unix computer
> if needed.
>
> (PC/Windows has Firefox but I cannot install anything new.
> If Firefox has a downloader plugin suitable for forum
> downloading, that would be ok.)
>
> Juhana

Firefox stores a cookies.txt file in the profile directory. In Windows, I
believe this is located in "C:/Documents and Settings/{username}/Application
Data/Mozilla/firefox/profiles/PROFILE/cookies.txt".

GNU Wget is compatible with this cookies file. Just use the
`--load-cookies file` option.
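
For example, something like this (with a made-up forum address) should let
wget reuse your browser's login once you've copied that cookies.txt over to
the Unix machine, as long as the login cookie is a persistent one --
session-only cookies never get written to cookies.txt:

  wget --load-cookies cookies.txt -r -np -p "http://forum.example.com/private/"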


Re: Wrong log output for wget -c

2007-09-13 Thread Josh Williams
On 9/9/07, Jochen Roderburg <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> This is now an easy case for a change  ;-)
>
> In the log output for "wget -c" we have the line:
>
>The sizes do not match (local 0) -- retrieving.
>
> This shows always 0 as local size in the current svn version.
>
> The variable which is printed here is "local_size" which is initialized to 0 
> and
> used nowhere else. I think this variable was just "forgotten" on a recent code
> reorganization. Comparing an old version with the current I think the
> information is now in "hstat.orig_file_size", I attach my little patch for
> this.
>
> I have also seen another much more complicated and rare log output problem 
> with
> restarted requests, but so far I was not able to reconstruct a real-life
> example for it again. It happens when on multiple retries the "Range" request
> is not hnoured by the server and transfer starts again at byte 0. It looked
> like not all variables for the display of the progress bar are correctly
> adjusted to this situation. I'll keep on trying  ;-)

Hi! Thanks for your contribution. I just looked over your patch and it
looks good. I've committed the changes to:


After Micah (the maintainer) inspects it, it should go right into the
trunk. Thanks!


Wrong log output for wget -c

2007-09-13 Thread Jochen Roderburg

Hi,

This is now an easy case for a change  ;-)

In the log output for "wget -c" we have the line:

   The sizes do not match (local 0) -- retrieving.

This shows always 0 as local size in the current svn version.

The variable which is printed here is "local_size", which is initialized to 0 and
used nowhere else. I think this variable was just "forgotten" in a recent code
reorganization. Comparing an old version with the current one, I think the
information is now in "hstat.orig_file_size"; I attach my little patch for
this.

I have also seen another much more complicated and rare log output problem with
restarted requests, but so far I have not been able to reconstruct a real-life
example of it again. It happens when, on multiple retries, the "Range" request
is not honoured by the server and the transfer starts again at byte 0. It looked
like not all variables for the display of the progress bar are correctly
adjusted to this situation. I'll keep on trying  ;-)


Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10    Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



http.c.diff
Description: Binary data


Services down last night

2007-09-13 Thread Micah Cowan

I haven't discovered why yet, but all of addictivecode.org's internet
services went down last night around 7:30 pm PDT (02:30 UTC). The web
and ssh services were brought back up in response to an email query,
around 2:30 am PDT (09:30 UTC), but it wasn't until I checked again this
morning around 10:30 am that I was able to log in and restore the
remaining services.

This means that the Wget Wgiki was down for about 7 hours, and the
Subversion repository and addictivecode.org-hosted mailing lists for
about 15 hours. Sorry for the interruption; I'll be working with the
provider to help ensure this doesn't happen again.

Addictivecode.org is hosted on a VPS; the VPS itself didn't go down, as
it was pingable, and the logs show cron firing normally. Somehow, all
internet-connected services (I don't know whether any non-internet
services were affected) were apparently killed without producing logs... :/

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: GNU wget suggestion

2007-09-13 Thread Micah Cowan

Sean Walmsley wrote:
> Micah:

Hi Sean,

I prefer to keep discussions on the main Wget mailing list where
possible, so we benefit from the input of anyone else who's able to
contribute; I've Cc'd the list with my response.

> From the GNU webpage, I understand that you are the current
> wget maintainer.
> 
> We're currently having some trouble preventing wget from displaying
> the --http-password and --http-user arguments in the output of the
> ps command.
> 
> Unfortunately, we can't use the workaround suggested in the wget FAQ
> (include the username/password in the URL and pass it via the -i flag)
> because our corporate proxy server blocks http traffic containing
> passwords. This blocking is done because, according to recent
> versions of the URL specification

Have you considered the other workaround suggested by the FAQ (though, I
see now, it was poorly written; I'll fix that after sending this): to put
the password information in a wgetrc?

You could place it in a separate wgetrc from your normal one, and use
the WGETRC environment variable to give wget its location (I'd like to
add a --wgetrc option at some point).
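
Something along these lines, for example (the path and the credentials are
just placeholders, and the wgetrc commands shown mirror the
--http-user/--http-password options):

  printf 'http_user = someuser\nhttp_password = somesecret\n' > "$HOME/.wgetrc-auth"
  chmod 600 "$HOME/.wgetrc-auth"
  WGETRC="$HOME/.wgetrc-auth" wget http://intranet.example.com/report.dat

Since the credentials never appear on the command line, they won't show up
in ps output; just make sure the file itself isn't readable by other users.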

> My suggestion is to have wget check for the presence of the
> following environment variables and use them if they
> are present and the corresponding command line arguments are
> not:
> 
>   http_user   (--http-user arg and .wgetrc value would override)
>   http_password   (--http-password arg and .wgetrc value would override)
> 
> These would be analogous to http_proxy which is read from the
> $http_proxy environment variable unless it is overridden by the
> .wgetrc file.
> 
> This would allow each calling process to set the values of these
> variables and then call wget without having the values show up in
> the output of ps.

Well, the currently planned ask-user-to-input-password feature (which
the FAQ mentions, but only in the Wgiki, which you may not have been
reading, since I just announced/started redirecting to it) would also
accomplish that, and I'm not sure I see the utility in creating
environment variables for this purpose... plus, if I add these, I'd
really need to add user, password, ftp_user, and ftp_password as well.

For now, I'd recommend placing them in a file you specify as your
WGETRC; then, when Wget 1.12 is released, you can switch to having Wget
ask for the password up-front.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



ohloh

2007-09-13 Thread Daniel Stenberg

Hi guys,

ohloh.net keeps track of FLOSS authors and projects and does some interesting
stats and numbers. Wget is listed too:


http://www.ohloh.net/projects/7947?p=Wget

(No, I'm not involved with the site in any way, other than as a happy visitor
and registered user.)


Re: Myriad merges

2007-09-13 Thread Micah Cowan

Jochen Roderburg wrote:
> Zitat von Micah Cowan <[EMAIL PROTECTED]>:
> 
>> Hm... that change came from the Content-Disposition fixes. I'll investigate.
>>
> 
> OK, but I hope I am still allowed to help a little with the investigation  ;-)

Oh, I'm always very, _very_ happy to get help. :D

> I made a few more tests and some debugging now and I am convinced now that 
> this
> "if send_head_first" is definitely the "immediate cause" for the new problem
> that the remote timestamp is not picked up on GET-only requests.



> Btw, continued downloads (wget -c) are also
> broken now in this case (probably for the same reason).

Really? I've been using this Wget version for a bit, and haven't noticed
this problem. Could you give an invocation that produces this problem?

> I meanwhile also believe that the primary issue we are trying to repair (first
> found remote time-stamp is used for local and not last found) has always been
> there. Only a year ago when the contentdisposition stuff was included and more
> HEAD requests were made I really noticed it. I remember that it had always 
> been
> more difficult to get a newer file downloaded through the proxy-cache when a
> local file was present, but as these cases were rare, I had never tried to
> investigate this before  ;-)

I'm not surprised to hear this; it didn't look like it had ever been
working before... and it's not a common situation, so I'm not surprised
it wasn't caught earlier, either.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



Re: Myriad merges

2007-09-13 Thread Jochen Roderburg
Zitat von Micah Cowan <[EMAIL PROTECTED]>:

> > And the only other code I found which parses the remote date is in the part
> > which handles the logic around the timestamping option. In older versions this
> > was a conditional block starting with  if (!got_head) ... ; now it starts with
> >  if (send_head_first && !got_head) ...   Could this mean that this code is now
> > only executed when a HEAD response is examined?
>
> Hm... that change came from the Content-Disposition fixes. I'll investigate.
>

OK, but I hope I am still allowed to help a little with the investigation  ;-)

I have made a few more tests and some debugging, and I am now convinced that this
"if send_head_first" is definitely the immediate cause of the new problem
that the remote timestamp is not picked up on GET-only requests.

This change is relatively new; it was not in the next-to-last svn version
that I compiled a month ago. Certainly there must have been a reason for it,
but one sure side effect is that this if-block of code is no longer executed
in the HEAD-less case. Btw, continued downloads (wget -c) are also
broken now in this case (probably for the same reason).

By now I also believe that the primary issue we are trying to repair (the first
remote time-stamp found is used for the local file, rather than the last one
found) has always been there. I only really noticed it a year ago, when the
Content-Disposition stuff was included and more HEAD requests were made. I
remember that it had always been more difficult to get a newer file downloaded
through the proxy cache when a local file was present, but as these cases were
rare, I had never tried to investigate this before  ;-)

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10    Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany



forum download, cookies?

2007-09-13 Thread Juhana Sadeharju

A forum has topics which are available only to members.
How do I use wget to download a copy of the pages in that
case? How do I get the proper cookies, and how do I get wget to
use them correctly? I use IE on PC/Windows and wget on
a Unix computer. I could use Lynx on the Unix computer
if needed.

(PC/Windows has Firefox but I cannot install anything new.
If Firefox has a downloader plugin suitable for forum
downloading, that would be ok.)

Juhana