Re: Only follow paths with /res/ in them

2008-11-19 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Brian wrote:
 I would like to follow all the urls on a site that contain /res/ in the
 path. I've tried using -I and -A, with values such as res, *res*,
 */res/*, etc.. Here is an example that downloads pretty much the entire
 site, rather than what I appear  (to me) to have specified:
 
 wget -O- -q http://img.site.org/b/imgboard.html | wget -q -r -l1 -O- -I
 '*res*' -A '*res*' --force-html -B http://img.site.org/b/ -i-
 
 The urls I would like to follow and output to the command line are of
 the form:
 
 http://img.site.org/b/res/97867797.html

-A isn't useful here: it's applied only against the filename portion
of the URL.

-I is what you want; the trouble is that the * wildcard doesn't match
slashes (there are plans to introduce a ** wildcard, probably in 1.13). So
unfortunately you've got to do -I 'res,*/res,*/*/res' etc. as needed.
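
For the example above, an untested sketch of the corrected pipeline (same
URLs and options as in your command) would be:

  wget -O- -q http://img.site.org/b/imgboard.html | \
    wget -q -r -l1 -O- -I 'res,*/res,*/*/res' --force-html \
         -B http://img.site.org/b/ -i -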

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkk7awACgkQ7M8hyUobTrG2wgCeMUN3EnnY2VsmNzQTWOleZKqg
ZQYAn1CYoQ7JVc4OYfwLzcPVkai93UQc
=3I6Z
-END PGP SIGNATURE-


Re: Only follow paths with /res/ in them

2008-11-19 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Oh! Please don't use this list (wget@sunsite.dk) any more; I'm trying to
get the dotsrc folks to make it go away/forward to bug-wget (I need to
ping 'em on this again). The official list for Wget is now [EMAIL PROTECTED]

Micah Cowan wrote:
 Brian wrote:
 I would like to follow all the urls on a site that contain /res/ in the
 path. I've tried using -I and -A, with values such as res, *res*,
 */res/*, etc.. Here is an example that downloads pretty much the entire
 site, rather than what I appear  (to me) to have specified:
 
 wget -O- -q http://img.site.org/b/imgboard.html | wget -q -r -l1 -O- -I
 '*res*' -A '*res*' --force-html -B http://img.site.org/b/ -i-
 
 The urls I would like to follow and output to the command line are of
 the form:
 
 http://img.site.org/b/res/97867797.html
 
 -A isn't useful here: it's applied only against the filename portion
 of the URL.
 
 -I is what you want; the trouble is that the * wildcard doesn't match
 slashes (there are plans to introduce a ** wildcard, probably in 1.13). So
 unfortunately you've got to do -I 'res,*/res,*/*/res' etc. as needed.
 

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkkk7j0ACgkQ7M8hyUobTrH+CACbBzcO4vM6qHIumBeDS2ZyAdfq
ONYAnjX7SHAOvEJylkbjjq7IsDXEv+27
=3Hrq
-END PGP SIGNATURE-


Re: --mirror and --cut-dirs=2 bug?

2008-11-03 Thread Brock Murch
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah,

Many thanks for all your very timely help. I have had no issues since 
following your instructions to upgrade to 1.11.4 and installing it in the /opt 
directory. I used:

$ ./configure --prefix=/opt/wget

And point to it specifically:

/opt/wget/bin/wget  --tries=10 -r -N -l inf --wait=1\
-nH --cut-dirs=2 ftp://oceans.gsfc.nasa.gov/MODISA/ATTEPH/ \
-o /home1/software/modis/atteph/mirror_a.log \
--directory-prefix=/home1/software/modis/atteph

Thanks again.

Brock


On Monday 27 October 2008 3:06 pm, Micah Cowan wrote:
 Brock Murch wrote:
  Sorry, 1 quick question? Do you know of anyone providing rpm's of 1.11.4
  for CentOS?

 Not offhand. It may not yet be available; it was only packaged for
 Fedora Core a couple months ago, I think. RPMfind.net just lists 1.11.4
 sources for fc9 and fc10.

  If not, would you recommend uninstalling the current one? Before
  installing from your src? Many thanks.

 I'd advise against that: I believe various important components of Red
 Hat/CentOS rely on wget to fetch things. Sometimes minor changes in the
 output/interface of wget cause problems for automated scripts that form
 an integral part of an operating system. Though really, I think most of
 the changes that would pose such a danger are actually already in the
 Red Hat modified 1.10.2 sources (taken from the development sources
 for what was later released as 1.11).

 What I tend to do on my systems, is to configure the sources like:

   $ ./configure --prefix=$HOME/opt/wget

 and then either add $HOME/opt to my $PATH, or invoke it directly as
 $HOME/opt/wget/bin/wget.

 Note that if you want to build wget with support for HTTPS, you'll need
 to have the development package for openssl installed.
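
For reference, a minimal sketch of that workflow (prefix as in the configure
line above; adjust to taste):

  $ ./configure --prefix=$HOME/opt/wget
  $ make && make install
  $ export PATH=$HOME/opt/wget/bin:$PATH   # or invoke $HOME/opt/wget/bin/wget directly
  $ wget --version
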
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFJDwveMAkzD2qY/pURAmvuAJ9XG784Djq0mwcTu/nN56tPSM+AMQCgm2KX
dzPQ263FF7Gaw4qtE1X0wTI=
=CC9T
-END PGP SIGNATURE-



Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]

2008-11-02 Thread Maciej W. Rozycki
On Sat, 1 Nov 2008, Micah Cowan wrote:

   I am puzzled.  You mean you declare wget@sunsite.dk retired and 
  [EMAIL PROTECTED] is to be used from now on for the purpose the former 
  list instead?  And [EMAIL PROTECTED] will most likely be retired 
  as well soon with the replacement to be [EMAIL PROTECTED] as well?
 
 Yup, that's what I mean.

 Thanks a lot -- good to know my brain has not completely rotted yet.

  Maciej


Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]

2008-11-01 Thread Maciej W. Rozycki
On Fri, 31 Oct 2008, Micah Cowan wrote:

 I will ask the dotsrc.org folks to set up this mailing list as a
 forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At
 that time, no further mails will be sent to subscribers of this list.
 Please subscribe to [EMAIL PROTECTED] instead.
 
 At this time, I'm thinking of merging wget@sunsite.dk and
 [EMAIL PROTECTED]; there isn't really enough traffic to justify
 separate lists, IMO; and often discussions come up on submitted patches
 that are of interest to everyone.

 I am puzzled.  You mean you declare wget@sunsite.dk retired and 
[EMAIL PROTECTED] is to be used from now on for the purpose the former 
list instead?  And [EMAIL PROTECTED] will most likely be retired 
as well soon with the replacement to be [EMAIL PROTECTED] as well?

  Maciej


Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]

2008-11-01 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Maciej W. Rozycki wrote:
 On Fri, 31 Oct 2008, Micah Cowan wrote:
 
 I will ask the dotsrc.org folks to set up this mailing list as a
 forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At
 that time, no further mails will be sent to subscribers of this list.
 Please subscribe to [EMAIL PROTECTED] instead.

 At this time, I'm thinking of merging wget@sunsite.dk and
 [EMAIL PROTECTED]; there isn't really enough traffic to justify
 separate lists, IMO; and often discussions come up on submitted patches
 that are of interest to everyone.
 
  I am puzzled.  You mean you declare wget@sunsite.dk retired and 
 [EMAIL PROTECTED] is to be used from now on for the purpose the former 
 list instead?  And [EMAIL PROTECTED] will most likely be retired 
 as well soon with the replacement to be [EMAIL PROTECTED] as well?

Yup, that's what I mean.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJDIA77M8hyUobTrERAkr4AJwK7uoprV2Am1j9dAzNkLgQLZz8FwCdEM2q
2AMuQCNzrZzsVaz1UxvBCuk=
=WiLZ
-END PGP SIGNATURE-


MAILING LIST IS MOVING: [EMAIL PROTECTED]

2008-10-31 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

[EMAIL PROTECTED] is now back in business as a full-fledged mailing list,
and not just a forwarding alias to here. Please subscribe using the
interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at your
earliest convenience.

I had hoped to leave forwarding still enabled during the transition; I
subscribed wget@sunsite.dk but that did not seem to do the trick. So
mails at [EMAIL PROTECTED] will not show up here at the present time.

I will ask the dotsrc.org folks to set up this mailing list as a
forwarding alias to [EMAIL PROTECTED] (the reverse of recent history). At
that time, no further mails will be sent to subscribers of this list.
Please subscribe to [EMAIL PROTECTED] instead.

At this time, I'm thinking of merging wget@sunsite.dk and
[EMAIL PROTECTED]; there isn't really enough traffic to justify
separate lists, IMO; and often discussions come up on submitted patches
that are of interest to everyone.

Please avoid continued use of this list if possible. The gmane and
mail-archive.com sites will be asked to use the new list for archiving
purposes (and of course, bug-wget will also be archived via GNU's
pipermail setup).

Some of the reasons for this migration may be found at
http://article.gmane.org/gmane.comp.web.wget.general/8200/
In addition, people have recently been having difficulties with spam
blocking preventing their unsubscription(!), subscription, or even
contacting dotsrc.org staff about resolving subscription problems.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJC9/37M8hyUobTrERAuaMAJ9ByOhOnpQr81q6BJO/ytA4wUQkdgCfcPq0
3q88DFI/PL3LtcIx6ky9Vd8=
=czx7
-END PGP SIGNATURE-


Re: MAILING LIST IS MOVING: [EMAIL PROTECTED]

2008-10-31 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah Cowan wrote:
 [EMAIL PROTECTED] is now back in business as a full-fledged mailing list,
 and not just a forwarding alias to here. Please subscribe using the
 interface at http://lists.gnu.org/mailman/listinfo/bug-wget/ at your
 earliest convenience.

Email interface: send an email to [EMAIL PROTECTED]

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJC+vL7M8hyUobTrERAmEsAJ49xkwHMv75li+ihHV38NIP44ho4QCfaAue
hUPMKQbmpdrYqPO8M8CSrzE=
=CwYx
-END PGP SIGNATURE-


Re: -m alias

2008-10-29 Thread hraban
Michelle Konzack wrote:
 On 2008-10-14 01:20:16, Hraban Luyat wrote:
 Hi,

 Considering the -m switch (--mirror): the man page says it is currently
 equivalent to -r -N -l inf --no-remove-listing. I was wondering, though:
 why does this not also include -k? When mirroring a website it seems
 useful to convert the links for appropriate viewing in a browser. That
 
 When mirroring a Website, I WANT AN IDENTICAL MIRROR.  But IF I want to
 have a mirror for Off-Line reading I can choose the additional -k option.

So your interpretation of the word "mirror" means "byte-by-byte copy"
(also called a "backup" or an "archive"). Another common interpretation,
however, is "an alternative location", suitable for off-site (which I
assume you mean, here, too, instead of "off-line") viewing, as in "If
that website is unavailable, try one of the following mirrors:"

 is, if "mirroring" here means what it usually means: "provide an
 alternative location to view the same content".. if it's more like a
 backup, then of course -k is not a good option. But in that case, maybe
 it's worth mentioning...?
 
 No!  ;-)

My point was that the meaning of "mirror" is very ambiguous,
/especially/ in the context of fetching a live website in this fashion
(as one could expect a backup to occur on the server side instead). I am
not arguing that the -k switch should be added as much as I'm just
saying it might very well be worth mentioning.

 PS: I would like to be CC'ed (not subscribed).
 
 ???  --  How can you post without being subscribed?  My posts were all
 definitively rejected when I tried to post to this list.

http://wget.addictivecode.org/MailingLists


Greetings,

Hraban Luyat


Re: -m alias

2008-10-28 Thread Michelle Konzack
On 2008-10-14 01:20:16, Hraban Luyat wrote:
 Hi,
 
 Considering the -m switch (--mirror): the man page says it is currently
 equivalent to -r -N -l inf --no-remove-listing. I was wondering, though:
 why does this not also include -k? When mirroring a website it seems
 useful to convert the links for appropriate viewing in a browser. That

When mirroring a Website, I WANT AN IDENTICAL MIRROR.  But IF I want to
have a mirror for Off-Line reading I can choose the additional -k option.

 is, if "mirroring" here means what it usually means: "provide an
 alternative location to view the same content".. if it's more like a
 backup, then of course -k is not a good option. But in that case, maybe
 it's worth mentioning...?

No!  ;-)

 PS: I would like to be CC'ed (not subscribed).

???  --  How can you post without being subscribed?  My posts were all
definitively rejected when I tried to post to this list.

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)


signature.pgp
Description: Digital signature


Re: Special Website / Software One On One Personalized Consultancy

2008-10-28 Thread Michelle Konzack
N.C.   ;-D


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)


signature.pgp
Description: Digital signature


Re: -m alias

2008-10-28 Thread Micah Cowan
Michelle Konzack wrote:
 ???  --  How can you post without being subscribed?  My posts were all
 definitively rejected when I tried to post to this list.

Strange. People are definitely posting to the list without having to be
subscribed.

However, folks have been known to be rejected as spam, even for
unsubscription requests. :\

I've been considering a move to gnu servers; but I'm not sure their spam
filters are better (though at least they wouldn't reject unsubscriptions
I think). But mostly, I'm not motivated enough to get off my lazy butt
yet. If we start having more serious problems, perhaps the motivation
will increase sufficiently...

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: wget re-download fully downloaded files

2008-10-27 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Maksim Ivanov wrote:
 I'm trying to download the same file from the same server, command line
 I use:
 wget --debug -o log  -c -t 0 --load-cookies=cookie_file
 http://rapidshare.com/files/153131390/Blind-Test.rar
 
 Below attached 2 files: log with 1.9.1 and log with 1.10.2
 Both logs are made when Blind-Test.rar was already on my HDD.
 Sorry for some mess in logs, but russian language used on my console.

Thanks very much for providing these, Maksim; they were very helpful.
(Sorry for getting back to you so late: it's been busy lately).

I've confirmed this behavioral difference (though I compared the current
development sources against 1.8.2, rather than 1.10.2 to 1.9.1). Your
logs involve a 302 redirection before arriving at the real file, but
that's just a red herring.

The difference is that when 1.9.1 encountered a server that would
respond to a byte-range request with 200 (meaning it doesn't know how
to send partial contents), but with a Content-Length value matching the
size of the local file, then wget would close the connection and not
proceed to redownload. 1.10.2, on the other hand, would just re-download it.
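
(Illustratively, the exchange in that case looks roughly like this; this is a
sketch, not taken from the attached logs, and the sizes are placeholders:

  GET /files/153131390/Blind-Test.rar HTTP/1.0
  Range: bytes=1048576-

  HTTP/1.0 200 OK
  Content-Length: 1048576

i.e. the server ignores the Range header, but the advertised length equals
the size of the local file, so 1.9.1 concluded there was nothing left to
fetch.)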

Actually, I'll have to confirm this, but I think that current Wget will
re-download it, but not overwrite the current content, until it arrives
at some content corresponding to bytes beyond the current content.

I need to investigate further to see if this change was somehow
intentional (though I can't imagine what the reasoning would be); if I
don't find a good reason not to, I'll revert this behavior. Probably for
the 1.12 release, but I might possibly punt it to 1.13 on the grounds
that it's not a recent regression (however, it should really be a quick
fix, so most likely it'll be in for 1.12).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBfOj7M8hyUobTrERAjNTAJ9ayaKLvN4bYS/7o0kYcQywDvfwNgCfcGzz
P9aAwVD6Q/xQuACjU7KF1ng=
=m5QO
-END PGP SIGNATURE-


Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Brock Murch wrote:
 I try to keep a mirror of NASA atteph ancillary data for modis processing. I
 know that means little, but I have a cron script that runs 2 times a day.
 Sometimes it works, and others, not so much. The sh script is listed at the
 end of this email below. As is the contents of the remote ftp server's root
 and portions of the log.
 
 I don't need all the data on the remote server, only some, thus I use
 --cut-dirs. To make matters stranger, the software (also from NASA) that uses
 these files, looks for them in a single place on the client machine where the 
 software runs, but needs data from 2 different directories on the remote ftp 
 server. If the data is not on the client machine, the software kindly ftp's 
 the files to the local directory. However, I don't allow write access to that 
 directory as many people use the software and when it is d/l'ed it has the 
 wrong perms for others to use it, thus I mirror the data I need from the ftp 
 site locally. In the script below, there are 2 wget commands, but they are to 
 slightly different directories (MODISA & MODIST).

I wouldn't recommend that. Using the same output directory for two
different source directories seems likely to lead to problems. You'd
most likely be better off by pulling to two locations, and then
combining them afterwards.

I don't know for sure that it _will_ cause problems (except if they
happen to have same-named files), as long as .listing files are being
properly removed (there were some recently-fixed bugs related to that, I
think? ...just appending new listings on top of existing files).

 It appears to me that the problem occurs if there is an ftp server error, and
 wget starts a retry. wget goes to the server root, gets the .listing from
 there for some reason (as opposed to the directory it should go to on the
 server), and then goes to the dir it needs to mirror and can't find the files
 (that are listed in the root dir) and creates dirs, and then I get "No such
 file" errors and recursive directories created. Any advice would be
 appreciated.

This snippet seems to be the source of the problem:

 Error in server response, closing control connection.
 Retrying.
 
 --14:53:53--  ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/
   (try: 2) => `/home1/software/modis/atteph/2002/110/.listing'
 Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
 Logging in as anonymous ... Logged in!
 ==> SYST ... done.    ==> PWD ... done.
 ==> TYPE I ... done.  ==> CWD not required.
 ==> PASV ... done.    ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this
issue recently (though I'm not 100% sure that it went to release: I
believe so).

I believe we made some related fixes more recently. You provided a great
amount of useful information, but one thing that seems to be missing (or
I missed it) is the Wget version number. Judging from the log, I'd say
it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
you please try to verify whether Wget continues to exhibit this problem
in the latest release version?

I'll also try to look into this as I have time (but it might be awhile
before I can give it some serious attention; it'd be very helpful if you
could do a little more legwork).

- --
Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBgNh7M8hyUobTrERAuGoAKCCUoBN0sURKA/51x0o4HN59K8+AACfUYuj
i8XW58MvjvbS3oy4OsOmbpc=
=4kpD
-END PGP SIGNATURE-


Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Micah Cowan wrote:
 I believe we made some related fixes more recently. You provided a great
 amount of useful information, but one thing that seems to be missing (or
 I missed it) is the Wget version number. Judging from the log, I'd say
 it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
 you please try to verify whether Wget continues to exhibit this problem
 in the latest release version?

This problem looks like the one that Mike Grant fixed in October of
2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
should definitely be fixed in 1.11.4. Please let me know if it isn't.

- --
Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBgY+7M8hyUobTrERArrRAJ4p4Y7jwWfic0Wul7UBnBXlSzD2XQCePifc
kWs00JOULkzJmzozK7lmcfA=
=iSL3
-END PGP SIGNATURE-


Re: --mirror and --cut-dirs=2 bug?

2008-10-27 Thread Brock Murch
Micah,

Thanks for your quick attention to this. Yes, I probably forgot to include 
the version #:

[EMAIL PROTECTED] atteph]# wget --version
GNU Wget 1.10.2 (Red Hat modified)

Copyright (C) 2005 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic [EMAIL PROTECTED].

I will see if I can get the newest version for:
[EMAIL PROTECTED] atteph]# cat /etc/redhat-release
CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:
 Micah Cowan wrote:
  I believe we made some related fixes more recently. You provided a great
  amount of useful information, but one thing that seems to be missing (or
  I missed it) is the Wget version number. Judging from the log, I'd say
  it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could
  you please try to verify whether Wget continues to exhibit this problem
  in the latest release version?

 This problem looks like the one that Mike Grant fixed in October of
 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it
 should definitely be fixed in 1.11.4. Please let me know if it isn't.



More on query matching [Re: Need Design Documents]

2008-10-27 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

kalpana ravi wrote:
 Hi Everybody,

Hi kalpana,

You sent this message to me and [EMAIL PROTECTED]; you
wanted [EMAIL PROTECTED]

 My name is Kalpana Ravi. I am planning to contribute by adding one of the
 features listed in https://savannah.gnu.org/bugs/?22089. For that I need to
 see the design diagrams to understand the code better. Does anybody know where
 the UML diagrams are?

We don't have UML diagrams for wget: you'll just have to read the
sources (which, unfortunately, are messy). I have some rough-draft
diagrams of how I _want_ wget to look eventually, but I'm not done with
those, and anyway they wouldn't help you with wget now. Even if you had
the UML diagrams for the current state, you'd still need to understand
the sources; I really don't think they'd help you much.

More important than understanding the design, is understanding what
needs to be done; we're still getting a grip on that. My current thought
is that there should be a --query-reject (and probably --query-accept,
though the former seems far more useful) that should be matched against
key/value pairs; thus, --query-reject 'foo=bar&action=edit' would reject
anything that has "foo=bar" and "action=edit" as the key/value pairs in
the query string, even if they're not actually next to each other; an
example rejected URL might be
http://example.com/index.php?a=b&action=edit&token=blah&foo=bar&hergle.

Not all query strings are in the key=value format, so --query-reject
'abc1254' would be allowed, and match against the entire query string.
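
(A hypothetical invocation under that proposal -- the option does not exist
yet, this just restates the idea as a command line:

  wget -r --query-reject 'foo=bar&action=edit' http://example.com/index.php
)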

For an idea of how URL filename matching is currently done, you might check
out acceptable() in src/util.c and the functions it calls, to get an idea
of how query matching might be implemented. However, I'll probably
tackle this bug myself pretty soon if no one else has managed it yet, as
I'm very interested in getting Wget 1.12 finished before long into the
new year (ideally, _before_ the new year, but that probably ain't gonna
happen).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBgt77M8hyUobTrERAnqrAJ921WjEax0kMFf5Ls70Lvvq6LBItgCeL6wj
UWA/2b+kVMw8L8IsVjIAGhI=
=WKJk
-END PGP SIGNATURE-


Re: wget re-download fully downloaded files

2008-10-27 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Maksim Ivanov wrote:
 I'm trying to download the same file from the same server, command line
 I use:
 wget --debug -o log  -c -t 0 --load-cookies=cookie_file
 http://rapidshare.com/files/153131390/Blind-Test.rar
 
 Below attached 2 files: log with 1.9.1 and log with 1.10.2
 Both logs are made when Blind-Test.rar was already on my HDD.
 Sorry for some mess in logs, but russian language used on my console.

This is currently being tracked at https://savannah.gnu.org/bugs/?24662

A similar and related bug report is at
https://savannah.gnu.org/bugs/?24642 in which the logs show that
rapidshare.com also issues erroneous Content-Range information
when it responds with a 206 Partial Content, which exercised a different
regression* introduced in 1.11.x.

* It's not really a regression, since it's desirable behavior: we now
determine the size of the content from the content-range header, since
content-length is often missing or erroneous for partial content.
However, in this instance of server error, it resulted in less-desirable
behavior than the previous version of Wget. Anyway...

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJBhvA7M8hyUobTrERAty1AKCEscXut6FDXvXlxpuSBtKkii1/awCeJH0M
+JcJ5xG67K7CxHBEcV1x/zY=
=D2uE
-END PGP SIGNATURE-


RE: wget re-download fully downloaded files

2008-10-27 Thread Tony Lewis
Micah Cowan wrote:

 Actually, I'll have to confirm this, but I think that current Wget will
 re-download it, but not overwrite the current content, until it arrives
 at some content corresponding to bytes beyond the current content.

 I need to investigate further to see if this change was somehow
 intentional (though I can't imagine what the reasoning would be); if I
 don't find a good reason not to, I'll revert this behavior.

One reason to keep the current behavior is to retain all of the existing
content in the event of another partial download that is shorter than the
previous one. However, I think that only makes sense if wget is comparing
the new content with what is already on disk.

Tony




[bug] wrong speed calculation in (--output-file) logfile

2008-10-25 Thread Peter Volkov
Hello.

During download with wget I've redirected output into file with the
following command: 

$ LC_ALL=C wget -o output 
'ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz'

I've set LC_ALL and LANG explicitly to be sure that this is not a
locale-related problem. The output I saw in the output file was:


--2008-10-25 14:51:17--  
ftp://mirror.yandex.ru/gentoo-distfiles/distfiles/OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz
   => `OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13'
Resolving mirror.yandex.ru... 77.88.19.68
Connecting to mirror.yandex.ru|77.88.19.68|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /gentoo-distfiles/distfiles ... done.
==> SIZE OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... 13633213
==> PASV ... done.    ==> RETR 
OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz ... done.
Length: 13633213 (13M)

 0K .. .. .. .. ..  0%  131K 1m41s
50K .. .. .. .. ..  0%  132K 1m40s
   100K .. .. .. .. ..  1%  135K 99s
   150K .. .. .. .. ..  1%  132K 99s
   200K .. .. .. .. ..  1%  130K 99s
   250K .. .. .. .. ..  2% 45.9K 2m9s
   300K .. .. .. .. ..  2% 64.3M 1m50s
[snip]
 13250K .. .. .. .. .. 99%  131K 0s
 13300K .. ...100%  134K=1m41s

2008-10-25 14:52:58 (132 KB/s) - 
`OOo_3.0.0rc4_20080930_LinuxIntel_langpack_en-GB.tar.gz.13' saved [13633213]


Note the line above snip:
   300K ..  2% 64.3M 1m50s

It is impossible to have downloaded that many megabytes, as the file is much
smaller. I don't know why this number sometimes jumps, but in some cases it
causes the following output at the end of the download:

 13300K .. ...  100% 26101G=1m45s

Obviously I have no possibility of downloading at such a high
(26101G=1m45s) speed. This is reproducible with wget 1.11.4.

-- 
Peter.



Re: re-mirror + no-clobber

2008-10-25 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Jonathan Elsas wrote:
...
 I've issued the command
 
 wget -nc -r -l inf -H -D www.example.com,www2.example.com
 http://www.example.com
 
 but, I get the message:
 
 
 file 'www.example.com/index.html' already there; not retrieving.
 
 
 and the process exits.   According to the man page files with .html
 suffix will be loaded off disk and parsed but this does not appear to
 be happening.   Am I missing something?

Yes. It has to download the files before they can be loaded from the
disk and parsed. When it encounters a file at a given location, it
doesn't have any way to know that that file corresponds to the one it's
trying to download. Timestamping with -N may be more what you want,
rather than -nc?

I'm open to suggestions on clarifying the documentation.
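
For instance, something like the following (a sketch based on your command,
with -nc swapped for -N) would let files already on disk be re-checked by
timestamp rather than skipped outright:

  wget -N -r -l inf -H -D www.example.com,www2.example.com http://www.example.com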

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJA7Ds7M8hyUobTrERAsONAJ0dqYh0av7rQ80F8JIcvxhZ1ee7fwCdFG+y
AJJxMPVzHpmqAy7iGVRWmCU=
=wwns
-END PGP SIGNATURE-


re-mirror + no-clobber

2008-10-24 Thread Jonathan Elsas

Hi --

I'm using wget 1.10.2

I'm trying to mirror a web site with the following command:

wget -m http://www.example.com

After this process finished, I realized that I also needed pages from  
a subdomain (eg. www2)


To re-start the mirror process without downloading the same pages  
again, I've issued the command


wget -nc -r -l inf -H -D www.example.com,www2.example.com http://www.example.com

but, I get the message:


file 'www.example.com/index.html' already there; not retrieving.


and the process exits.   According to the man page files with .html  
suffix will be loaded off disk and parsed but this does not appear to  
be happening.   Am I missing something?


thanks in advance for your help


--mirror and --cut-dirs=2 bug?

2008-10-24 Thread Brock Murch
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I try to keep a mirror of NASA atteph ancillary data for modis processing. I 
know that means little, but I have a cron script that runs 2 times a day. 
Sometimes it works, and others, not so much. The sh script is listed at the 
end of this email below. As is the contents of the remote ftp server's root 
and portions of the log. 

I don't need all the data on the remote server, only some, thus I use 
--cut-dirs. To make matters stranger, the software (also from NASA) that uses 
these files, looks for them in a single place on the client machine where the 
software runs, but needs data from 2 different directories on the remote ftp 
server. If the data is not on the client machine, the software kindly ftp's 
the files to the local directory. However, I don't allow write access to that 
directory as many people use the software and when it is d/l'ed it has the 
wrong perms for others to use it, thus I mirror the data I need from the ftp 
site locally. In the script below, there are 2 wget commands, but they are to 
slightly different directories (MODISA & MODIST).

It appears to me that the problem occurs if there is an ftp server error, and 
wget starts a retry. wget goes to the server root, gets the .listing from 
there for some reason (as opposed to the directory it should go to on the 
server), and then goes to the dir it needs to mirror and can't find the files 
(that are listed in the root dir) and creates dirs, and then I get "No such 
file" errors and recursive directories created. Any advice would be 
appreciated.

Brock Murch

Here is an example of the bad type of dir structure I end up with (there 
should be no EO1 and below):

[EMAIL PROTECTED] atteph]# find . -type d -name "*" | grep EO1
./2002/110/EO1
./2002/110/EO1/CZCS
./2002/110/EO1/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS
./2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS/CZCS

Or:
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/
CZCS  README
[EMAIL PROTECTED] atteph]# ls /home1/software/modis/atteph/2002/110/EO1/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/
COMMON
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ls 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/
CZCS  README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/CZCS/CZCS/CZCS/

And

[EMAIL PROTECTED] atteph]# ll /home1/software/modis/atteph/2002/110/EO1/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/README
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README 
ls: /home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/README: No 
such file or directory
[EMAIL PROTECTED] atteph]# ll 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README 
-rw-r--r--  1 root root 9499 Aug 20 10:12 
/home1/software/modis/atteph/2002/110/EO1/CZCS/CZCS/CZCS/CZCS/COMMON/README


All the README files are the same, and the same as the one on the ftp 
server.

RE: [PATCH] Enable wget to download from given offset and just a given amount of bytes

2008-10-23 Thread Tony Lewis
Juan Manuel wrote:

 OK, you are right, I'll try to make it better in my free time. I
 supposed that it would have been more polite with one option, but
 thought it was easier with two (and since this is my first
 approach to C I took the easy way) because one option would have
 to deal with two parameters.

It's clearly easier to deal with options that wget is already programmed to
support. For a primer on wget options, take a look at this page on the wiki:
http://wget.addictivecode.org/OptionsHowto

I suspect you will need to add support for a new action (perhaps cmd_range).

Tony



RE: A/R matching against query strings

2008-10-22 Thread Tony Lewis
Micah Cowan wrote:

 Would "hash" really be useful, ever?

Probably not, as long as we strip off the hash before we do the comparison.

Tony




accept/reject rules based on query string

2008-10-21 Thread Gustavo Ayala
Any ideas about when this option (or an acceptable workaround) will be 
implemented?
 
I need to include/exclude based on query string (with regular expressions, of 
course). File name is not enough.
 
Thanks.
 
 




Re: accept/reject rules based on query string

2008-10-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Gustavo Ayala wrote:
 Any ideas about when this option (or an acceptable workaround) will be
 implemented?
  
 I need to include/exclude based on query string (with regular expressions, of
 course). File name is not enough.

I consider it an important feature, and currently expect to implement it
for 1.12.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI/faT7M8hyUobTrERApXLAJsFFMsVcibgLlptVhJoMwZeLYg02wCfTLSs
ayyryt3wCnkwtAStESYp7cs=
=dB6e
-END PGP SIGNATURE-


Re: A/R matching against query strings

2008-10-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I sent the following last month but didn't get any feedback. I'm trying
one more time. :)
-M

Micah Cowan wrote:
 On expanding current URI acc/rej matches to allow matching against query
 strings, I've been considering how we might enable/disable this
 functionality, with an eye toward backwards compatibility.
 
 It seems to me that one usable approach would be to require the "?"
 query string to be an explicit part of the rule, if it's expected to be
 matched against query strings. So -A .htm,.gif,*Action=edit* would all
 result in matches against the filename portion only, but -A
 '\?*Action=edit*' would look for Action=edit within the query-string
 portion. (The '\?' is necessary because otherwise '?' is a wildcard
 character; [?] would also work.)
 
 The disadvantage of that technique is that it's harder to specify that a
 given string should be checked _anywhere_, regardless of whether it
 falls in the filename or query-string portion; but I can't think offhand
 of any realistic cases where that's actually useful. We could also
 supply a --match-queries option to turn on matching of wildcard rules
 for anywhere (non-wildcard suffix rules should still match only at the
 end of the filename portion).
 
 Another option is to use a separate -A-like option that does what -A
 does for filenames, but matches against query strings. I like this idea
 somewhat less.
 
 Thoughts?
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI/fhT7M8hyUobTrERAgvtAJ0daQEub5GS4EFc7BuGT0pG1E1n0wCgjbnx
zb1QK0suZx0woMauqfL0qZI=
=5mdh
-END PGP SIGNATURE-


RE: A/R matching against query strings

2008-10-21 Thread Tony Lewis
Micah Cowan wrote:

 On expanding current URI acc/rej matches to allow matching against query
 strings, I've been considering how we might enable/disable this
 functionality, with an eye toward backwards compatibility.

What about something like --match-type=TYPE (with accepted values of "all",
"hash", "path", "search")?

For the URL http://www.domain.com/path/to/name.html?a=true#content

"all" would match against the entire string
"hash" would match against "content"
"path" would match against "path/to/name.html"
"search" would match against "a=true"

For backward compatibility the default should be --match-type=path.

I thought about having "host" as an option, but that duplicates another
option.

Tony



Re: A/R matching against query strings

2008-10-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tony Lewis wrote:
 Micah Cowan wrote:
 
 On expanding current URI acc/rej matches to allow matching against query
 strings, I've been considering how we might enable/disable this
 functionality, with an eye toward backwards compatibility.
 
 What about something like --match-type=TYPE (with accepted values of all,
 hash, path, search)?
 
 For the URL http://www.domain.com/path/to/name.html?a=true#content
 
 all would match against the entire string
 hash would match against content
 path would match against path/to/name.html
 search would match against a=true
 
 For backward compatibility the default should be --match-type=path.
 
 I thought about having host as an option, but that duplicates another
 option.

As does "path" (up to the final /).

Would "hash" really be useful, ever? It's never part of the request to
the server, so it's really more "context" to the URL than a real part of
the URL, as far as requests go. Perhaps that sort of thing could best
wait for when we allow custom URL-parsers/filters.

Also, I don't like the name "search" overly much, as that's a very
limited description of the much more general use of query strings.

But differentiating between three or more different match types tilts me
much more strongly toward some sort of shorthand, like the explicit need
for '\?'; with three types, perhaps we'd just use some special prefix for
patterns to indicate which sort of match we want (":q:" for query strings,
":a:" for all, or whatever), to save on prefixing each different type of
match with --match-type (or just using "all" for everything).

OTOH, regex support is easy enough to add to Wget, now that we're using
gnulib; we could just leave wildcards the way they are, and introduce
regexes that match everything. Then query strings are '\?.*foo=bar' (or,
for the really pedantic, '\?([^?]*)?foo=bar([^?]*)?$')
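
(Just to sanity-check that last pattern outside of Wget, with grep -E and a
made-up URL:

  $ echo '/index.php?a=b&foo=bar&x=y' | grep -E '\?([^?]*)?foo=bar([^?]*)?$'
  /index.php?a=b&foo=bar&x=y
)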

That last one, though, highlights how cumbersome it is to do proper
matching against typical HTML form-generated query strings (it's not
really even possible with wildcards). Perhaps a more appropriate
pattern-matcher specifically for query strings would be a good idea.
It's probably enough to do something like --query-='action=Edit', where
there's an implied '\?([^?]*)?' before, and '([^?]*)?$' after.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI/qLZ7M8hyUobTrERAmRdAJsH+9p+mTafoxqeVOstTPKrZP31CACdECCa
vQ1lZnncrdHd8SSbXevK02Y=
=YC2A
-END PGP SIGNATURE-


Can't wget anidb.net

2008-10-20 Thread zanzi
Hi

I have tried to wget http://anidb.net/perl-bin/animedb.pl?show=main but all
I seem to get is a file with unreadable characters (and not the HTML file
I'm after).
Is it because of some perl-script on the site?

Thanks!
ZZ


Re: Can't wget anidb.net

2008-10-20 Thread Saint Xavier
Hi,

* zanzi ([EMAIL PROTECTED]) wrote:
 I have tried to wget http://anidb.net/perl-bin/animedb.pl?show=main but all
 I seem to get is a file with unreadable characters (and not the HTML file
 I'm after).
 Is it because of some perl-script on the site?

This perl script assumes HTTP/1.1 and gzip support for any request :(

  HTTP/1.1 200 OK
  Date: Mon, 20 Oct 2008 15:25:09 GMT
  Server: Apache/1.3.41 (Unix) mod_perl/1.30
  Set-Cookie: adbuin=1224516309-mqMQ; path=/; expires=Thu, 18-Oct-2018 15:25:09 
GMT
  Cache-control: no-cache
  Pragma: no-cache
  Content-Type: text/html; charset=UTF-8
  Expires: Mon, 20 Oct 2008 15:25:09 GMT
  X-Cache: MISS from anidb.net
  Connection: close
  Content-Encoding: gzip
   ^
  Content-Length: 5489


You can manually decompress the data:
 $ wget 'http://anidb.net/perl-bin/animedb.pl?show=main' -O page.gz
 $ gzip -dc page.gz > page.html


Sincerly,
Saint Xavier.


Special Website / Software One On One Personalized Consultancy

2008-10-17 Thread Web Promotions
Sir/ Madam,

We would like to offer you a F R E E one hour personalized consultancy on 
how best the Internet can help your buiness (in terms of website designing, 
software development and internet marketing).

As part of this promotional campaign, one of our senior marketing managers 
will be specifically understanding Y O U R business and online/ software 
setup. He/ she will then set up a meeting with you to recommend on the B E S T 
way that the Internet and web based software can help your business.

To make the most of this unique offer, register N O W at:
http://www.pegasusinfocorp.com/contact/promotions_consultancy.htm
This offer is for a limited period and is being sent to a representative 
sample. The offer will expire on October 15, 2008

Pegasus InfoCorp is a leading website, web based software development and 
internet marketing company head quartered in India that builds customised 
websites and software solutions for clients worldwide. 80% of our clients 
are small to mid sized businesses across more than 15 countries worldwide, 
and we also work with Fortune 500 blue chip companies such as eBay.com and 
Yahoo.com!

We have delivered on over a 100 clients over the years and over 75% of our 
company revenues come from repeat/ referential clients. Many of our 
associates come from some of India's premium engineering and design 
institutes, including the Indian Institutes of Technology (IITs) and the J J 
School of Arts. And we have well set reliable processes for offshore 
delivery. To know more about us and read about some of the work we have 
done, please visit: www.pegasusinfocorp.com

Register for this F R E E one hour consultancy at:
http://www.pegasusinfocorp.com/contact/promotions_consultancy.htm



Best regards,

Pegasus InfoCorp,
www.pegasusinfocorp.com

USA (voicemail): +1-425-906-5727 ; UK (voicemail): +44-20-3129-8455 ; 
Australia (voicemail): +61-2-8005-6455
India: +91-22-32961777, +91-22-28941595, +91-22-65286140

You can unsubscribe from future promotions anytime by visiting:
http://www.pegasusinfocorp.com/contact/email_preferences.htm


Message sent by: Pegasus InfoCorp Pvt Ltd, 602, Soni Shopping Center, 
Borivali (W), Mumbai (Bombay), 400092, India



-c option

2008-10-15 Thread Thomas Wolff
Hi,
I've just come across the following remark in the wget manual page (1.10.2),
about the -c option:
 "Wget has no way of verifying that the local file is really a valid prefix of 
 the remote file."
This is not quite true. It could at least check the remote and local 
file time stamps for this purpose, and I think it should do this.
It could also, as an option, load a couple of random bytes as a heuristic 
quick check. (I wouldn't do this, though.)
In any case, the wrong claim "no way" should be removed from the man page.
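
(As a rough manual approximation of such a check today, one can compare the
server's Last-Modified header against the local file's mtime; a sketch, with a
made-up URL:

  $ wget --spider -S http://example.com/big.iso 2>&1 | grep -i last-modified
  $ ls -l big.iso
)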

Best regards,
Thomas Wolff


Re: wget re-download fully downloaded files

2008-10-13 Thread Maksim Ivanov
I'm trying to download the same file from the same server, command line I
use:
wget --debug -o log  -c -t 0 --load-cookies=cookie_file
http://rapidshare.com/files/153131390/Blind-Test.rar

Below attached 2 files: log with 1.9.1 and log with 1.10.2
Both logs are made when Blind-Test.rar was already on my HDD.
Sorry for some mess in logs, but russian language used on my console.

Yours faithfully, Maksim Ivanov



2008/10/13 Micah Cowan [EMAIL PROTECTED]

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Maksim Ivanov wrote:
  Hello!
 
  Starting with version 1.10, wget has a very annoying bug: if you try to download
  an already fully downloaded file, wget begins downloading it all over again,
  but 1.9.1 says "Nothing to do", as it should.

 It all depends on what options you specify. That's as true for 1.9 as it
 is for 1.10 (or the current release 1.11.4).

 It can also depend on the server; not all of them support timestamping
 or partial fetches.

 Please post the minimal log that exhibits the problem you're experiencing.

 - --
 Thanks,
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFI8mrL7M8hyUobTrERAqx4AJ9yQb+kPXGI2N7sv34krZLnYDuRvgCfWI2K
 nZYI8ER1PB3pkYC4neiTa9U=
 =JW3/
 -END PGP SIGNATURE-



log.1.9.1
Description: Binary data


log.1.10.2
Description: Binary data


-m alias

2008-10-13 Thread Hraban Luyat
Hi,

Considering the -m switch (--mirror): the man page says it is currently
equivalent to -r -N -l inf --no-remove-listing. I was wondering, though:
why does this not also include -k? When mirroring a website it seems
useful to convert the links for appropriate viewing in a browser. That
is, if "mirroring" here means what it usually means: "provide an
alternative location to view the same content".. if it's more like a
backup, then of course -k is not a good option. But in that case, maybe
it's worth mentioning...?

Thanks,

Hraban

PS: I would like to be CC'ed (not subscribed).


wget re-download fully downloaded files

2008-10-12 Thread Maksim Ivanov
Hello!

Starting with version 1.10, wget has a very annoying bug: if you try to download
an already fully downloaded file, wget begins downloading it all over again,
but 1.9.1 says "Nothing to do", as it should.

Yours faithfully, Maksim Ivanov


Re: wget re-download fully downloaded files

2008-10-12 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Maksim Ivanov wrote:
 Hello!
 
 Starting with version 1.10, wget has a very annoying bug: if you try to download
 an already fully downloaded file, wget begins downloading it all over again,
 but 1.9.1 says "Nothing to do", as it should.

It all depends on what options you specify. That's as true for 1.9 as it
is for 1.10 (or the current release 1.11.4).

It can also depend on the server; not all of them support timestamping
or partial fetches.

Please post the minimal log that exhibits the problem you're experiencing.

- --
Thanks,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI8mrL7M8hyUobTrERAqx4AJ9yQb+kPXGI2N7sv34krZLnYDuRvgCfWI2K
nZYI8ER1PB3pkYC4neiTa9U=
=JW3/
-END PGP SIGNATURE-


Incorrect transformation of newline's symbols

2008-10-07 Thread Александр Вильнин

Hello!

I've noticed a possible mistake in ftp-basic.c.

When I try to download a file from 
"ftp://www.delorie.com/pub/djgpp/current/" (in my case it was 
"ftp://www.delorie.com/pub/djgpp/current/FILES"), the server responds with 
error no. 550. But this file actually exists.

I've used the cygwin command
(wget --verbose --debug --output-file=wget_djgpp_log 
--directory-prefix=djgpp "ftp://www.delorie.com/pub/djgpp/current/FILES")

to get this file.

In the function ftp_request (ftp-basic.c), newline characters are 
substituted with ' ', but the ftp server doesn't understand such commands. 
The SIZE and RETR commands do not pass.

I've inserted the debug log at the end of this message.

The --restrict-file-names=[windows,unix] option has no effect.

Yours faithfully, Alexander Vilnin ([EMAIL PROTECTED])

+ wget_djgpp_log +++
DEBUG output created by Wget 1.11.3 on cygwin.

--2008-10-06 17:06:43--  ftp://www.delorie.com/pub/djgpp/current/FILES%0D
   => `djgpp/FILES%0D'
Resolving www.delorie.com... 207.22.48.162
Caching www.delorie.com = 207.22.48.162
Connecting to www.delorie.com|207.22.48.162|:21... connected.
Created socket 4.
Releasing 0x006a0c88 (new refcount 1).
Logging in as anonymous ... 220 delorie.com FTP server (Version 
wu-2.8.0-prerelease(2) Fri Sep 5 11:24:18 EDT 2003) ready.


--> USER anonymous

331 Guest login ok, send your complete e-mail address as password.

--> PASS -wget@

230 Guest login ok, access restrictions apply.
Logged in!
==> SYST ...
--> SYST

215 UNIX Type: L8
done.    ==> PWD ...
--> PWD

257 / is current directory.
done.
==> TYPE I ...
--> TYPE I

200 Type set to I.
done.  changing working directory
Prepended initial PWD to relative path:
   pwd: '/'
   old: 'pub/djgpp/current'
  new: '/pub/djgpp/current'
==> CWD /pub/djgpp/current ...
--> CWD /pub/djgpp/current

250 CWD command successful.
done.
==> SIZE FILES\015 ...
Detected newlines in SIZE FILES\015; changing to SIZE FILES 

--> SIZE FILES

550 FILES : not a plain file.
done.
==> PASV ...
--> PASV

227 Entering Passive Mode (207,22,48,162,102,137)
trying to connect to 207.22.48.162 port 26249
Created socket 5.
done.    ==> RETR FILES\015 ...
Detected newlines in RETR FILES\015; changing to RETR FILES 

--> RETR FILES

550 FILES : No such file or directory.

No such file `FILES\015'.

Closed fd 5
Closed fd 4
+ wget_djgpp_log +++


Re: Incorrect transformation of newline's symbols

2008-10-07 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Александр Вильнин wrote:
 Hello!
 
 I've noticed a possible mistake in ftp-basic.c.
 
 When I try to download a file from
 "ftp://www.delorie.com/pub/djgpp/current/" (in my case it was
 "ftp://www.delorie.com/pub/djgpp/current/FILES"), the server responds with error
 no. 550. But this file actually exists.
 I've used the cygwin command
 (wget --verbose --debug --output-file=wget_djgpp_log
 --directory-prefix=djgpp "ftp://www.delorie.com/pub/djgpp/current/FILES")
 to get this file.
 
 In the function ftp_request (ftp-basic.c), newline characters are
 substituted with ' ', but the ftp server doesn't understand such commands.
 The SIZE and RETR commands do not pass.
 I've inserted the debug log at the end of this message.

The problem isn't that newlines are substituted. Newlines and carriage
returns are simply not safe within FTP file names.

However, how did the newline get there in the first place? The real file
name itself doesn't have a newline in it. The logs clearly show that
Wget was passed a URL with a carriage return (not newline) in it. This
strongly indicates that the shell you were using passed it that way to
Wget. Probably, the shell was given \r\n when you hit Enter to end
your command, and stripped away the \n but left the \r, which it passed
to Wget.

The bug you are encountering is in your Cygwin+shell environment; you'll
have to look to there. The only deficiency I'm seeing on Wget's part
from these logs, is that it's calling \015 a newline character, when
in fact the newline character is \012; it should say line-ending
character or some such.
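
As a workaround until the shell side is sorted out (assuming the stray \015
really does come from the command line), you could strip it before Wget ever
sees the URL, e.g. with something like:

  url='ftp://www.delorie.com/pub/djgpp/current/FILES'
  wget --directory-prefix=djgpp "$(printf '%s' "$url" | tr -d '\r')"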

- --
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI67fs7M8hyUobTrERArlfAJ0TurMdyGK0YR9UK263h8p2ZesqXQCfdQo3
Tn4oDFWJg9JIyTEQOJ2jrCE=
=Y/Sy
-END PGP SIGNATURE-


Can't fetch error messages

2008-10-05 Thread Hadmut Danisch
Hi,

I was trying to test the error messages of my server
(apache: ErrorDocument 404 )

but unfortunately I could not download the error messages generated by
my server with wget: since the server sends a 404 code when a page is
missing (which is exactly what I wanted to test), wget does not save the
page. So wget should have an option to download and save the body in any
case, even when an error code was sent.

regards
Hadmut



Failure to build from Mercurial

2008-10-01 Thread Debarshi Ray
While working on https://savannah.gnu.org/bugs/?24346 I found that the
current code in Mercurial fails to build. This is what I am getting:

$ hg clone http://hg.addictivecode.org/wget/mainline wget
$ ./autogen.sh
$ ./configure --prefix=$HOME
$ make
[...]
/bin/sh ../ylwrap css.l lex.yy.c css.c -- flex
/u/debray/devel/wget/hg/wget-hacking/src/css.l:112: undefined definition {X}
/u/debray/devel/wget/hg/wget-hacking/src/css.l:113: undefined definition {X}
/u/debray/devel/wget/hg/wget-hacking/src/css.l:120: undefined definition {R}
/u/debray/devel/wget/hg/wget-hacking/src/css.l:121: undefined definition {R}
make[2]: *** [css.c] Error 1
[...]

Happy hacking,
Debarshi


Re: Support for file://

2008-09-27 Thread Petr Pisar

Michelle Konzack wrote:

Am 2008-09-20 22:05:35, schrieb Micah Cowan:

I'm confused. If you can successfully download the files from
HOSTINGPROVIDER in the first place, then why would a difference exist?
And if you can't, then this wouldn't be an effective way to find out.


I mean, IF you have a local (master) mirror and your  website  @ISP  and
you want to know, whether the two websites are  identical  and  have  no
cruft in it, you can


I didn't follow this thread, however, just FYI, there exists an excellent
(not only) FTP client called lftp that has a built-in mirror command.
The command has a similar effect to the rsync tool---i.e. it synchronizes 
remote and local directories recursively.
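
For example (host and paths made up), something along these lines mirrors a
remote tree into the current directory:

  lftp -e 'mirror /pub/project ./project; quit' ftp.example.org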


-- Petr






Re: Support for file://

2008-09-26 Thread Michelle Konzack
Am 2008-09-20 22:05:35, schrieb Micah Cowan:
 I'm confused. If you can successfully download the files from
 HOSTINGPROVIDER in the first place, then why would a difference exist?
 And if you can't, then this wouldn't be an effective way to find out.

I mean, IF you have a local (master) mirror and your website @ISP and
you want to know whether the two websites are identical and have no
cruft in them, you can

  1)  fetch the website from your isp recursively with
  wget -r -nH -R /tmp/tmp_ISP http://website.isp.tld/

  2)  fetch the local mirror with
  wget -r -nH -R /tmp/tmp_LOC file://path/to/local/mirror/

where the full path in 2) would be the same as the website in 1) and
then compare it with

  3)  /path/to/local/mirror/

If you have edited the files locally and remotely, you can get surprising
results.

Fetching /index.html recursively means that ALL files which are mentioned
in ANY HTML file are downloaded.  So if 1) differs from

ftp://website.isp.tld/

then there is something wrong in the site...
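
A rough sketch of that comparison (the option letters here are assumptions on
my part -- -P just sets the download prefix -- and the file:// step is exactly
the missing feature under discussion):

  wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/
  wget -r -nH -P /tmp/tmp_LOC file://path/to/local/mirror/   # needs file:// support
  diff -r /tmp/tmp_ISP /tmp/tmp_LOC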


Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: Suggested feature

2008-09-24 Thread Maciej W. Rozycki
On Wed, 24 Sep 2008, Oliver Hahn wrote:

 I think it would be a nice feature if wget could print in --spider mode all
 downloadable file urls into a text file, so that you can import this urls to
 another download manager.

 You can use the log file to retrieve this information from -- use the
usual text processing tools like `grep', `sed', etc. to filter out what
you need.  No need for a new feature as all you need is already in place.
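
For instance (the depth, URL, and grep pattern are only placeholders, and the
--spider/-r combination has its quirks across versions), something like:

  wget --spider -r -l2 -o spider.log http://example.com/
  grep -o 'http://[^ ]*' spider.log | sort -u > urls.txt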

  Maciej


Re: Big files

2008-09-24 Thread Michelle Konzack
Am 2008-09-16 15:22:22, schrieb Cristián Serpell:
 It is the latest Ubuntu's distribution, that still comes with the old  
 version.

Ehm, even Debian Etch comes with:

[EMAIL PROTECTED]:~] apt-cache policy wget
wget:
  Installiert:1.10.2-2
  Mögliche Pakete:1.10.2-2
  Versions-Tabelle:
 *** 1.10.2-2 0
500 file: etch/main Packages
100 /var/lib/dpkg/status

So Ubuntu uses, AFAIK, the latest version, which is 1.11...

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: Big files

2008-09-24 Thread Michelle Konzack
There must be another bug, since I can download small (:-) 18 GByte
archive files...  Debian Etch:

[EMAIL PROTECTED]:~] apt-cache policy wget
wget:
  Installiert:1.10.2-2
  Mögliche Pakete:1.10.2-2
  Versions-Tabelle:
 *** 1.10.2-2 0
500 file: etch/main Packages
100 /var/lib/dpkg/status

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: Big files

2008-09-24 Thread Michelle Konzack
Am 2008-09-16 12:52:16, schrieb Tony Lewis:
 Cristián Serpell wrote:
 
  Maybe I should have started by this (I had to change the name of the  
  file shown):
 [snip]
  ---response begin---
  HTTP/1.1 200 OK
  Date: Tue, 16 Sep 2008 19:37:46 GMT
  Server: Apache
  Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
  ETag: 7f710a-8a8e1bf7-47fbd2ef
  Accept-Ranges: bytes
  Content-Length: -1970398217

Interesting headers, since here I get

  HTTP/1.1 200 OK
  Date: Mon, 22 Sep 2008 21:58:11 GMT
  Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch10
  X-Powered-By: PHP/5.2.0-8+etch10

which means he is running the old crappy Apache 1.3.
 
 The problem is not with wget. It's with the Apache server, which told wget
 that the file had a negative length.

Because it is the old indian.

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917  ICQ #328449886
+49/177/935194750, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Re: Support for file://

2008-09-22 Thread David

Hi Micah,

You're right - this was raised before and in fact it was a feature Mauro 
Tortonesi intended to be implemented for the 1.12 release, but it seems to have 
been forgotten somewhere along the line. I wrote to the list in 2006 describing 
what I consider a compelling reason to support file://. Here is what I wrote 
then:

At 03:45 PM 26/06/2006, David wrote:
In replies to the post requesting support of the file:// scheme, requests 
were made for someone to provide a compelling reason to want to do this. 
Perhaps the following is such a reason.
I have a CD with HTML content (it is a CD of abstracts from a scientific 
conference), however for space reasons not all the content was included on the 
CD - there remain links to figures and diagrams on a remote web site. I'd like 
to create an archive of the complete content locally by having wget retrieve 
everything and convert the links to point to the retrieved material. Thus the 
wget functionality when retrieving the local files should work the same as if 
the files were retrieved from a web server (i.e. the input local file needs to 
be processed, both local and remote content retrieved, and the copies made of 
the local and remote files all need to be adjusted to now refer to the local 
copy rather than the remote content). A simple shell script that runs cp or 
rsync on local files without any further processing would not achieve this aim.
Regarding to where the local files should be copied, I suggest a default scheme 
similar to current http functionality. For example, if the local source was 
/source/index.htm, and I ran something like:
   wget.exe -m -np -k file:///source/index.htm
this could be retrieved to ./source/index.htm (assuming that I ran the command 
from anywhere other than the root directory). On Windows,  if the local source 
file is c:\test.htm,  then the destination could be .\c\test.htm. It would 
probably be fair enough for wget to throw up an error if the source and 
destination were the same file (and perhaps helpfully suggest that the user 
changes into a new subdirectory and retry the command).
One additional problem this scheme needs to deal with is when one or more /../ 
in the path specification results in the destination being above the current 
parent directory; then  the destination would have to be adjusted to ensure the 
file remained within the parent directory structure. For example, if I am in 
/dir/dest/ and ran
   wget.exe -m -np -k file://../../source/index.htm
this could be saved to ./source/index.htm  (i.e. /dir/dest/source/index.htm)
-David. 


At 08:49 AM 3/09/2008, you wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Petri Koistinen wrote:
 Hi,
 
 I would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it Needs Discussion and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
groks urls, Wget W(eb)-gets, and file:// can't really be argued to
be part of the web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.
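
For the plain copy-a-tree case, something along these lines (paths made up)
already does the job, preserving permissions, times, and symlinks:

  rsync -a /path/to/source/ /path/to/destination/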

Even the system cp command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
cp command, but I might conceivably not mind file:// support if it
means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what recursion should mean for file://.
Between ftp:// and http://, recursion currently means different
things. In FTP, it means traverse the file hierarchy recursively,
whereas in HTTP it means traverse links recursively. I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
answering.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvcLq7M8hyUobTrERAl6YAJ9xeTINVkuvl8HkElYlQt7dAsUfHACfXRT3
lNR++Q0XMkcY4c6dZu0+gi4=
=mKqj
-END PGP SIGNATURE-



Re: Support for file://

2008-09-22 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

David wrote:
 
 Hi Micah,
 
 Your're right - this was raised before and in fact it was a feature
 Mauro Tortonesi intended to be implemented for the 1.12 release, but it
 seems to have been forgotten somewhere along the line. I wrote to the
 list in 2006 describing what I consider a compelling reason to support
 file://. Here is what I wrote then:
 
 At 03:45 PM 26/06/2006, David wrote:
 In replies to the post requesting support of the file:// scheme,
 requests were made for someone to provide a compelling reason to want to
 do this. Perhaps the following is such a reason.
 I have a CD with HTML content (it is a CD of abstracts from a scientific
 conference), however for space reasons not all the content was included
 on the CD - there remain links to figures and diagrams on a remote web
 site. I'd like to create an archive of the complete content locally by
 having wget retrieve everything and convert the links to point to the
 retrieved material. Thus the wget functionality when retrieving the
 local files should work the same as if the files were retrieved from a
 web server (i.e. the input local file needs to be processed, both local
 and remote content retrieved, and the copies made of the local and
 remote files all need to be adjusted to now refer to the local copy
 rather than the remote content). A simple shell script that runs cp or
 rsync on local files without any further processing would not achieve
 this aim.

Fair enough. This example at least makes sense to me. I suppose it can't
hurt to provide this, so long as we document clearly that it is not a
replacement for cp or rsync, and is never intended to be (won't handle
attributes and special file properties).

However, support for file:// will introduce security issues, so care is needed.

For instance, file:// should never be respected when it comes from the
web. Even on the local machine, it could be problematic to use it on
files writable by other users (as they can then craft links to download
privileged files with upgraded permissions). Perhaps files that are only
readable for root should always be skipped, or wget should require a
--force sort of option if the current mode can result in more
permissive settings on the downloaded file.

Perhaps it would be wise to make this a configurable option. It might
also be prudent to enable an option for file:// to be disallowed for root.

https://savannah.gnu.org/bugs/?24347

If any of you can think of additional security issues that will need
consideration, please add them in comments to the report.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI19aE7M8hyUobTrERAt49AJ4irLGMd6OVRWeooKPqZxmX0+K2agCfaq2d
Mx9IgSo5oUDQgBPD01mcGcY=
=sdAZ
-END PGP SIGNATURE-


Post size limit?

2008-09-21 Thread DeVill
Hi!

I've been trying to send post variables with --post-file option of
wget. (I have two variables in the file, both urlencoded, one of them
is quite large.) It worked fine until it came across a file that was
4.7M in size: post variables just won't get through to the server... I
tried to do the same post with Mozilla Firefox, and it worked fine,
but I had the same results with curl :-(

Any ideas what could be the problem?

Please cc me, I'm not subscribed!

Thanks!

Bye
DeVill


Re: Post size limit?

2008-09-21 Thread mm w
Hi
What does the server log say? I guess it's a boundary problem and your headers
are wrong, that's all. I'm pretty sure that if you look at the server error
logs you will get your answer. Post files are not really post data... you
have to set up your HTTP body correctly.
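
For example (the file name and field names are just placeholders), with
--post-file the body file should already be a single url-encoded blob, which
wget then sends as application/x-www-form-urlencoded:

  $ cat post.txt
  small=hello&big=...urlencoded-payload...
  $ wget --post-file=post.txt http://server.example/handler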

Cheers!

On Sun, Sep 21, 2008 at 1:10 PM, DeVill [EMAIL PROTECTED] wrote:
 Hi!

 I've been trying to send post variables with --post-file option of
 wget. (I have two variables in the file, both urlencoded, one of them
 is quite large.) It worked fine until it came across a file that was
 4.7M in size: post variables just won't get through to the server... I
 tried to do the same post with Mozilla Firefox, and it worked fine,
 but I had the same results with curl :-(

 Any ideas what could be the problem?

 Please cc me, I'm not subscribed!

 Thanks!

 Bye
 DeVill




-- 
-mmw


Re: Support for file://

2008-09-20 Thread Michelle Konzack
Hello Micah,

Am 2008-09-02 15:49:15, schrieb Micah Cowan:
 I think I'd need some convincing on this, as well as a clear definition
 of what the scope for such a feature ought to be. Unlike curl, which
 groks urls, Wget W(eb)-gets, and file:// can't really be argued to
 be part of the web.

Right but...

 That in and of itself isn't really a reason not to support it, but my
 real misgivings have to do with the existence of various excellent tools
 that already do local-file transfers, and likely do it _much_ better
 than Wget could hope to. Rsync springs readily to mind.
 
 Even the system cp command is likely to handle things much better than
 Wget. In particular, special OS-specific, extended file attributes,
 extended permissions and the like, are among the things that existing
 system tools probably handle quite well, and that Wget is unlikely to. I
 don't really want Wget to be in the business of duplicating the system
 cp command, but I might conceivably not mind file:// support if it
 means simple _content_ transfer, and not actual file duplication.
 
 Also in need of addressing is what recursion should mean for file://.
 Between ftp:// and http://, recursion currently means different
 things. In FTP, it means traverse the file hierarchy recursively,
 whereas in HTTP it means traverse links recursively. I'm guessing
 file:// should work like FTP (i.e., recurse when the path is a
 directory, ignore HTML-ness), but anyway this is something that'd need
 answering.

Imagine you have a local mirror of your website and you want to know why
the site @HOSTINGPROVIDER has some extra files or such.

You can spider the website @HOSTINGPROVIDER recursively into a local tmp1
directory and then, with the same command line, do the same with
the local mirror, downloading the files recursively into tmp2; now
you can make a recursive fs-diff and know which files are
used... on both the local mirror and @HOSTINGPROVIDER.

I have searched for such a feature several times, and currently the only way
is to install a web server locally, which is not always possible.

Maybe this is worth a discussion?

Greetings
Michelle

-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/ 




Re: Support for file://

2008-09-20 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Michelle Konzack wrote:
 Imagine you have a local mirror of your website and you want to know why
 the site @HOSTINGPROVIDER has some files more or such.
 
 You can spider the website @HOSTINGPROVIDER recursiv in a  local  tmp1
 directory and then, with the same commandline, you can do the same  with
 the local mirror and download the files recursive into tmp2 and  now
 you and now you can make a recursive fs-diff and know  which  files  are
 used...  on both, the local mirror and @HOSTINGPROVIDER

I'm confused. If you can successfully download the files from
HOSTINGPROVIDER in the first place, then why would a difference exist?
And if you can't, then this wouldn't be an effective way to find out.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI1dYe7M8hyUobTrERAuuyAJ9m3ArCqxG4orhAQuEM010yWv6ScwCfaE9h
jXIjJ+XUjBYwyBdi8NB/rEY=
=NDnR
-END PGP SIGNATURE-


Re: Problem with libeay32.dll, ordinal 2253

2008-09-19 Thread Charles
On Wed, Sep 17, 2008 at 11:02 PM, Tobias Opialla
[EMAIL PROTECTED] wrote:
 Hey all,

 I hope this is the right adress, and you can help me.
 I'm currently trying to run a perlscript including some wget commands, but if 
 I try to run it, it says:
 The ordinal 2253 could not be located in the dynamic link library 
 LIBEAY32.dll.

Probably because of a DLL conflict between the version used by wget and
the version supplied by Perl.
You could try renaming the libeay32.dll found in the perl/bin directory.


Problem with libeay32.dll, ordinal 2253

2008-09-17 Thread Tobias Opialla
Hey all,

I hope this is the right address, and you can help me.
I'm currently trying to run a Perl script that includes some wget commands, but if I 
try to run it, it says:
The ordinal 2253 could not be located in the dynamic link library 
LIBEAY32.dll.

Any ideas on that one? I couldn't find anything on the web.

Regards, Tobias Opialla


Big files

2008-09-16 Thread Cristián Serpell

Hi

I would like to know if there is a reason for using a signed int for  
the length of the files to download. The thing is that I was trying to  
download a 2.3 GB file using wget, but then the length was printed as  
a negative number and wget said Aborted. Is it a bug or a design  
decision? Is there an option for downloading big files? In this case,  
I used curl.


Please CC replies, I'm not a subscriber

Thanks!
C S


Re: Big files

2008-09-16 Thread Doruk Fisek
Tue, 16 Sep 2008 11:19:50 -0400, Cristián Serpell
[EMAIL PROTECTED] :

 I would like to know if there is a reason for using a signed int for  
 the length of the files to download. The thing is that I was trying
 to download a 2.3 GB file using wget, but then the length was printed
 as a negative number and wget said Aborted. Is it a bug or a
 design decision?
Which version of wget are you using? It was a bug of older wget
versions. You can see it with the output of wget --version command
(latest version is 1.11.4).

I'm not having any trouble with downloading files bigger than 2G.

   Doruk

--
FISEK INSTITUTE - http://www.fisek.org.tr


RE: Big files

2008-09-16 Thread Tony Lewis
Cristián Serpell wrote:

 I would like to know if there is a reason for using a signed int for  
 the length of the files to download.

I would like to know why people still complain about bugs that were fixed
three years ago. (More accurately, it was a design flaw that originated from
a time when no computer OS supported files that big, but regardless of what
you call it, the change to wget was made to version 1.10 in 2005.)

Tony




Re: Big files

2008-09-16 Thread Cristián Serpell
It is the latest Ubuntu distribution, which still comes with the old
version.


Thanks anyway, that was the problem.

On 16-09-2008, at 15:08, Tony Lewis wrote:


Cristián Serpell wrote:


I would like to know if there is a reason for using a signed int for
the length of the files to download.


I would like to know why people still complain about bugs that were  
fixed
three years ago. (More accurately, it was a design flaw that  
originated from
a time when no computer OS supported files that big, but regardless  
of what

you call it, the change to wget was made to version 1.10 in 2005.)

Tony






Re: Big files

2008-09-16 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Cristián Serpell wrote:
 It is the latest Ubuntu's distribution, that still comes with the old
 version.
 
 Thanks anyway, that was the problem.

I know that's untrue. Ubuntu comes with 1.10.2 at least, and has for
quite some time. If you're using that, then it's probably a different
bug than Doruk and Tony were thinking of (perhaps one of the cases of
content-length mishandling that were recently fixed in the 1.11.x series).

IIRC Intrepid Ibex (Ubuntu 8.10) will have 1.11.4.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI0AnI7M8hyUobTrERAqptAJoCj0VC46dBOhrr/A3HsHyicciKWQCffyFQ
bHhmuYHmf52Yz1M5lu7Yk5Y=
=Z+fN
-END PGP SIGNATURE-


Re: Big files

2008-09-16 Thread Cristián Serpell
Maybe I should have started with this (I had to change the name of the
file shown):


[EMAIL PROTECTED]:/tmp# wget --version
GNU Wget 1.10.2

Copyright (C) 2005 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Originally written by Hrvoje Niksic [EMAIL PROTECTED].

[EMAIL PROTECTED]:/tmp# wget --debug http://program-linux64.tar.bz2
DEBUG output created by Wget 1.10.2 on linux-gnu.

--15:37:42--  http://program-linux64.tar.bz2
   = `program.tar.bz2'
Resolving www.ai.sri.com... 130.107.65.215
Caching www.ai.sri.com = 130.107.65.215
Connecting to www.ai.sri.com|130.107.65.215|:80... connected.
Created socket 3.
Releasing 0x0064a100 (new refcount 1).

---request begin---
GET /program-linux64.tar.bz2 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.ai.sri.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Tue, 16 Sep 2008 19:37:46 GMT
Server: Apache
Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
ETag: 7f710a-8a8e1bf7-47fbd2ef
Accept-Ranges: bytes
Content-Length: -1970398217
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: application/x-tar

---response end---
200 OK
Registered socket 3 for persistent reuse.
Length: -1,970,398,217 [application/x-tar]

[ =]  
0 --.--K/s


Aborted

On 16-09-2008, at 15:32, Micah Cowan wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Cristián Serpell wrote:

It is the latest Ubuntu's distribution, that still comes with the old
version.

Thanks anyway, that was the problem.


I know that's untrue. Ubuntu comes with 1.10.2 at least, and has for
quite some time. If you're using that, then it's probably a different
bug than Doruk and Tony were thinking of (perhaps one of the cases of
content-length mishandling that were recently fixed in the 1.11.x  
series).


IIRC Intrepid Ibex (Ubuntu 8.10) will have 1.11.4.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI0AnI7M8hyUobTrERAqptAJoCj0VC46dBOhrr/A3HsHyicciKWQCffyFQ
bHhmuYHmf52Yz1M5lu7Yk5Y=
=Z+fN
-END PGP SIGNATURE-





RE: Big files

2008-09-16 Thread Tony Lewis
Cristián Serpell wrote:

 Maybe I should have started by this (I had to change the name of the  
 file shown):
[snip]
 ---response begin---
 HTTP/1.1 200 OK
 Date: Tue, 16 Sep 2008 19:37:46 GMT
 Server: Apache
 Last-Modified: Tue, 08 Apr 2008 20:17:51 GMT
 ETag: 7f710a-8a8e1bf7-47fbd2ef
 Accept-Ranges: bytes
 Content-Length: -1970398217

The problem is not with wget. It's with the Apache server, which told wget
that the file had a negative length.

Tony



Re: Hiding passwords found in redirect URLs

2008-09-13 Thread Thomas Corthals

Micah Cowan wrote:


Note: Saint Xavier has already written a fix for this, so it's not
actually a question of whether it's worth the bother, just whether it's
actually desired behavior.


Since it's desired in some situations but maybe not in others, the best 
solution would be to provide a switch for it that can be used in a 
user's .wgetrc and on the command line.


Now we only need to find out what's the desired default behaviour if the 
switch is missing. ;-)


Thomas Corthals



Re: Hiding passwords found in redirect URLs

2008-09-13 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Thomas Corthals wrote:
 Micah Cowan wrote:

 Note: Saint Xavier has already written a fix for this, so it's not
 actually a question of whether it's worth the bother, just whether it's
 actually desired behavior.
 
 Since it's desired in some situations but maybe not in others, the best
 solution would be to provide a switch for it that can be used in a
 user's .wgetrc and on the command line.

Well, yes, except I can't really imagine anyone ever _using_ such a
switch. Though I could envision people using the .wgetrc option. Still
seems like a lot of trouble to make a new option for such a little
thing. One could always use -nv in a pinch.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIzBiU7M8hyUobTrERAkchAJ9vajvughHFXR8yAJPPGt4YkaGY8ACfYXCR
vPCAZaYsRN6VcisBjDkmdzI=
=wMVt
-END PGP SIGNATURE-


A/R matching against query strings

2008-09-12 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On expanding current URI acc/rej matches to allow matching against query
strings, I've been considering how we might enable/disable this
functionality, with an eye toward backwards compatibility.

It seems to me that one usable approach would be to require the ?
query string to be an explicit part of rule, if it's expected to be
matched against query strings. So -A .htm,.gif,*Action=edit* would all
result in matches against the filename portion only, but -A
'\?*Action=edit*' would look for Action=edit within the query-string
portion. (The '\?' is necessary because otherwise '?' is a wildcard
character; [?] would also work.)

The disadvantage of that technique is that it's harder to specify that a
given string should be checked _anywhere_, regardless of whether it
falls in the filename or query-string portion; but I can't think offhand
of any realistic cases where that's actually useful. We could also
supply a --match-queries option to turn on matching of wildcard rules
for anywhere (non-wildcard suffix rules should still match only at the
end of the filename portion).
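
Concretely, under that scheme (none of this is implemented yet, so the exact
spellings are provisional), usage might look like:

  wget -r -A '.htm,.gif' http://wiki.example.org/
      # filename-portion matching only, as today
  wget -r -A '\?*Action=edit*' http://wiki.example.org/
      # explicit \? means: match against the query string
  wget -r --match-queries -A '*Action=edit*' http://wiki.example.org/
      # proposed option: wildcard rules match anywhere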

Another option is to use a separate -A-like option that does what -A
does for filenames, but matches against query strings. I like this idea
somewhat less.

Thoughts?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIyrXz7M8hyUobTrERAk+5AJ0ckiE4+bEMEFe9aD8bBNY3HH+IZACdERCs
wab0TyBLCbW/6DYm+8gAExM=
=pwb/
-END PGP SIGNATURE-


Hiding passwords found in redirect URLs

2008-09-12 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

https://savannah.gnu.org/bugs/index.php?21089

The report originator is copied in the recipients list for this message.

The situation is as follows: the user types wget
"http://foo.com/file-i-want". Wget asks the HTTP server for the
appropriate file, and gets a 302 redirection to the URL
ftp://spag:[EMAIL PROTECTED]. Wget will then issue to the log output the line:

  Location: ftp://spag:[EMAIL PROTECTED]/mickie/file-you-want

with the password in plain view.

I'm uncertain that this is actually a problem. In this specific case,
it's a publicly-accessible URL redirecting to a password-protected file.
What's to hide, really?

Of course, the case gets more interesting when it's _not_ a
publicly-accessible URL. What about when the password is generated from
one the user supplied? That is, the original request was
http://spag:[EMAIL PROTECTED]/file-i-want, which resulted in a redirect
using the same username/password? Especially if it was an HTTPS request
rather than plain HTTP. A case could be made that it should be hidden in
that case.

On the other hands, in cases like the _original_ example given above,
I'd argue that hiding it could be the wrong thing: the user now has no
idea how to directly access the file, avoiding the redirect the next
time around.

Redirecting to a password-protected file on a different host or using a
different scheme seems broken to me in the first place, and I'm sorta
leaning towards not bothering about it. What are your thoughts, list?

Note: Saint Xavier has already written a fix for this, so it's not
actually a question of whether it's worth the bother, just whether it's
actually desired behavior.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIytyT7M8hyUobTrERAnC1AJ4pRpWx7z6wRt3Vg4LHyQalEfL3XQCdGTqg
LdK8lQ8tuPTlmCfURcjXPw4=
=ZPrY
-END PGP SIGNATURE-


small doc typo in 9.1 Robot Exclusion

2008-09-10 Thread Michael Kessler
9.1 Robot Exclusion

..
.
Although Wget is not a web robot in the strictest sense of the word, it
can downloads large parts of the site without the user's...
..
.

possibly meant:
...it can download large 

cheers 
michael



Re: Wget and Yahoo login?

2008-09-10 Thread Tony Godshall
And you'll probably have to do this again- I bet
yahoo expires the session cookies!


On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote:
 After surprisingly little struggle, I got Plan B working -- logged into
 yahoo with wget, saved the cookies, including session cookies, and then
 proceeded to fetch pages using the saved cookies. Those pages came back
 logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
 -- you all provided critical advice in solving this problem.

 /Don

 On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:


 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and
  are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.

 No problem.


  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.

 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.

 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase You're not using Internet Explorer. :p

 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk too much longer. I've just installed the Chrome beta
 on the Windows side of one of my machines (I grudgingly give it 10 Gb on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don


 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-






-- 
Best Regards.
Please keep in touch.
This is unedited.
P-)


Re: Wget and Yahoo login?

2008-09-09 Thread Daniel Stenberg

On Mon, 8 Sep 2008, Donald Allen wrote:

The page I get is what would be obtained if an un-logged-in user went to the 
specified url. Opening that same url in Firefox *does* correctly indicate 
that it is logged in as me and reflects my customizations.


First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts 
needs. Then you read the capture and replay it as closely as possible using 
your tool.


As you will find out, sites like this use all sorts of funny tricks to figure 
you out and to make it hard to automate what you're trying to do. They tend to 
use javascripts for redirects and for fiddling with cookies just to make sure 
you have a javascript and cookie enabled browser. So you need to work hard(er) 
when trying this with non-browsers.


It's certainly still possible, even without using the browser to get the first 
cookie file. But it may take some effort.


--

 / daniel.haxx.se


Missing asprintf()

2008-09-09 Thread Gisle Vanem

Why the need for asprintf() in url.c:903? This function is
missing on DOS/Win32 and nowhere to be found in ./lib.

I suggest we replace with this:

--- hg-latest/src/url.c  Tue Sep 09 12:37:23 2008
+++ url.c   Tue Sep 09 13:01:33 2008
@@ -893,16 +893,18 @@

  if (error_code == PE_UNSUPPORTED_SCHEME)
{
-  char *error, *p;
+  char *p;
  char *scheme = xstrdup (url);
+  static char error[100];
+
  assert (url_has_scheme (url));

  if ((p = strchr (scheme, ':')))
*p = '\0';
  if (!strcasecmp (scheme, https))
-asprintf (error, _(HTTPS support not compiled in));
+sprintf (error, _(HTTPS support not compiled in));
  else
-asprintf (error, _(parse_errors[error_code]), quote (scheme));
+sprintf (error, _(parse_errors[error_code]), quote (scheme));
  xfree (scheme);

  return error;

---

Here 'error' is guaranteed to be big enough.

--gv


Where is program_name?

2008-09-09 Thread Gisle Vanem
'program_name' is used in lib/error.c, but it is not allocated 
anywhere. Should it be added to main.c and initialised to exec_name?


--gv


Re: Missing asprintf()

2008-09-09 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:

 Why the need for asprintf() in url.c:903? This function is missing
 on DOS/Win32 and nowhere to be found in ./lib.

Wget is supposed to use aprintf, which is defined in utils.c, and is
not specific to Unix.

It's preferable to use an asprintf-like function rather than a static buffer
because it supports reentrancy (unlike a static buffer) and imposes no
arbitrary limit on error output.


Re: Missing asprintf()

2008-09-09 Thread Gisle Vanem

Hrvoje Niksic [EMAIL PROTECTED] wrote:


Wget is supposed to use aprintf, which is defined in utils.c, and is
not specific to Unix.

It's preferable to use an asprintf-like functions than a static buffer
because it supports reentrance (unlike a static buffer) and imposes no
arbitrary limits on error output.


Fine by me. Here is an adjusted patch:

--- hg-latest/src/url.c  Tue Sep 09 12:37:23 2008
+++ url.c   Tue Sep 09 14:37:39 2008
@@ -900,9 +900,9 @@
  if ((p = strchr (scheme, ':')))
*p = '\0';
  if (!strcasecmp (scheme, https))
-asprintf (error, _(HTTPS support not compiled in));
+error = aprintf (_(HTTPS support not compiled in));
  else
-asprintf (error, _(parse_errors[error_code]), quote (scheme));
+error = aprintf (_(parse_errors[error_code]), quote (scheme));
  xfree (scheme);

  return error;

-

--gv


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
 On Mon, 8 Sep 2008, Donald Allen wrote:

 The page I get is what would be obtained if an un-logged-in user went to
 the specified url. Opening that same url in Firefox *does* correctly
 indicate that it is logged in as me and reflects my customizations.

 First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
 need. Then you read the capure and replay them as closely as possible using
 your tool.

 As you will find out, sites like this use all sorts of funny tricks to
 figure out you and to make it hard to automate what you're trying to do.
 They tend to use javascripts for redirects and for fiddling with cookies
 just to make sure you have a javascript and cookie enabled browser. So you
 need to work hard(er) when trying this with non-browsers.

 It's certainly still possible, even without using the browser to get the
 first cookie file. But it may take some effort.

I have not been able to retrieve a page with wget as if I were logged
in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
(there was a typo in his message -- it's 'Accept-Encoding', not
'Accept-Encodings'). I did install livehttpheaders and tried
--no-cookies and --header cookie info from livehttpheaders and that
did work. Some of the cookie info sent by Firefox was a mystery,
because it's not in the cookie file. Perhaps that's the crucial
difference -- I'm speculating that wget isn't sending quite the same
thing as Firefox when --load-cookies is used, because Firefox is
adding stuff that isn't in the cookie file. Just a guess. Is there a
way to ask wget to print the headers it sends (ala livehttpheaders)?
I've looked through the options on the man page and didn't see
anything, though I might have missed it.


 --

  / daniel.haxx.se



Re: Where is program_name?

2008-09-09 Thread Saint Xavier

Hi,

* Gisle Vanem ([EMAIL PROTECTED]) wrote:
 'program_name' is used in lib/error.c, but it is not allocated anywhere. 
 Should it be added to main.c and initialised to exec_name?

$cd wget-mainline
$find . -name '*.[ch]' -exec fgrep -H -n 'program_name' '{}' \;
./lib/error.c:63:# define program_name program_invocation_name
   ^^^
./lib/error.c:95:/* The calling program should define program_name and set it 
to the
./lib/error.c:97:extern char *program_name;
./lib/error.c:248:  __fxprintf (NULL, %s: , program_name);
./lib/error.c:250:  fprintf (stderr, %s: , program_name);
./lib/error.c:307:  __fxprintf (NULL, %s:, program_name);
./lib/error.c:309:  fprintf (stderr, %s:, program_name);
./src/netrc.c:463:  char *program_name, *file, *target;
./src/netrc.c:472:  program_name = argv[0];

Google for that and you will find the corresponding man page, like the one
here:
http://www.tin.org/bin/man.cgi?section=3&topic=PROGRAM_INVOCATION_NAME
"These variables are automatically initialised by the glibc run-time
 startup code."

I've also opened Wget with GDB: the variable exists but seems to point to
a bad memory area... 

Sincerly,
Saint Xavier.


Re: Where is program_name?

2008-09-09 Thread Gisle Vanem

Google for that and you will find the corresponding man page. Like it's
written here 
http://www.tin.org/bin/man.cgi?section=3topic=PROGRAM_INVOCATION_NAME
These variables are automatically initialised by the glibc run-time
startup code.


I'm on Windows. So glibc is of no help here.

--gv


Re: Where is program_name?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Saint Xavier wrote:
 Hi,
 
 * Gisle Vanem ([EMAIL PROTECTED]) wrote:
 'program_name' is used in lib/error.c, but it is not allocated anywhere. 
 Should it be added to main.c and initialised to exec_name?
 
 $cd wget-mainline
 $find . -name '*.[ch]' -exec fgrep -H -n 'program_name' '{}' \;
 ./lib/error.c:63:# define program_name program_invocation_name
^^^
 ./lib/error.c:95:/* The calling program should define program_name and set it 
 to the
  ^^^

Looks to me like we're expected to supply it. Line 63 is only evaluated
when we're using glibc; otherwise, we need to provide it. The differing
name is probably so we can define it unconditionally.

It appears that lib/error.c isn't even _built_ on my system, perhaps
because glibc supplies what it would fill in. This makes testing a
little difficult. Anyway, see if this fixes your trouble:

diff -r 0c2e02c4f4f3 src/ChangeLog
- --- a/src/ChangeLog Tue Sep 09 09:29:50 2008 -0700
+++ b/src/ChangeLog Tue Sep 09 09:40:00 2008 -0700
@@ -1,3 +1,7 @@
+2008-09-09  Micah Cowan  [EMAIL PROTECTED]
+
+   * main.c: Define program_name for lib/error.c.
+
 2008-09-02  Gisle Vanem  [EMAIL PROTECTED]

* mswindows.h: Must ensure stdio.h is included before
diff -r 0c2e02c4f4f3 src/main.c
- --- a/src/main.cTue Sep 09 09:29:50 2008 -0700
+++ b/src/main.cTue Sep 09 09:40:00 2008 -0700
@@ -826,6 +826,8 @@
   exit (0);
 }

+char *program_name; /* Needed by lib/error.c. */
+
 int
 main (int argc, char **argv)
 {
@@ -833,6 +835,8 @@
   int i, ret, longindex;
   int nurl, status;
   bool append_to_log = false;
+
+  program_name = argv[0];

   i18n_initialize ();



- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxqf67M8hyUobTrERAq0+AJ9KIOFDn9FiDXIIlU6M7DsupDmPYQCcDuoo
9bgAQnuKpgYMvnwc18svfYg=
=DXYi
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
 On Mon, 8 Sep 2008, Donald Allen wrote:

 The page I get is what would be obtained if an un-logged-in user went to
 the specified url. Opening that same url in Firefox *does* correctly
 indicate that it is logged in as me and reflects my customizations.
 First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
 need. Then you read the capure and replay them as closely as possible using
 your tool.

 As you will find out, sites like this use all sorts of funny tricks to
 figure out you and to make it hard to automate what you're trying to do.
 They tend to use javascripts for redirects and for fiddling with cookies
 just to make sure you have a javascript and cookie enabled browser. So you
 need to work hard(er) when trying this with non-browsers.

 It's certainly still possible, even without using the browser to get the
 first cookie file. But it may take some effort.
 
 I have not been able to retrieve a page with wget as if I were logged
 in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
 (there was a typo in his message -- it's 'Accept-Encoding', not
 'Accept-Encodings'). I did install livehttpheaders and tried
 --no-cookies and --header cookie info from livehttpheaders and that
 did work.

That's how I did it as well (except I got the headers from tcpdump); I'm
using Firefox 3, so don't have access to FF's new sqlite-based cookies
file (apart from the patch at
http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch).

 Some of the cookie info sent by Firefox was a mystery,
 because it's not in the cookie file. Perhaps that's the crucial
 difference -- I'm speculating that wget isn't sending quite the same
 thing as Firefox when --load-cookies is used, because Firefox is
 adding stuff that isn't in the cookie file. Just a guess.

Probably there are session cookies involved, that are sent in the first
page, that you're not sending back with the form submit.
- --keep-session-cookies and --save-cookies=foo.txt make a good
combination.
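
Roughly like this (the login URL and form field names below are made up; the
real ones have to be pulled from the login form itself):

  wget --keep-session-cookies --save-cookies=foo.txt \
       --post-data='login=USER&passwd=PASS' \
       'https://login.yahoo.com/config/login' -O /dev/null
  wget --load-cookies=foo.txt 'http://my.yahoo.com/' -O page.html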

 Is there a
 way to ask wget to print the headers it sends (ala livehttpheaders)?
 I've looked through the options on the man page and didn't see
 anything, though I might have missed it.

- --debug
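
e.g. (cookie file and URL are placeholders):

  wget --debug -o wget.log --load-cookies=cookies.txt 'http://my.yahoo.com/'

The request headers Wget actually sends show up in wget.log.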

- --
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7
1vOmTDimFg8E7Cn+Q+HGZn8=
=JKXH
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
  On Mon, 8 Sep 2008, Donald Allen wrote:
 
  The page I get is what would be obtained if an un-logged-in user went
 to
  the specified url. Opening that same url in Firefox *does* correctly
  indicate that it is logged in as me and reflects my customizations.
  First, LiveHTTPHeaders is the Firefox plugin everyone who tries these
 stunts
  need. Then you read the capure and replay them as closely as possible
 using
  your tool.
 
  As you will find out, sites like this use all sorts of funny tricks to
  figure out you and to make it hard to automate what you're trying to do.
  They tend to use javascripts for redirects and for fiddling with cookies
  just to make sure you have a javascript and cookie enabled browser. So
 you
  need to work hard(er) when trying this with non-browsers.
 
  It's certainly still possible, even without using the browser to get the
  first cookie file. But it may take some effort.
 
  I have not been able to retrieve a page with wget as if I were logged
  in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
  (there was a typo in his message -- it's 'Accept-Encoding', not
  'Accept-Encodings'). I did install livehttpheaders and tried
  --no-cookies and --header cookie info from livehttpheaders and that
  did work.

 That's how I did it as well (except I got the headers from tcpdump); I'm
 using Firefox 3, so don't have access to FF's new sqllite-based cookies
 file (apart from the patch at

 http://wget.addictivecode.org/FrontPage?action=AttachFiledo=viewtarget=wget-firefox3-cookie.patch
 ).

  Some of the cookie info sent by Firefox was a mystery,
  because it's not in the cookie file. Perhaps that's the crucial
  difference -- I'm speculating that wget isn't sending quite the same
  thing as Firefox when --load-cookies is used, because Firefox is
  adding stuff that isn't in the cookie file. Just a guess.

 Probably there are session cookies involved, that are sent in the first
 page, that you're not sending back with the form submit.
 - --keep-session-cookies and --save-cookies=foo.txt make a good
 combination.

  Is there a
  way to ask wget to print the headers it sends (ala livehttpheaders)?
  I've looked through the options on the man page and didn't see
  anything, though I might have missed it.

 - --debug


Well, I rebuilt my wget with the 'debug' use flag and ran it on the yahoo
test page (after having logged in to yahoo with firefox, of course) with
--load-cookies and the accept-encoding header item, with --debug. Very
useful. wget is sending every cookie item in firefox's cookies.txt. But
firefox sends three additional cookie items in the header that wget does not
send. Those items are *not* in firefox's cookies.txt so wget has no way of
knowing about them. Is it possible that firefox is not writing session
cookies to the file?

The result of this test, just to be clear, was a page that indicated Yahoo
thought I was not logged in. Those extra items Firefox is sending appear to
be the difference, because when I included them (from the livehttpheaders
output) and sent the cookies manually with --header, I got the same page
back with wget that indicated that Yahoo knew I was logged in and formatted
the page with my preferences.

/Don





 - --
 HTH,
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7
 1vOmTDimFg8E7Cn+Q+HGZn8=
 =JKXH
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 The result of this test, just to be clear, was a page that indicated
 yahoo thought I was not logged in. Those extra items firefox is sending
 appear to be the difference, because I included them (from the
 livehttpheaders output) when I tried sending the cookies manually with
 --header, I got the same page back with wget that indicated that yahoo
 knew I was logged in and formatted with page with my preferences.

Perhaps you missed this in my last message:

 Probably there are session cookies involved, that are sent in the first
 page, that you're not sending back with the form submit.
 --keep-session-cookies and --save-cookies=foo.txt make a good
 combination.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq
nCqAmXJfU3kTncMQkKk0JZo=
=17Yr
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  The result of this test, just to be clear, was a page that indicated
  yahoo thought I was not logged in. Those extra items firefox is sending
  appear to be the difference, because I included them (from the
  livehttpheaders output) when I tried sending the cookies manually with
  --header, I got the same page back with wget that indicated that yahoo
  knew I was logged in and formatted with page with my preferences.

 Perhaps you missed this in my last message:

  Probably there are session cookies involved, that are sent in the first
  page, that you're not sending back with the form submit.
  --keep-session-cookies and --save-cookies=foo.txt make a good
  combination.


I think we're mis-communicating, easily my fault, since I know just enough
about this stuff to be dangerous.

I am doing the yahoo session login with firefox, not with wget, so I'm using
the first and easier of your two suggested methods. I'm guessing you are
thinking that I'm trying to login to the yahoo session with wget, and thus
--keep-session-cookies and --save-cookies=foo.txt would make perfect sense
to me, but that's not what I'm doing (yet -- if I'm right about what's
happening here, I'm going to have to resort to this). But using firefox to
initiate the session, it looks to me like wget never gets to see the session
cookies because I don't think firefox writes them to its cookie file (which
actually makes sense -- if they only need to live as long as the session,
why write them out?).

/Don





 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq
 nCqAmXJfU3kTncMQkKk0JZo=
 =17Yr
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 I am doing the yahoo session login with firefox, not with wget, so I'm
 using the first and easier of your two suggested methods. I'm guessing
 you are thinking that I'm trying to login to the yahoo session with
 wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
 make perfect sense to me, but that's not what I'm doing (yet -- if I'm
 right about what's happening here, I'm going to have to resort to this).
 But using firefox to initiate the session, it looks to me like wget
 never gets to see the session cookies because I don't think firefox
 writes them to its cookie file (which actually makes sense -- if they
 only need to live as long as the session, why write them out?).

Yes, and I understood this; the thing is, that if session cookies are
involved (i.e., cookies that are marked for immediate expiration and are
not meant to be saved to the cookies file), then I don't see how you
have much choice other than to use the harder method, or else to fake
the session cookies by manually inserting them to your cookies file or
whatnot (not sure how well that may be expected to work). Or, yeah, add
an explicit --header 'Cookie: ...'.
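
That last one would look something like this (the cookie names and values
here are placeholders -- they'd have to be copied out of livehttpheaders or
the browser):

  wget --header='Cookie: name1=value1; name2=value2' \
       -O page.html 'http://<yahoo url>'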

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh
M57W3Reqj+/pO8GuDwb9Nok=
=ajp/
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget, so I'm
  using the first and easier of your two suggested methods. I'm guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet -- if I'm
  right about what's happening here, I'm going to have to resort to this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).

 Yes, and I understood this; the thing is, that if session cookies are
 involved (i.e., cookies that are marked for immediate expiration and are
 not meant to be saved to the cookies file), then I don't see how you
 have much choice other than to use the harder method, or else to fake
 the session cookies by manually inserting them to your cookies file or
 whatnot (not sure how well that may be expected to work). Or, yeah, add
 an explicit --header 'Cookie: ...'.


Ah, the misunderstanding was that the stuff you thought I missed was
intended to push me in the direction of Plan B -- log in to yahoo with wget.
I understand now. I'll look at trying to make this work. Thanks for all the
help, though I can't guarantee that you are done yet :-) But, hopefully,
this exchange will benefit others.

/Don



 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh
 M57W3Reqj+/pO8GuDwb9Nok=
 =ajp/
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 
 
 On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
 Donald Allen wrote:
 I am doing the yahoo session login with firefox, not with wget,
 so I'm
 using the first and easier of your two suggested methods. I'm
 guessing
 you are thinking that I'm trying to login to the yahoo session with
 wget, and thus --keep-session-cookies and
 --save-cookies=foo.txt would
 make perfect sense to me, but that's not what I'm doing (yet --
 if I'm
 right about what's happening here, I'm going to have to resort to
 this).
 But using firefox to initiate the session, it looks to me like wget
 never gets to see the session cookies because I don't think firefox
 writes them to its cookie file (which actually makes sense -- if they
 only need to live as long as the session, why write them out?).
 
 Yes, and I understood this; the thing is, that if session cookies are
 involved (i.e., cookies that are marked for immediate expiration and are
 not meant to be saved to the cookies file), then I don't see how you
 have much choice other than to use the harder method, or else to fake
 the session cookies by manually inserting them to your cookies file or
 whatnot (not sure how well that may be expected to work). Or, yeah, add
 an explicit --header 'Cookie: ...'.
 
 
 Ah, the misunderstanding was that the stuff you thought I missed was
 intended to push me in the direction of Plan B -- log in to yahoo with
 wget.

Yes; and that's entirely my fault, as I didn't explicitly say that.

 I understand now. I'll look at trying to make this work. Thanks
 for all the help, though I can't guarantee that you are done yet :-)
 But, hopefully, this exchange will benefit others.

I was actually surprised you kept going after I pointed out that it
required the Accept-Encoding header that results in gzipped content.
This behavior is a little surprising to me from Yahoo!. It's not
surprising in _general_, but for a site that really wants to be as
accessible as possible (I would think?), insisting on the latest
browsers seems ill-advised.

Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
visit a site, and get a server-generated page that's empty other than
the phrase "You're not using Internet Explorer". :p

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
3HbbATyqnrm0hAJXqNTqpl4=
=3XD/
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.


No problem.



  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.


That didn't faze me because the pages I'm after will be processed by a
python program, so having to gunzip would not require a manual step.


 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase "You're not using Internet Explorer". :p


And taking it one step further, I'm greatly enjoying watching Microsoft
thrash around, trying to save themselves, which I don't think they will.
Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
going to produce milk too much longer. I've just installed the Chrome beta
on the Windows side of one of my machines (I grudgingly give it 10 Gb on
each machine; Linux gets the rest), and it looks very, very nice. They've
still got work to do, but they appear to be heading in a very good
direction. These are smart people at Google. All signs seem to be pointing
towards more and more computing happening on the server side in the coming
years.

/Don




 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
After surprisingly little struggle, I got Plan B working -- logged into
yahoo with wget, saved the cookies, including session cookies, and then
proceeded to fetch pages using the saved cookies. Those pages came back
logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
-- you all provided critical advice in solving this problem.
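
For the archives, the recipe boiled down to something like this (the login
form URL and field names below are placeholders -- they have to be pulled
out of the actual login page's HTML):

  wget --keep-session-cookies --save-cookies=cookies.txt \
       --post-data='<user-field>=<user>&<pass-field>=<password>' \
       -O /dev/null 'https://<yahoo login form action>'

  wget --load-cookies=cookies.txt -O page.html 'http://<yahoo url>'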

/Don

On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:



 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.


 No problem.



  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.


 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.


 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
  the phrase "You're not using Internet Explorer". :p


 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk too much longer. I've just installed the Chrome beta
 on the Windows side of one of my machines (I grudgingly give it 10 Gb on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don




 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-





Hello, All and bug #21793

2008-09-08 Thread David Coon
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out
with wget.  This will be my first time contributing to any kind of free or
open source software, so I may have some basic questions down the line about
best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try
to tackle bug #21793 https://savannah.gnu.org/bugs/?21793.

-David A Coon


Re: Hello, All and bug #21793

2008-09-08 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

David Coon wrote:
 Hello everyone,
 
 I thought I'd introduce myself to you all, as I intend to start helping
 out with wget.  This will be my first time contributing to any kind of
 free or open source software, so I may have some basic questions down
 the line about best practices and such, though I'll try to keep that to
 a minimum.
 
 Anyway, I've been researching unicode and utf-8 recently, so I'm gonna
 try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793. 

Hi David, and welcome!

If you haven't already, please see
http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to
you. Also, I tend to hang out quite a bit on IRC (#wget @
irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to
check out Saint Xavier's recent work on IRI and iDNS support in Wget,
which is available at http://hg.addictivecode.org/wget/sxav/.

Among other things, sxav's additions make Wget more aware of the user's
locale, so it might be useful for providing a feature to automatically
transcode filenames to the user's locale, rather than just supporting
UTF-8 only (which should still probably remain an explicit option). If
that sounds like the direction you'd like to take it, you should
probably base your work on sxav's repository, rather than mainline.
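
Concretely, that would just be a matter of something along these lines (the
target directory name is arbitrary):

  hg clone http://hg.addictivecode.org/wget/sxav/ wget-sxav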

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxViR7M8hyUobTrERAv/jAJ9/DxAaPaYpdLJojX9gorHn2hqwSACeK7oD
veVZAIH2NjbYI8dG6DimjRg=
=9Qau
-END PGP SIGNATURE-


Wget and Yahoo login?

2008-09-08 Thread Donald Allen
There was a recent discussion concerning using wget to obtain pages
from yahoo logged into yahoo as a particular user. Micah replied to
Rick Nakroshis with instructions describing two methods for doing
this. This information has also been added by Micah to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with
my browser (Firefox 2.0.0.16) and then downloading a page with

wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies <my home
directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt
'http://<yahoo url>'

The page I get is what would be obtained if an un-logged-in user went
to the specified url. Opening that same url in Firefox *does*
correctly indicate that it is logged in as me and reflects my
customizations.

wget -V:
GNU Wget 1.11.1

I am running a reasonably up-to-date Gentoo system (updated within the
last month) on a Thinkpad X61.

Have I missed something here? Any help will be appreciated. Please
include my personal address in your replies as I am not (yet) a
subscriber to this list.

Thanks --
/Don Allen


Re: Wget and Yahoo login?

2008-09-08 Thread Donald Allen
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
 I haven't done this but I can speculate that you need to
 have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work.
I tried sending exactly the user-agent string firefox is sending and
still got a page from yahoo that clearly indicates yahoo thinks I'm
not logged in.

/Don


 Quote from man wget...

   -U agent-string
   --user-agent=agent-string
       Identify as agent-string to the HTTP server.

       The HTTP protocol allows the clients to identify themselves using a
       User-Agent header field.  This enables distinguishing the WWW
       software, usually for statistical purposes or for tracing of protocol
       violations.  Wget normally identifies as Wget/version, version being
       the current version number of Wget.

       However, some sites have been known to impose the policy of tailoring
       the output according to the User-Agent-supplied information.  While
       this is not such a bad idea in theory, it has been abused by servers
       denying information to clients other than (historically) Netscape or,
       more frequently, Microsoft Internet Explorer.  This option allows you
       to change the User-Agent line issued by Wget.  Use of this option is
       discouraged, unless you really know what you are doing.


 On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen [EMAIL PROTECTED] wrote:
 There was a recent discussion concerning using wget to obtain pages
 from yahoo logged into yahoo as a particular user. Micah replied to
 Rick Nakroshis with instructions describing two methods for doing
 this. This information has also been added by Micah to the wiki.

 I just tried the simpler of the two methods -- logging into yahoo with
 my browser (Firefox 2.0.0.16) and then downloading a page with

 wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies <my home
 directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt
 'http://<yahoo url>'

 The page I get is what would be obtained if an un-logged-in user went
 to the specified url. Opening that same url in Firefox *does*
 correctly indicate that it is logged in as me and reflects my
 customizations.

 wget -V:
 GNU Wget 1.11.1

 I am running a reasonably up-to-date Gentoo system (updated within the
 last month) on a Thinkpad X61.

 Have I missed something here? Any help will be appreciated. Please
 include my personal address in your replies as I am not (yet) a
 subscriber to this list.

 Thanks --
 /Don Allen




 --
 Best Regards.
 Please keep in touch.
 This is unedited.
 P-)



Re: Wget and Yahoo login?

2008-09-08 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 There was a recent discussion concerning using wget to obtain pages
 from yahoo logged into yahoo as a particular user. Micah replied to
 Rick Nakroshis with instructions describing two methods for doing
 this. This information has also been added by Micah to the wiki.
 
 I just tried the simpler of the two methods -- logging into yahoo with
 my browser (Firefox 2.0.0.16) and then downloading a page with
 
 wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies <my home
 directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt
 'http://<yahoo url>'
 
 The page I get is what would be obtained if an un-logged-in user went
 to the specified url. Opening that same url in Firefox *does*
 correctly indicate that it is logged in as me and reflects my
 customizations.

Are you signing into the main Yahoo! site?

When I try to do so, whether I use the cookies or no, I get a message
telling me to update my browser to something more modern, or the like. The
difference appears to be a combination of _both_ User-Agent (as you've
done), _and_ --header 'Accept-Encoding: gzip,deflate'. This plus
appropriate cookies gets me a decent logged-in page, but of course it's
gzip-compressed.

Since Wget doesn't currently support gzip-decoding and the like, that
makes the use of Wget in this situation cumbersome. Support for
something like this probably won't be seen until 1.13 or 1.14, I'm afraid.
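
If someone needs this before then, one workaround is to pipe the output
through gunzip -- something along these lines, with the user-agent string
and url as placeholders:

  wget -q -O - -U '<firefox user-agent string>' \
       --header='Accept-Encoding: gzip,deflate' \
       --load-cookies cookies.txt 'http://<yahoo url>' | gunzip > page.html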

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxdw77M8hyUobTrERAi/QAJ0atPMeUQ/0YCNwAP+XiH4nDyvclwCcDxYo
obud0CjpATBYDvA0eS3ZHGY=
=vv4R
-END PGP SIGNATURE-


Internet Draft for Metalink XML Download Description Format (draft-bryan-metalink-02)

2008-09-05 Thread Anthony Bryan
Greetings,

The Internet Draft for Metalink is available at
http://tools.ietf.org/html/draft-bryan-metalink-02
with interim revisions at
http://metalinks.svn.sourceforge.net/viewvc/metalinks/internetdraft/ .

We're looking for review and public comments.

Metalink is currently supported by some 35 applications and used by
projects such as OpenOffice.org, openSUSE, Ubuntu, cURL, and others.

 Metalink is an XML-based document format that describes a file or
 lists of files to be added to a download queue.  Lists are composed
 of a number of files, each with an extensible set of attached
 metadata.  For example, each file can have a description, checksum,
 and list of URIs that it is available from.

 The primary use case that Metalink addresses is the description of
 downloadable content in a format so download agents can act
 intelligently and recover from common errors with little or no user
 interaction necessary.  These errors can include multiple servers
 going down and data corrupted in transmission.

Example .metalink file:

   <?xml version="1.0" encoding="UTF-8"?>
   <metalink xmlns="http://www.metalinker.org">
     <published>2008-05-15T12:23:23Z</published>
     <files>
       <file name="example.ext">
         <identity>Example</identity>
         <version>1.0</version>
         <description>A description of the example file for
           download.</description>
         <verification>
           <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
           <hash type="sha-1">80bc95fd391772fa61c91ed68567f0980bb45fd9</hash>
         </verification>
         <resources>
           <url>ftp://ftp.example.com/example.ext</url>
           <url>http://example.com/example.ext</url>
         </resources>
       </file>
     </files>
   </metalink>

Thank you,
--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
 )) Easier, More Reliable, Self Healing Downloads


Re: [wget-notify] add a new option

2008-09-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

houda hocine wrote:
  Hi,

Hi houda.

This message was sent to the wget-notify, which was not the proper
forum. Wget-notify is reserved for bug-change and (previously) commit
notifications, and is not intended for discussion (though I obviously
haven't blocked discussions; the original intent was to be able to
discuss commits, but I'm not sure I need to allow discussions any more,
so it may be disallowed soon).

The appropriate list would be wget@sunsite.dk, to which this discussion
has been redirected.

 we have created a new archiving format (.warc), and we want to ensure
 that wget can generate this format directly from the input url.
 Can you help me with some ideas for implementing this new option?
 The format is (warc -wget url)
 I am in the process of trying to understand the source code to add this
 new option.  Which .c file allows me to do this?

Doing this is not likely to be a trivial undertaking: the current
file-output interface isn't really abstracted enough to allow this, so
basically you'll need to modify most of the existing .c files. We are
hoping at some future point to allow for a more generic output format,
for direct output to (for instance) tarballs and .mhtml archives. At
that point, it'd probably be fairly easy to write extensions to do what
you want.

In the meantime, though, it'll be a pain in the butt. I can't really
offer much help; the best way to understand the source is to read and
explore it. However, on the general topic of adding new options to Wget,
Tony Lewis has written the excellent guide at
http://wget.addictivecode.org/OptionsHowto. Hope that helps!

Please note that I won't likely be entertaining patches to Wget to make
it output to non-mainstream archive formats, and even once generic
output mechanisms are supported, the mainstream archive formats will
most likely be supported as extension plugins or similar, and not as
built-in support within Wget.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvbyf7M8hyUobTrERApl8AJwNvWOdDd0Z//wbNzN/jyZFqKI5iQCfQOx4
3zlxPGaVqjsPhwa7ZwB4wrs=
=Zy+N
-END PGP SIGNATURE-


Re: Checking out Wget

2008-09-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

vinothkumar raman wrote:
 Hi all,
 
 I need to check out the complete source onto my local hard disk. I am using
 WinCVS, but when I searched for the module it said that there is no module
 information out there. Could anyone help me out? I am a complete novice in
 this regard.

WinCVS won't work, because there _is_ in fact no CVS module for Wget.
Wget uses Mercurial as the source repository (and was using Subversion
prior to that). For more information about the Wget source repository
and its use, see http://wget.addictivecode.org/RepositoryAccess

That page focuses on using the hg command-line tool; you may prefer to
use TortoiseHg instead, http://tortoisehg.sourceforge.net/. The page
does offer additional information about the repository and what is
required to build from those sources.
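
From the command line, the checkout itself is just an hg clone along these
lines -- the exact repository URL is on the RepositoryAccess page above;
"mainline" here is my best guess at its name:

  hg clone http://hg.addictivecode.org/wget/mainline wget-mainline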

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvb4n7M8hyUobTrERAnquAJ9ItMQH1QYgXvyYTI6/IZDScIFGoACfVlqd
p+LMC9AK5/SwYPyuGVfd5Ns=
=RmLO
-END PGP SIGNATURE-


Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

vinothkumar raman wrote:
 We need to send the time stamp of the local file in the request
 header; for that we need to pass the local file's time stamp from
 http_loop() to get_http(). The only way to pass this on without
 altering the signature of the function is to add a field to struct url
 in url.h.
 
 Could we go for it?

That is acceptable.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvb5B7M8hyUobTrERAv2YAJ0ajYx+pynFLtV2YmEw7fA+vwf8ugCfSaU1
AFkIYSyyyS4egbyXjzBLXBo=
=fIT5
-END PGP SIGNATURE-


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part of
an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not
in HTTP/1.1, that'd be an excellent case for holding off until we're
doing HTTP/1.1 requests.

Also, I don't think "removing the previous HEAD request code" is
entirely accurate: we probably would want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and adjust
to use the current HEAD method instead as a fallback.
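
One cheap way to gather that data, while Wget still speaks HTTP/1.0, is to
poke a few common servers by hand with -S and watch whether a 304 comes back
(the url and date here are just placeholders):

  wget -S -O /dev/null \
       --header='If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT' \
       'http://<some url>'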

- -Micah

vinothkumar raman wrote:
 This means we should remove the previous HEAD request code, use
 If-Modified-Since by default, and have it handle all requests, storing
 pages whenever the server does not return a 304 response.
 
 Is it so?
 
 
 On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 Follow-up Comment #4, bug #20329 (project wget):

 verbatim-mode's not all that readable.

 The gist is, we should go ahead and use If-Modified-Since, perhaps even now
 before there's true HTTP/1.1 support (provided it works in a reasonable
 percentage of cases); and just ensure that any Last-Modified header is sane.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvb7t7M8hyUobTrERAsvQAJ4k7fKrsFtfC4MQtuvE3Ouwz6LseACePqt2
8JiRBKtEhmcK3schVVO347A=
=yCJV
-END PGP SIGNATURE-


Re: Support for file://

2008-09-02 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Petri Koistinen wrote:
 Hi,
 
 It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it Needs Discussion and
set at low priority). I'd thought there was already an issue for this,
but can't find it (either open or closed). I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope for such a feature ought to be. Unlike curl, which
groks urls, Wget W(eb)-gets, and file:// can't really be argued to
be part of the web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.

Even the system cp command is likely to handle things much better than
Wget. In particular, special OS-specific, extended file attributes,
extended permissions and the like, are among the things that existing
system tools probably handle quite well, and that Wget is unlikely to. I
don't really want Wget to be in the business of duplicating the system
cp command, but I might conceivably not mind file:// support if it
means simple _content_ transfer, and not actual file duplication.
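
For the plain local-duplication case, tools like these already do the
attribute-preserving work that Wget has no machinery for:

  rsync -a /path/to/src/ /path/to/dest/
  cp -a /path/to/src /path/to/dest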

Also in need of addressing is what recursion should mean for file://.
Between ftp:// and http://, recursion currently means different
things. In FTP, it means traverse the file hierarchy recursively,
whereas in HTTP it means traverse links recursively. I'm guessing
file:// should work like FTP (i.e., recurse when the path is a
directory, ignore HTML-ness), but anyway this is something that'd need
answering.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvcLq7M8hyUobTrERAl6YAJ9xeTINVkuvl8HkElYlQt7dAsUfHACfXRT3
lNR++Q0XMkcY4c6dZu0+gi4=
=mKqj
-END PGP SIGNATURE-

