wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven P. Ulrick
Hello, Everyone
I am running wget from SVN and I have come upon a problem that I have
never had before.  I promise that I checked the documentation on your
website to see if I needed to change how I use wget.  I even joined
this list, and perused the archives, with no results.  Not that they
aren't there, but I didn't find any :)
I will use my own domain as an example:
In the past, I would run:
wget -kr -nc http://www.afolkey2.net
and the result would be a mirror of my domain, with the links
converted for local viewing. (In this case, wget is the SVN version,
which is located at /usr/local/bin/wget)

Now, if I run that same command, I get the following output:

[EMAIL PROTECTED] Archives]$ wget -kr -nc http://www.afolkey2.net
--07:55:48--  http://www.afolkey2.net/

Resolving www.afolkey2.net... 12.203.241.111
Connecting to www.afolkey2.net|12.203.241.111|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1339 (1.3K) [text/html]

100%[==] 
1,339   --.-K/s   in 0.01s

07:55:48 (136 KB/s) - `www.afolkey2.net/index.html' saved [1339/1339]

FINISHED --07:55:48--
Downloaded: 1 files, 1.3K in 0.01s (136 KB/s)
[EMAIL PROTECTED] Archives]$  

As you can see, it downloaded ONLY http://www.afolkey2.net/index.html
and exited without error.

If I try adding a sub-directory to the above example, the result is the
same - wget downloads index.html in the directory that I point it to,
and then exits without error.

But, if I run:
/usr/bin/wget -kr -nc http://www.afolkey2.net
(/usr/bin/wget is the version of wget that ships with the distro that I
run, Fedora Core 3), the result is as it should be: www.afolkey2.net is
downloaded in its entirety, and the links are converted for local
viewing.

I understand that maybe something has changed about wget's options,
but I was not able to locate that information on my own.
If this is a bug (Fedora Core 3 specific or not), I would be glad to
report it as soon as you confirm that it is one.
If you need any more information, let me know and I will be glad to
oblige.
Right now, I'm going to uninstall /usr/local/bin/wget and reinstall
from a fresh download from SVN and see what happens.

Have a Great Day,
Steven P. Ulrick

P.S.: For clarification, I only used afolkey2.net as an example.  Every
website that I attempt wget -kr -nc on behaves the same way.  But I
JUST discovered that recursive downloading from ftp:// sites seems to
work perfectly.  I am now downloading ftp.crosswire.org, and it looks
like it will happily continue until there is no more to download.


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven M. Schweda
 [...] wget is the SVN version, which is located at /usr/local/bin/wget
 [...]
 [...] (/usr/bin/wget is the version of wget that ships with the distro
 that I run, Fedora Core 3) [...]

   Results from wget -V would be much more informative than knowing
the path(s) to the executable(s).  (Should I know what SVN is?) 
Adding -d to your wget commands could also be more helpful in finding
a diagnosis.

   If one program works and one doesn't, why use the one which doesn't?



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street           [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven P. Ulrick
On Thu, 5 Jan 2006 08:55:53 -0600 (CST)
[EMAIL PROTECTED] (Steven M. Schweda) wrote:

  [...] wget is the SVN version, which is located
  at /usr/local/bin/wget [...]
  [...] (/usr/bin/wget is the version of wget that ships with the
  distro that I run, Fedora Core 3) [...]
 
Results from wget -V would be much more informative than knowing
 the path(s) to the executable(s).  (Should I know what SVN is?) 
 Adding -d to your wget commands could also be more helpful in
 finding a diagnosis.

Hello, Steven
That's fair enough.
Default Fedora Core 3 version:
[EMAIL PROTECTED] ~]$ /usr/bin/wget -V
GNU Wget 1.10.2 (Red Hat modified)

SVN version:
[EMAIL PROTECTED] ~]$ /usr/local/bin/wget -V
GNU Wget 1.10+devel


 
If one program works and one doesn't, why use the one which
 doesn't?

Well, that's simple: I am not a programmer, I am just a user.  But if
I use the development versions of programs that I like and use a lot,
then I can make what little contribution I can back to the community by
reporting any issues that I have with them.  If I just stopped using
stuff that does not work, then I would not be doing my part.  But for
real, your question is a good one :)

Steven P. Ulrick

P.S.: Again, I apologize for not putting the version numbers in my
original email.  I had intended to do that, but I forgot :(


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Hrvoje Niksic
[EMAIL PROTECTED] (Steven M. Schweda) writes:

 Results from wget -V would be much more informative than knowing
 the path(s) to the executable(s).  (Should I know what SVN is?)

I believe SVN stands for Subversion, the version control software that
runs the repository.


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven M. Schweda
 Adding -d to your wget commands could also be more helpful in finding
 a diagnosis.

   Still true.

   GNU Wget 1.10.2b built on VMS Alpha V7.3-2 (the original wget
1.10.2 with my VMS-related and other changes) seems to work just fine on
that site.  You might try starting with a less up-to-the-minute source
kit to see if that helps.  (Although you'd like to think that such a
gross problem would be detected before any such problem code had been
checked in.  And with that site's content, I might prefer any program
which sucked down less of it, but that's neither here nor there.)



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street           [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven P. Ulrick
On Thu, 5 Jan 2006 09:56:35 -0600 (CST)
[EMAIL PROTECTED] (Steven M. Schweda) wrote:

  Adding -d to your wget commands could also be more helpful in
  finding a diagnosis.

That's fair enough:

[EMAIL PROTECTED] ~]$ wget -d -kr -nc http://www.afolkey2.net
Setting --convert-links (convertlinks) to 1
Setting --recursive (recursive) to 1
Setting --no (noclobber) to 1
DEBUG output created by Wget 1.10+devel on linux-gnu.

Enqueuing http://www.afolkey2.net/ at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.afolkey2.net/ at depth 0
Queue count 0, maxcount 1.
in http_loop
in http_loop LOOP
--10:16:43--  http://www.afolkey2.net/

in gethttp 1
in gethttp 2
in gethttp 3
Resolving www.afolkey2.net... 12.203.241.111
Caching www.afolkey2.net = 12.203.241.111
Connecting to www.afolkey2.net|12.203.241.111|:80... connected.
Created socket 4.
Releasing 0x09be9240 (new refcount 1).

---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.afolkey2.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Thu, 05 Jan 2006 16:16:44 GMT
Server: Apache/2.0.53 (Fedora)
Last-Modified: Sun, 18 Dec 2005 13:27:25 GMT
ETag: 192c08-53b-6522a940
Accept-Ranges: bytes
Content-Length: 1339
Connection: close
Content-Type: text/html; charset=UTF-8

---response end---
200 OK
in gethttp 4
in gethttp 5
Length: 1339 (1.3K) [text/html]

100%[==]
1,339   --.-K/s   in 0s

Closed fd 4
10:16:44 (33.6 MB/s) - `www.afolkey2.net/index.html' saved [1339/1339]

FINISHED --10:16:44--
Downloaded: 1 files, 1.3K in 0s (33.6 MB/s)
You have new mail in /var/spool/mail/steve


GNU Wget 1.10.2b built on VMS Alpha V7.3-2 (the original wget
 1.10.2 with my VMS-related and other changes) seems to work just fine
 on that site.  You might try starting with a less up-to-the-minute
 source kit to see if that helps.

Please forgive my ignorance, but what exactly does that mean?  In my
original message on this thread, I mentioned and showed that I tried
the same command with the version of wget that ships with Fedora Core
3, though it is true that I did not mention the exact version number.
If there is a different version (other than the SVN version, of course)
that you are referring to, please let me know.

  (Although you'd like to think that
 such a gross problem would be detected before any such problem code
 had been checked in.  And with that site's content, I might prefer
 any program which sucked down less of it, but that's neither here nor
 there.)

What exactly does that mean?  If you are referring to afolkey2.net,
that is only my playground for learning how to run a web server and a
mail server.  As I said in my original message on this subject, I was
only using afolkey2.net as an example.  I do sincerely apologize if you
did not approve of my example, but I did mention that that was all it
was.  To clarify that last statement: absolutely no offense was taken,
as I am sure that none was intended.  I was just asking what was meant,
that's all :)

Have a Great Day,
Steven P. Ulrick


Re: wget from SVN: Issue with recursive downloading from http:// sites

2006-01-05 Thread Steven M. Schweda
   Your -d output suggests a defective Wget (probably because
Wget/1.10+devel was still in development).  A working one spews much
more stuff (as it downloads much more stuff).

   I'd try starting with the last released source kit:

  http://www.gnu.org/software/wget/
  http://www.gnu.org/software/wget/index.html#downloading
  http://ftp.gnu.org/pub/gnu/wget/
  http://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz
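
   For what it's worth, a typical build of that released kit might look
like the following.  This is only a sketch; the --prefix is just an
example, not a recommendation:

  # fetch, unpack, and build the 1.10.2 release tarball
  wget http://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz
  tar -xzf wget-1.10.2.tar.gz
  cd wget-1.10.2
  ./configure --prefix=/usr/local
  make
  make install    # may require root privileges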

 [...]  What exactly does that mean?

   I was just complaining about the content at afolkey2.net, but, as I
said, that's neither here nor there.



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street           [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Case insensitive enhancement

2006-01-05 Thread Frank McCown
I'd like to suggest an enhancement that would help people who are 
downloading web sites housed on a Windows server.  (I couldn't find any 
discussion of this in the email list archive or any mention in the 
on-line documentation.)


Since Windows has a case-insensitive file system, Apache and IIS
running on a Windows box will treat the following URLs as references to
the same resource:


http://foo.org/bar.html
http://foo.org/BAR.html

Apache on a *nix box treats these URLs as references to two different 
resources.


Wget 1.10 running on *nix currently treats the two URLs as referring
to different resources regardless of the operating system housing the
web server.  Therefore wget will create two files when only one
actually exists on the Windows web server.  I ran into this problem
when using wget with http://www.harding.edu/hr/.


I'd like to suggest a new parameter --ignore-case that would tell wget
to convert all URLs to lowercase when retrieving them.  This would
allow a more accurate download of the files residing on a Windows file
system and would mean fewer redundant retrievals.  Of course, this
would not be as useful for mirroring a site on a *nix box, since URLs
referring to BAR.html would then break.
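
To illustrate the intended effect (this is only my sketch done from the
shell, not an actual wget option), lowercasing a URL before retrieval
could look like this:

  # hypothetical illustration: lowercase the URL so that BAR.html and
  # bar.html collapse to a single name before wget fetches it
  url='http://foo.org/BAR.html'
  lc=$(printf '%s\n' "$url" | tr '[:upper:]' '[:lower:]')
  wget "$lc"    # retrieves http://foo.org/bar.html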


A script could also be used to go through and delete redundant files
by hand (as was suggested in
http://www.mail-archive.com/wget@sunsite.dk/msg08373.html to remove the
index.html?BLAH files), but it would be nice to save the user this
effort.  A sketch of such a script follows.
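
As a rough example (my own sketch, assuming a finished mirror under
./www.example.org), one could list the files whose names collide
case-insensitively and then review and delete the extras by hand:

  # print every file whose lowercased path was already seen
  find www.example.org -type f |
    awk '{ lc = tolower($0); if (lc in seen) print $0; seen[lc] = 1 }'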


Regards,
Frank