Bug report: wget with multiple CNs in SSL certificate

2007-04-12 Thread Alex Antener
Hi

If I connect with wget 1.10.2 (Debian Etch / Ubuntu Feisty Fawn) to a
secure host that uses multiple CNs (common names) in the certificate, I get
the following error:

[EMAIL PROTECTED]:~$ wget https://host.domain.tld
--10:18:55--  https://host.domain.tld/
   => `index.html'
Resolving host.domain.tld... xxx.xxx.xxx.xxx
Connecting to host.domain.tld|xxx.xxx.xxx.xxx|:443... connected.
ERROR: certificate common name `host0.domain.tld' doesn't match
requested host name `host.domain.tld'.
To connect to host.domain.tld insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

If I do the same with wget 1.9.1 (Debian Sarge) I do not get that error.
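
For reference, a rough sketch (not wget's actual code; it assumes OpenSSL and an already-obtained X509 object, and the helper name is made up) of checking the requested host against every DNS entry in the certificate's subjectAltName extension instead of a single common name:

#include <strings.h>
#include <openssl/x509v3.h>

/* Rough sketch only -- not wget source: return non-zero if HOST matches
   any DNS entry of CERT's subjectAltName extension.  */
static int
host_in_subject_alt_names (X509 *cert, const char *host)
{
  int i, count, found = 0;
  GENERAL_NAMES *names;

  names = X509_get_ext_d2i (cert, NID_subject_alt_name, NULL, NULL);
  if (!names)
    return 0;                   /* certificate has no subjectAltName */

  count = sk_GENERAL_NAME_num (names);
  for (i = 0; i < count && !found; i++)
    {
      GENERAL_NAME *name = sk_GENERAL_NAME_value (names, i);
      if (name->type == GEN_DNS)
        {
          const char *dns = (const char *) ASN1_STRING_data (name->d.dNSName);
          if (dns && strcasecmp (dns, host) == 0)
            found = 1;          /* exact, case-insensitive match */
        }
    }
  GENERAL_NAMES_free (names);
  return found;
}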

Kind regards, Alex Antener

-- 
Alex Antener
Dipl. Medienkuenstler FH

[EMAIL PROTECTED] // http://lix.cc // +41 (0)44 586 97 63
GPG Key: 1024D/14D3C7A1 https://lix.cc/gpg_key.php
Fingerprint: BAB6 E61B 17D7 A9C9 6313  5141 3A3C DAA3 14D3 C7A1



Bug report: backup files missing when using wget -K

2006-08-14 Thread Ken Kubota

Hi,

When calling wget -k -K, the backup files (.orig) are missing.

In one case (LOG.Linux.short) one backup file is missing (two files were 
converted).
In another case (LOG.IRIX64.short) all backup files are missing.
This is also true when using recursive retrieval (LOG.IRIX64.recursive.short).

See attached files for details. The script calling wget is WGET. There was no 
.wgetrc file.

You probably know the bug described at: 
http://www.mail-archive.com/wget@sunsite.dk/msg07686.html
Remove the two ./CLEAN commands in the script to test recursive re-download 
with it.
I cannot reproduce the bug since the backup files are missing.

Kind regards,

Ken



LOG.Linux.short:
DEBUG output created by Wget 1.8.2 on linux.
Linux linux 2.4.21-99-default #1 Wed Sep 24 13:30:51 UTC 2003 i686 i686 i386 
GNU/Linux
2 files converted in 0.07 seconds.
Backup files:  1

LOG.IRIX64.short:
DEBUG output created by Wget 1.10.1 on irix6.5.
IRIX64 Komma 6.5 07010238 IP27
Converted 2 files in 0.204 seconds.
Backup files: 0

LOG.IRIX64.recursive.short:
DEBUG output created by Wget 1.10.1 on irix6.5.
IRIX64 Komma 6.5 07010238 IP27
Converted 55 files in 7.616 seconds.
Backup files: 0

wget-bug.tar.bz2
Description: Binary data


Bug report

2006-04-01 Thread Gary Reysa

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not 
want 50,000+ pages -- do you provide some way for the user to shut 
it down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]







Re: Bug report

2006-04-01 Thread Frank McCown

Gary Reysa wrote:

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not 
want 50,000+ pages -- do you provide some way for the user to shut 
it down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]



Hello Gary,

From a quick look at your site, it appears to be mainly static html 
that would not generate a lot of extra crawls.  If you have some dynamic 
portion of your site, like a calendar, that could make wget go into an 
infinite loop.  It would be much easier to tell if you could look at the 
server logs that show what pages were requested.  They would easily tell 
you what wget was getting hung on.


One problem I did notice is that your site is generating "soft 404s".
In other words, it is sending back an HTTP 200 response when it should be 
sending back a 404 response.  So if wget tries to access


http://www.builditsolar.com/blah

your web server is telling wget that the page actually exists.  This 
*could* cause more crawls than necessary, but not likely.  This problem 
should be fixed though.


It's possible the wget user did not know what they were doing and ran 
the crawler several times.  You could try to block traffic from that 
particular IP address or create a robots.txt file that tells crawlers to 
stay away from your site or just certain pages.  Wget respects 
robots.txt.  For more info:


http://www.robotstxt.org/wc/robots.html
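
As an illustration (the directory name here is only an example), a robots.txt placed in the web root such as

User-agent: *
Disallow: /calendar/

asks every compliant robot, wget included, to stay out of that directory; a single "Disallow: /" line would ask them to skip the whole site.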

Regards,
Frank



wget bug report

2005-06-13 Thread A.Jones
Sorry for the crosspost, but the wget Web site is a little confusing on the 
point of where to send bug reports/patches.

Just installed wget 1.10 on Friday. Over the weekend, my scripts failed with 
the 
following error (once for each wget run):
Assertion failed: wget_cookie_jar != NULL, file http.c, line 1723
Abort - core dumped

All of my command lines are similar to this:
/home/programs/bin/wget -q --no-cache --no-cookies -O /home/programs/etc/alte_seiten/xsr.html 'http://www.enterasys.com/download/download.cgi?lib=XSR'

After taking a look at it, I implemented the following change to http.c and 
tried again. It works for me, but I don't know what other implications my 
change might have.

--- http.c.orig Mon Jun 13 08:04:23 2005
+++ http.c  Mon Jun 13 08:06:59 2005
@@ -1715,6 +1715,7 @@
   hs->remote_time = resp_header_strdup (resp, "Last-Modified");
 
   /* Handle (possibly multiple instances of) the Set-Cookie header. */
+  if (opt.cookies)
   {
 char *pth = NULL;
 int scpos;


With kind regards

MVV Energie AG
Abteilung AI.C

Andrew Jones

Telefon: +49 621 290-3645
Fax: +49 621 290-2677
E-Mail: [EMAIL PROTECTED] Internet: www.mvv.de
MVV Energie · Luisenring 49 · 68159 Mannheim
Handelsregister-Nr. HRB 1780
Vorsitzender des Aufsichtsrates: Oberbürgermeister Gerhard Widder
Vorstand: Dr. Rudolf Schulten (Vorsitzender) · Dr. Werner Dub · Hans-Jürgen 
Farrenkopf · Karl-Heinz Trautmann


Bug report: two spaces between file size and month

2004-05-03 Thread Iztok Saje
Hello!
I just found a feature in an embedded system (no source available) with an FTP server.
In its listing there are two spaces between the file size and the month.
As a consequence, wget always thinks the size is 0.
In the procedure ftp_parse_unix_ls it just steps back one blank
before cur.size is calculated.
My quick hack is just to add one more pointer and an atoi,
but maybe a nicer solution can be done (see the sketch after the listing below).
case from .listing:
-rw-rw-rw-   0 0  0  68065  Apr 16 08:00 A20040416.0745
-rw-rw-rw-   0 0  0781  Apr 20 07:45 A20040420.0730
-rw-rw-rw-   0 0  0  59606  Apr 16 08:15 A20040416.0800
-rw-rw-rw-   0 0  0781  Apr 23 12:15 A20040423.1200
-rw-rw-rw-   0 0  0   2130  Feb  3 12:00 A20040203.1145
-rw-rw-rw-   0 0  0  33440  Apr 14 12:15 A20040414.1200
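
A rough sketch of one possible nicer solution (a hypothetical helper, not the actual ftp_parse_unix_ls code): walk back from the month field over however many blanks separate it from the size, then over the digits of the size itself:

#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not wget source: given the whole listing LINE and a
   pointer MONTH to the month field, return the size field that precedes it,
   tolerating one or more separating blanks.  */
static long
size_before_month (const char *line, const char *month)
{
  const char *p = month;
  const char *end;

  while (p > line && (p[-1] == ' ' || p[-1] == '\t'))
    --p;                        /* skip every blank, not just one */
  end = p;
  while (p > line && isdigit ((unsigned char) p[-1]))
    --p;                        /* back up over the digits of the size */
  if (p == end)
    return 0;                   /* no size field found */
  return strtol (p, NULL, 10);
}

For the first line of the sample listing above this would return 68065 instead of 0.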
BR
Iztok


wget bug report

2004-03-26 Thread Corey Henderson
I sent this message to [EMAIL PROTECTED] as directed in the wget man page, but it 
bounced and said to try this email address.

This bug report is for GNU Wget 1.8.2 tested on both RedHat Linux 7.3 and 9

rpm -q wget
wget-1.8.2-9

When I use wget with -S to show the HTTP headers, and I use the --spider switch as 
well, it gives me a 501 error on some servers.

The main example I have found was doing it against a server running ntop.

http://www.ntop.org/

You can find an RPM for it at:

http://rpm.pbone.net/index.php3/stat/4/idpl/586625/com/ntop-2.2-0.dag.rh90.i386.rpm.html

You can search with other parameters at rpm.pbone.net to get ntop for other versions 
of Linux.

So here is the command and output:

wget -S --spider http://SERVER_WITH_NTOP:3000

HTTP request sent, awaiting response...
 1 HTTP/1.0 501 Not Implemented
 2 Date: Sat, 27 Mar 2004 07:08:24 GMT
 3 Cache-Control: no-cache
 4 Expires: 0
 5 Connection: close
 6 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 7 Content-Type: text/html
21:11:56 ERROR 501: Not Implemented.

I get a 501 error. Echoing $? shows an exit status of 1.

When I don't use the spider, I get the following:

wget -S http://SERVER_WITH_NTOP:3000

HTTP request sent, awaiting response...
 1 HTTP/1.0 200 OK
 2 Date: Sat, 27 Mar 2004 07:09:31 GMT
 3 Cache-Control: max-age=3600, must-revalidate, public
 4 Connection: close
 5 Server: ntop/2.2 (Dag Apt RPM Repository) (i686-pc-linux-gnu)
 6 Content-Type: text/html
 7 Last-Modified: Mon, 17 Mar 2003 20:27:49 GMT
 8 Accept-Ranges: bytes
 9 Content-Length: 1214

100%[====================================>] 1,214          1.16M/s    ETA 00:00

21:13:04 (1.16 MB/s) - `index.html' saved [1214/1214]



The exit status was 0 and the index.html file was downloaded.

If this is a bug please fix it in your next release of wget. If it is not a bug, I 
would appreciate a brief explanation as to why.

Thank You

Corey Henderson
Chief Programmer
GlobalHost.com

Bug report

2004-03-24 Thread Juhana Sadeharju
Hello. This is a report on some wget bugs. My wgetdir command looks like
the following (wget 1.9.1):
wget -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla $@

Bugs:

Command: wgetdir http://www.directfb.org
Problem: In the file www.directfb.org/index.html the hrefs of type
  /screenshots/index.xml were not converted to relative
  with the -k option.

Command: wgetdir http://threedom.sourceforge.net
Problem: In the file threedom.sourceforge.net/index.html the
hrefs were not converted to relative with the -k option.

Command: wgetdir http://liarliar.sourceforge.net
Problem: Files are named as
  content.php?content.2
  content.php?content.3
  content.php?content.4
which are interpreted, e.g., by Nautilus as manual pages and are
displayed as plain text. Could the files, and the links to them,
be renamed as the following?
  content.php?content.2.html
  content.php?content.3.html
  content.php?content.4.html
After all, are those pages still php files or generated html files?
If they are html files produced by the php files, then it could
be a good idea to add a new extension to the files.

Command: wgetdir 
http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html
Problem: Images are not downloaded. Perhaps because the image links
are the following:
  <image src="v26_2.jpg">

Regards,
Juhana


Re: Bug report

2004-03-24 Thread Hrvoje Niksic
Juhana Sadeharju [EMAIL PROTECTED] writes:

 Command: wgetdir http://liarliar.sourceforge.net
 Problem: Files are named as
   content.php?content.2
   content.php?content.3
   content.php?content.4
 which are interpreted, e.g., by Nautilus as manual pages and are
 displayed as plain text. Could the files, and the links to them,
 be renamed as the following?
   content.php?content.2.html
   content.php?content.3.html
   content.php?content.4.html

Use the option `--html-extension' (-E).
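
For example, adding -E to the options from your wgetdir command (URL taken from your report):

wget -E -k --proxy=off -e robots=off --passive-ftp -q -r -l 0 -np -U Mozilla http://liarliar.sourceforge.net/

This should save each generated page with an .html suffix and, together with -k, point the converted links at the renamed files.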

 After all, are those pages still php files or generated html files?
 If they are html files produced by the php files, then it could be a
 good idea to add a new extension to the files.

They're the latter -- HTML files produced by the server-side PHP code.

 Command: wgetdir 
 http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html
 Problem: Images are not downloaded. Perhaps because the image links
 are the following:
   <image src="v26_2.jpg">

I've never seen this tag, but it seems to be the same as IMG.  Mozilla
seems to grok it and its DOM inspector thinks it has seen IMG.  Is
this tag documented anywhere?  Does IE understand it too?



bug report

2003-12-30 Thread Vlada Macek

Hi again,

I found something that can be called a bug.

The command line and the output (shortened):

$ wget -k www.seznam.cz
--14:14:28--  http://www.seznam.cz/
   => `index.html'
Resolving www.seznam.cz... done.
Connecting to www.seznam.cz[212.80.76.18]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                 ] 19,975        3.17M/s

14:14:28 (3.17 MB/s) - `index.html' saved [19975]

Converting index.html... 5-123
Converted 1 files in 0.01 seconds.

---
That is, the newly created file is really link-converted.

Now I run:

$ wget -k -O myfile www.seznam.cz
--14:16:07--  http://www.seznam.cz/
   => `myfile'
Resolving www.seznam.cz... done.
Connecting to www.seznam.cz[212.80.76.3]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                 ] 19,980        3.18M/s

14:16:07 (3.18 MB/s) - `myfile' saved [19980]

index.html.1: No such file or directory
Converting index.html.1... nothing to do.
Converted 1 files in 0.00 seconds.

---
Now myfile is created and then wget tries to convert index.html.1, i.e. the 
file it normally *would* have created if there were no -O option... 

When I want the content to be sent to stdout (-O -), this postponed 
conversion is again run on index.html.1, which is totally wrong; all the 
content has already been sent out to stdout.

Not only is my content not link-converted; isn't there also a possibility that
wget could inadvertently garble files on disk that it has nothing to do with?

Vlada Macek




bug report: 302 server response forces host spanning even without -H

2003-04-02 Thread Yaniv Azriel
If wget receives a 302 Temporarily Moved redirection to *another site*,
that site is crawled!
wget -r http://original/index.html

Server reply 302 http://redirect/index.html

WGET goes and downloads from redirect 



I also tried adding the -D flag but it doesn't help:

wget -r -Doriginal -nh http://original/

WGET still browses the redirect site

And by the way - multiple dependency files are downloaded from the redirect
site - so this is a major bug, I think.




bug report

2003-02-22 Thread Jirka Klaue

1/   (serious)
#include <config.h> needs to be replaced by #include "config.h" in several source 
files.
The same applies to strings.h.

2/
#ifdef WINDOWS should be replaced by #ifdef _WIN32.

With these two changes it is even possible to compile wget with MSVC[++] and Intel 
C[++].   :-)

Jirka






bug report about running wget in BSDI 3.1

2003-02-05 Thread julian yin
Hello,

I've downloaded wget-1.5.3 from http://ftp.gnu.org/gnu/wget onto our 
BSDI version 3.1 OS and used the following commands:

% gunzip wget-1.5.3.tar.gz
% tar -xvf wget-1.5.3.tar
% cd wget-1.5.3
% ./configure
% ./make -f Makefile
% ./make install

But the following error message was displayed:

--12:53:33--  http://www.osdpd.noaa.gov:80/COB/poltbus.asc
   => `poltbus.asc'
Connecting to www.osdpd.noaa.gov:80...
www.osdpd.noaa.gov: Host not found.

when I ran 
% ./src/wget http://www.osdpd.noaa.gov/COB/poltbus.asc

Could you please give me your advice about the error message?

Thank you very much.

I.P.S.
Julian




Bug report / feature request

2003-01-28 Thread Stas Ukolov
Hi!

Wget 1.5.3 uses /robots.txt to skip some parts of a web site. But it
doesn't use the <META NAME="ROBOTS" CONTENT="NOFOLLOW"> tag, which serves
the same purpose.

I believe that Wget must also parse and use <META NAME='ROBOTS' ...>
tags.

WBR
 Stas  mailto:[EMAIL PROTECTED]




Re: bug report and patch, HTTPS recursive get

2002-05-17 Thread Kiyotaka Doumae


In message "Re: bug report and patch, HTTPS recursive get",
Ian Abbott wrote...
 Thanks again for the bug report and the proposed patch.  I thought some
 of the scheme tests in recur.c were getting messy, so propose the
 following patch that uses a function to check for similar schemes.

Thanks for your rewrite.
Your patch solved the problem.

Thank you

---
Doumae Kiyotaka
Internet Initiative Japan Inc.
Technical Planning Division



Re: bug report and patch, HTTPS recursive get

2002-05-15 Thread Ian Abbott

On Wed, 15 May 2002 18:44:19 +0900, Kiyotaka Doumae [EMAIL PROTECTED]
wrote:

I found a bug in wget with HTTPS recursive get, and propose
a patch.

Thanks for the bug report and the proposed patch.  The current scheme
comparison checks are getting messy, so I'll write a function to check
schemes for similarity (when I can spare the time later today).



Re: Bug report

2002-05-04 Thread Ian Abbott

On Fri, 3 May 2002 18:37:22 +0200, Emmanuel Jeandel
[EMAIL PROTECTED] wrote:

ejeandel@yoknapatawpha:~$ wget -r a:b
Segmentation fault

Patient: Doctor, it hurts when I do this
Doctor: Well don't do that then!

Seriously, this is already fixed in CVS.



Bug report

2002-05-03 Thread Emmanuel Jeandel

ejeandel@yoknapatawpha:~$ wget -r a:b
Segmentation fault
ejeandel@yoknapatawpha:~$ 

I encountered this bug when I wanted to do wget ftp://a:b@c/, forgetting the
ftp://
The bug is not present when -r is not there (a:b: Unsupported scheme)

Emmanuel



GNU wget 1.8.1 - Bug report memory occupied

2002-03-26 Thread Dipl. Ing. Hermann Rugen






Hello specialists,
I used wget 1.8.1 on my system to mirror the site www.europa.eu.int.
The transfer was through a proxy and DSL over night.
After about 12-13 hours I found the following situation:
about 1.8 GB of data downloaded in total, and
the wget process had grown to occupy approx. 75 MB of RAM!

This growth was fatal for the system, because there were only
32 MB of RAM in the Intel 486 machine.
The download rate was dramatically reduced, but the system was still running.
I killed the process with Ctrl-C.
Everything seems to be OK.
After examining the data I found that the relinking was not good in all
respects.
Files were placed in the right directories, but the relinking in the pages
was often wrong.
It should be: http://myroot/europa.eu.int/individual_dir

but was:
http://europa.eu.int/individual_dir

The problem seems to be that wget misses the part of the directory path
leading to the downloaded one.

My calling conditions for wget were:

wget -m http://europa.eu.int/

All parameters were left as downloaded and were unchanged,
except the address of the proxy I had to use.
Compiling was with standard features under a Linux 2.4.13 kernel.

Questions:
Did I make a configuration mistake?
If not, can you correct the relinking?
How can I make wget not use so much RAM?
Do I have a chance to correct the wrong 'links'? (Not by hand,
there are thousands.)

Kind regards,

Dipl. Ing. Hermann Rugen

Rugen Consulting
Max-Planck-Straße 7
49767 Twist

Tel.: 05931 4099 151
Fax: 05931 4099 152

eMail: [EMAIL PROTECTED]
Internet: www.rugen-consulting.com










bug report

2002-03-20 Thread Andax



I found a serious bug in wget, all versions 
affected.

Description: It is highly addictive
Solution: You should include a warning about this 
somewhere in the product :)


a windows user





Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 17 Jan 2002 at 2:15, Hrvoje Niksic wrote:

 Michael Jennings [EMAIL PROTECTED] writes:
  WGet returns an error message when the .wgetrc file is terminated
  with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
  command-line language for all versions of Windows, so ignoring the
  end-of-file mark would make sense.
 
 Ouch, I never thought of that.  Wget opens files in binary mode and
 handles the line termination manually -- but I never thought to handle
 ^Z.

Why not just open the wgetrc file in text mode using
fopen(name, "r") instead of "rb"? Does that introduce other
problems?

In the Windows C compilers I've tried (Microsoft and Borland ones),
"r" causes the file to be opened in text mode by default (there are
ways to override that at compile time and/or run time), and this
causes the ^Z to be treated as an EOF (there might be ways to
override that too).



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Thomas Lussnig



WGet returns an error message when the .wgetrc file is terminated
with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
command-line language for all versions of Windows, so ignoring the
end-of-file mark would make sense.

Ouch, I never thought of that.  Wget opens files in binary mode and
handles the line termination manually -- but I never thought to handle
^Z.


Why not just open the wgetrc file in text mode using
fopen(name, "r") instead of "rb"? Does that introduce other
problems?

In the Windows C compilers I've tried (Microsoft and Borland ones),
"r" causes the file to be opened in text mode by default (there are
ways to override that at compile time and/or run time), and this
causes the ^Z to be treated as an EOF (there might be ways to
override that too).

I think it has to do with comments, because the definition is that 
starting with '#' the rest of the line is ignored. And a line ends 
with '\n' or with the end of the file, not with a special character 
like '\0'; that means, to me, that aborting the reading of a text file 
when such a character is found is incorrect parsing.

Cu Thomas Lußnig

Cu Thomas Lußnig






Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 21 Jan 2002 at 14:56, Thomas Lussnig wrote:

 Why not just open the wgetrc file in text mode using
 fopen(name, "r") instead of "rb"? Does that introduce other
 problems?
 I think it has to do with comments, because the definition is that
 starting with '#' the rest of the line is ignored. And a line ends
 with '\n' or with the end of the file, not with a special character
 like '\0'; that means, to me, that aborting the reading of a text file
 when such a character is found is incorrect parsing.

(N.B. the control-Z character would be '\032', not '\0'.)

So maybe just mention in the documentation that the wgetrc file is
considered to be a plain text file, whatever that means for the
system Wget is running on. Maybe mention peculiarities of
DOS/Windows, etc.

In general, it is more portable to read or write native text files
in text mode as it performs whatever local conversions are
necessary to make reads and writes of text files appear UNIX-like
(i.e. each line of text terminated by a newline '\n'). In binary
mode, what you get depends on the system (Mac text files have lines
terminated by carriage return ('\r') for example, and some systems
(VMS?) don't even have line termination characters as such.)

In the case of Wget, log files are already written in text mode. I
think wgetrc needs to be read in text mode and that's an easy
change.

In the case of the --input-file option, ideally the input file
should be read in text mode unless the --force-html option is used,
in which case it should be read in the same mode as when parsing
other locally-stored HTML files.

Wget stores retrieved files in binary mode but the mode used when
reading those locally-stored files is less precise (not that it
makes much difference for UNIX). It uses open() (not fopen()) and
read() to read those files into memory (or uses mmap() to map them
into memory space if supported). The DOS/Windows version of open()
allows you to specify text or binary mode, defaulting to text mode,
so it looks like the Windows version of Wget saves html files in
binary mode and reads them back in in text mode! Well whatever -
the HTML parser still seems to work okay on Windows, probably
because HTML isn't that fussy about line-endings anyway!

So to support --input-file portably (not the --force-html version),
the get_urls_file() function in url.c should probably call a new
function read_file_text() (or read_text_file()) instead of
read_file() as it does at the moment. For UNIX-type systems, that
could just fall back to calling read_file().

The local HTML file parsing stuff should probably be left well
alone but possibly add some #ifdef code for Windows to open the
file in binary mode, though there may be differences between
compilers for that.




RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread csaba . raduly


On 17/01/2002 07:34:05 Herold Heiko wrote:
[proper order restored]
 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, January 17, 2002 2:15 AM
 To: Michael Jennings
 Cc: [EMAIL PROTECTED]
 Subject: Re: Bug report: 1) Small error 2) Improvement to Manual


 Michael Jennings [EMAIL PROTECTED] writes:

  1) There is a very small bug in WGet version 1.8.1. The bug occurs
 when a .wgetrc file is edited using an MS-DOS text editor:
 
  WGet returns an error message when the .wgetrc file is terminated
  with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
  command-line language for all versions of Windows, so ignoring the
  end-of-file mark would make sense.

 Ouch, I never thought of that.  Wget opens files in binary mode and
 handles the line termination manually -- but I never thought to handle
 ^Z.

 As much as I'd like to be helpful, I must admit I'm loath to encumber
 the code with support for this particular thing.  I have never seen it
 before; is it only an artifact of DOS editors, or is it used on
 Windows too?



[snip copy con file.txt]

However in this case (at least when I just tried) the file won't contain
the ^Z. OTOH some DOS programs still will work on NT4, NT2k and XP, and
could be used, and would create files ending with ^Z. But do they really
belong here and should wget be bothered ?

What we really need to know is:

Is ^Z still a valid, recognized character indicating end-of-file (for
textmode files) for command shell programs on windows NT 4/2k/Xp ?
Somebody with access to the *windows standards* could shed more light on
this question ?

My personal idea is:
As a matter of fact no *windows* text editor I know of, even the
supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
end of file.txt. Wget is a *windows* program (although running in
console mode), not a *Dos* program (except for the real dos port I know
exists but never tried out).


I don't think there's a distinction between DOS and Windows programs
in this regard. The C runtime library is most likely to play a
significant role here. For a file fopen-ed in "rt" mode, the RTL
would convert \r\n -> \n and silently eat the _first_ ^Z,
returning EOF at that point.

When writing, it goes the other way 'round WRT \n -> \r\n.
I'm unsure about whether it writes ^Z at the end, though.
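
A small stand-alone illustration of that difference (the file name is only an example): on a DOS/Windows C runtime the text-mode count stops at the first ^Z and excludes the \r bytes, while the binary-mode count reports every byte.

#include <stdio.h>

int
main (void)
{
  const char *modes[] = { "r", "rb" };   /* text mode, then binary mode */
  int i;

  for (i = 0; i < 2; i++)
    {
      FILE *fp = fopen ("sample.txt", modes[i]);
      long count = 0;
      int c;

      if (!fp)
        return 1;
      while ((c = getc (fp)) != EOF)
        count++;
      fclose (fp);
      printf ("mode \"%s\": %ld characters read\n", modes[i], count);
    }
  return 0;
}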

So personally I'd say it would not be really necessary adding support
for the ^Z, even in the win32 port; except possibly for the Dos port, if
the porter of that beast thinks it would be useful.


The problem could be solved by opening .netrc in "rt".
However, the "t" is a non-standard extension.

However, this is not wget's problem IMO. Different editors may behave
differently. Example: on OS/2 (which isn't a DOS shell, but can run
DOS programs), the system editor (e.exe) *does* append a ^Z at the end
of every file it saves. People have patched the binary to remove this
feature :-) AFAIK no other OS/2 editor does this.


--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Hrvoje Niksic

Herold Heiko [EMAIL PROTECTED] writes:

 My personal idea is:
 As a matter of fact no *windows* text editor I know of, even the
 supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
 end of file.txt. Wget is a *windows* program (although running in
 console mode), not a *Dos* program (except for the real dos port I know
 exists but never tried out).
 
 So personally I'd say it would not be really neccessary adding support
 for the ^Z, even in the win32 port;

That was my line of thinking too.



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Michael Jennings

-


Obviously, this is completely your decision. You are right, only DOS editors make the 
mistake. (It should be noted that DOS is MS Windows' only command line language. It 
isn't going away; even Microsoft supplies command line utilities with all versions of 
its OSs. Yes, Windows will probably eventually go away, but not soon.)

However, I have a comment: There is simple logic that would solve this problem. WGet, 
when it reads a line in the configuration file, probably now strips off trailing 
spaces (hex 20, decimal 32). I suggest that it strip off both trailing spaces and 
control characters (characters with hex values of 1F or less, decimal values of 31 or 
less). This is a simple change that would work in all cases.
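
A minimal sketch of that suggestion (hypothetical helper name, not wget's actual code):

#include <string.h>

/* After reading one configuration line into LINE, drop trailing spaces
   and trailing control characters (including \r, \n and ^Z).  */
static void
strip_trailing_controls (char *line)
{
  size_t len = strlen (line);
  while (len > 0 && (unsigned char) line[len - 1] <= ' ')
    line[--len] = '\0';
}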

Regards,

Michael


__


Hrvoje Niksic wrote:

 Herold Heiko [EMAIL PROTECTED] writes:

  My personal idea is:
  As a matter of fact no *windows* text editor I know of, even the
  supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
  end of file.txt. Wget is a *windows* program (although running in
  console mode), not a *Dos* program (except for the real dos port I know
  exists but never tried out).
 
  So personally I'd say it would not be really neccessary adding support
  for the ^Z, even in the win32 port;

 That was my line of thinking too.




RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Herold Heiko

 From: Michael Jennings [mailto:[EMAIL PROTECTED]]
 Obviously, this is completely your decision. You are right, 
 only DOS editors make the mistake. (It should be noted that 
 DOS is MS Windows only command line language. It isn't going 
 away; even Microsoft supplies command line utilities with all 
 versions of its OSs. Yes, Windows will probably eventually go 

Please note the difference: all windows versions include a command line.
However that commandline afaik is not dos - it is able to run dos
programs, either because based on dos (win 9x) or because capable of
understanding the difference between w32 commandline programs and dos
programs, and starting the necessary dos *emulation*. But it is not
dos, and the behaviour is not like dos.
As far as I know, windows command line programs do not use ^Z as
end-of-file terminators (although some do honour it for
emulation/compatibility), only real dos programs do (anybody knows if
there is a - MS - standard for this?). If this is true, should wget on
windows really emulate the behaviour of dos programs, of an environment
windows originally was based on but where it is *not*running*anymore*
(wget I mean)? From a purist's point of view, no. From an end-user point
of view, possibly, in order to facilitate the changeover.
On the other hand, your report is the first one I ever saw, considering
Hrvoje's reaction and the lack of support in the original windows port
I'd say this is not a problem generally felt as important, so personally
I'm in favor of not cluttering up the port any more with special
behaviour. But it is Hrvoje's decision, as always.
If you feel it is important write a patch and submit it, shouldn't be a
major piece of work.
 
Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1            ph  +39-041-5907073
-- I-31021 Mogliano V.to (TV) fax +39-041-5907087
-- ITALY



Bug report

2001-12-13 Thread Pavel Stepchenko

Hello bug-wget,

$ wget --version
GNU Wget 1.8


$ wget 
ftp://password:[EMAIL PROTECTED]:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
Warning: wildcards not supported in HTTP.


Oooops! But this is an FTP URL, not HTTP!

Please, fix it.


Thank you,

-- 
Best regards from future,
HillDale.
Pavel  mailto:[EMAIL PROTECTED]




Re: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko [EMAIL PROTECTED] writes:

 Hello bug-wget,
 
 $ wget --version
 GNU Wget 1.8
 
 $ wget 
ftp://password:[EMAIL PROTECTED]:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
 Warning: wildcards not supported in HTTP.
 
 Oooops! But this is an FTP URL, not HTTP!

Are you using a proxy?



Re: Re[2]: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko [EMAIL PROTECTED] writes:

 Warning: wildcards not supported in HTTP.
 
 Oooops! But this is FTP url, not HTTP!
HN Are you using a proxy?
 Yes.

This means that HTTP is used for retrieval, and '*' won't work --
which is what Wget is trying to warn you about.

 --17:26:58--  ftp://1.2.3.4:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
    => `*'
 Connecting to 2.2.2.2:3128... connected!
 Proxy request sent, awaiting response... ^C
 
 1.7.1 doesn't say a single word about Warning: wildcards not supported in
 HTTP.

Instead, it just silently doesn't work.



RE: WGET 1.8 bug report

2001-12-12 Thread Herold Heiko

 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
 Herold Heiko [EMAIL PROTECTED] writes:
 
  I put up the current cvs, mainly since there have been those patches
  to ftp-ls.c and the signal handler. Ok ?
 
 Please don't do that.  Although all changes in the current CVS
 *should* be stable, mistakes are possible.  Please provide a binary
 that is 1.8 plus the most critical patches -- currently only the
 progress.c patch.

Correct, sorry. Site updated.

  quite a userbase which does take the zipped cvs sources I put up in
  order to use them on unix platforms. Don't ask me why. Well,
  possibly folks behind firewalls who can't use cvs but can download
  with a proxy or something..
 
 We should have daily source snapshots for such people.

I agree. With a minimum bit of logic this shouldn't even load the server
too much - before tarring check if there have been commits since last
time; shouldn't be difficult to parse a cvs history or something. Or
checkout and do a find -newer. Possibly (if the general setup and sysop
permits that) the checkout files could even be directly in the sunsite
wget ftp directory for easy access to changelogs or single files.

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1            ph  +39-041-5907073
-- I-31021 Mogliano V.to (TV) fax +39-041-5907087
-- ITALY