Re: errno patches for Windows

2003-10-17 Thread Hrvoje Niksic
Gisle Vanem [EMAIL PROTECTED] writes:

 #ifndef ENOTCONN
 # define ENOTCONN X_ENOTCONN
 #endif

 Except you cannot make Winsock return X_ENOTCONN.

But we don't really care, because we're in control of what gets stored
into errno after Winsock calls.  So instead of:

  errno = WSAGetLastError ();

windows_select and friends can go ahead and say:

  errno = winsock_error_to_errno (WSAGetLastError ());

winsock_error_to_errno can easily convert Winsock errors to the errno
values expected by the rest of Wget, adding support for the missing
ones such as ENOTCONN.
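Such a converter could look roughly like this.  This is a sketch, not actual Wget code; the WSA* numeric values are the standard Winsock ones, hard-coded so the sketch compiles outside Windows, and the X_ERRBASE/X_ENOTCONN values are assumptions standing in for whatever mswindows.h would choose:

```c
#include <errno.h>

/* Assumed substitute values for errno codes that old MSVC C libraries
   lack; the real values would come from mswindows.h. */
#define X_ERRBASE  20000
#define X_ENOTCONN (X_ERRBASE + 1)

/* Standard Winsock error numbers, hard-coded for portability of the
   sketch. */
#define WSAEINTR        10004
#define WSAEACCES       10013
#define WSAENOTCONN     10057
#define WSAECONNREFUSED 10061

/* Map a Winsock error code to the errno value the rest of Wget
   expects. */
int
winsock_error_to_errno (int wsa_err)
{
  switch (wsa_err)
    {
    case WSAEINTR:        return EINTR;
    case WSAEACCES:       return EACCES;
    case WSAECONNREFUSED: return ECONNREFUSED;
    case WSAENOTCONN:     return X_ENOTCONN; /* no ENOTCONN in errno.h */
    default:              return wsa_err;    /* pass unknown codes through */
    }
}
```

In this scheme windows_select and friends call the function once, right after WSAGetLastError, and the rest of the code never sees raw Winsock numbers.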

 It returns WSAENOTCONN (def'ed to ENOTCONN in mswindows.h).

If we do this, we should probably remove those defines.  They would no
longer be needed.

 const char *
 windows_strerror (int err)
 {
   /* Leave the standard ones to strerror. */
   if (err < X_ERRBASE)
     return strerror (err);

   /* Handle the unsupported ones manually. */
   switch (err)
     {
     case X_ENOTCONN:
       return "Connection refused";

 Which AFAICS is pretty much the same as in my patch.

One difference is that your patch requires the use of special
GET_ERRNO and SET_ERRNO codes that I'm trying to avoid.  Another is
that windows_strerror calls real strerror for everything except for
the few error codes which really are unavailable under Windows, such
as ENOTCONN.  This should (I think) remove the need for the large
switch you have in get_winsock_error.

 Another thing is that Wget could mask errnos for Unix too. In
 connect.c:

  ...
{
  CLOSE (sock);
  sock = -1;
  goto out;
}

 out:
 ...
  else
{
  save_errno = errno;
  if (!silent)
        logprintf (LOG_VERBOSE, "failed: %s.\n", strerror (errno));
  errno = save_errno;
}

 The close() could possibly set errno too, but we want the errno 
 from bind() or connect() don't we?

For close() to set errno, it would have to fail, and that should not
be possible in normal operation.  (Unlike fclose, close cannot write
data; it should just tell the kernel to get rid of the descriptor.)  If
close really fails, then something is seriously wrong and we care
about the errno from close at least as much as we care about errno
from connect or bind.  In practice it probably doesn't make sense to
care about close setting errno.
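The save/restore idiom from connect.c can be reduced to a small self-contained sketch.  The simulated failures below are of course artificial; cleanup() stands in for CLOSE (sock), or for any other call made between the failing connect()/bind() and the use of its errno:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for CLOSE (sock): simulate close() failing and clobbering
   errno. */
void
cleanup (void)
{
  errno = EBADF;
}

/* Demonstrate the idiom: save errno before cleanup, restore it after,
   so the logged message reflects the original failure. */
int
report_connect_failure (void)
{
  int save_errno;

  errno = ECONNREFUSED;         /* simulate a failed connect() */
  save_errno = errno;           /* save before cleanup ... */
  cleanup ();                   /* ... which may overwrite errno ... */
  errno = save_errno;           /* ... then restore it */
  fprintf (stderr, "failed: %s.\n", strerror (errno));
  return errno;
}
```

Without the save/restore pair, the logged message would describe the (simulated) close() failure instead of the connect() failure the user actually cares about.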


RE: Wget 1.9-rc1 available for testing

2003-10-17 Thread Herold Heiko
Windows MSVC binary at
http://xoomer.virgilio.it/hherold/
Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 16, 2003 4:42 PM
 To: [EMAIL PROTECTED]
 Subject: Wget 1.9-rc1 available for testing
 
 
 As the name implies, this should be 1.9 (with only version changed)
 unless a show-stopper is discovered.  Get it from:
 
 http://fly.srk.fer.hr/~hniksic/wget/wget-1.9-rc1.tar.gz
 


Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
??? ?? [EMAIL PROTECTED] writes:

 I've seen pages that do that kind of redirection, but Wget seems
 to follow them, for me.  Do you have an example I could try?

 [EMAIL PROTECTED]:~/ /usr/local/bin/wget -U "All.by" -np -r -N -nH --header="Accept-Charset: cp1251, windows-1251, win, x-cp1251, cp-1251" --referer=http://minskshop.by -P /tmp/minskshop.by -D minskshop.by http://minskshop.by http://www.minskshop.by
[...]

The problem with these pages lies not in redirection, but in the fact
that the server returns them with the `text/plain' content-type
instead of `text/html', which Wget requires in order to treat a page
as HTML.

Observe:

 --13:05:47--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:05:53--  http://minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:05:59--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]
 --13:06:00--  http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set
 Length: ignored [text/plain]

Incidentally, Wget is not the only browser that has a problem with
that.  For me, Mozilla is simply showing the source of
http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
the returned content-type is text/plain.


Re: Wget 1.8.2 bug

2003-10-17 Thread Tony Lewis
Hrvoje Niksic wrote:

 Incidentally, Wget is not the only browser that has a problem with
 that.  For me, Mozilla is simply showing the source of
 http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
 the returned content-type is text/plain.

On the other hand, Internet Explorer will treat lots of content types as
HTML if the content starts with an <html> tag.

To see for yourself, try these links:
http://www.exelana.com/test.cgi
http://www.exelana.com/test.cgi?text/plain
http://www.exelana.com/test.cgi?image/jpeg

Perhaps we can add an option to wget so that it will look for an <html> tag
in plain text files?

Tony



Re: Wget 1.8.2 bug

2003-10-17 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

 Hrvoje Niksic wrote:

 Incidentally, Wget is not the only browser that has a problem with
 that.  For me, Mozilla is simply showing the source of
 http://www.minskshop.by/cgi-bin/shop.cgi?id=1&cookie=set, because
 the returned content-type is text/plain.

 On the other hand, Internet Explorer will treat lots of content
 types as HTML if the content starts with an <html> tag.

I know.  But so far no one has asked for this in Wget.

 Perhaps we can add an option to wget so that it will look for an
 <html> tag in plain text files?

If more people clamor for the option, I suppose we could overload
`--force-html' to perform such detection.
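The proposed detection could be sketched like this (a hypothetical helper in the spirit of IE's sniffing, not actual Wget code): treat a "text/plain" body as HTML if its first non-whitespace bytes form an <html tag.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the buffer's first non-whitespace bytes are "<html"
   (case-insensitively), 0 otherwise.  A real implementation would
   also want to allow a leading <!DOCTYPE ...> declaration. */
int
looks_like_html (const char *buf, size_t len)
{
  size_t i = 0;
  const char tag[] = "<html";

  /* Skip leading whitespace. */
  while (i < len && isspace ((unsigned char) buf[i]))
    ++i;

  /* Match "<html" case-insensitively. */
  for (size_t j = 0; tag[j]; ++j, ++i)
    if (i >= len || tolower ((unsigned char) buf[i]) != tag[j])
      return 0;
  return 1;
}
```

Hooked into --force-html, this check would run only when the server's content-type is text/plain, so correctly labeled documents would be unaffected.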


wget downloading a single page when it should recurse

2003-10-17 Thread Philip Mateescu
Hi,

I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
swear it once worked...
I'm trying to download the php manual off the web using this command:

$ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php



Here's the result:

--10:12:15--  http://us4.php.net/manual/en/print/index.php
           => `index.php'
Resolving us4.php.net... done.
Connecting to us4.php.net[209.197.17.2]:80... connected.
HTTP request sent, awaiting response...
  1 HTTP/1.1 200 OK
  2 Date: Fri, 17 Oct 2003 15:12:18 GMT
  3 Server: Apache/1.3.27 (Unix) Debian GNU/Linux PHP/4.3.2 mod_python/2.7.8 Python/2.2.3 mod_ssl/2.8.14 OpenSSL/0.9.7b mod_perl/1.27 mod_lisp/2.32 DAV/1.0.3
  4 X-Powered-By: PHP/4.3.2
  5 Content-language: en
  6 Set-Cookie: LAST_LANG=en; expires=Sat, 16-Oct-04 15:12:18 GMT; path=/; domain=.php.net
  7 Set-Cookie: COUNTRY=USA%2C65.208.59.73; expires=Fri, 24-Oct-03 15:12:18 GMT; path=/; domain=.php.net
  8 Status: 200 OK
  9 Last-Modified: Sat, 18 Oct 2003 06:12:28 GMT
 10 Vary: Cookie
 11 Connection: close
 12 Content-Type: text/html;charset=ISO-8859-1

    [ <=>                                ] 13,961        36.16K/s

10:12:17 (36.16 KB/s) - `index.php' saved [13961]

FINISHED --10:12:17--
Downloaded: 13,961 bytes in 1 files
Converting index.php... 3-183
Converted 1 files in 0.01 seconds.
I expected it to follow the links and download the rest of the manual.
Am I doing anything wrong?
Thank you very much,

philip
---
Don't belong. Never join. Think for yourself. Peace
---


Re: wget downloading a single page when it should recurse

2003-10-17 Thread Aaron S. Hawley
The HTML of those pages contains the meta-tag

<meta name="robots" content="noindex,nofollow" />

and Wget listened, and only downloaded the first page.

Perhaps Wget should give a warning message that the file contained a
meta-robots tag, so that people aren't quite so dumbfounded.
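The warning could hang off a small helper like the following.  This is a hypothetical sketch, not Wget's real parser: it takes the value of the content attribute of a <meta name="robots"> tag and decides whether link-following is forbidden.

```c
#include <stdio.h>
#include <string.h>

/* The content attribute is a comma-separated token list, e.g.
   "noindex,nofollow".  Return 1 if it forbids following links.
   (A real parser would split on commas rather than substring-match.) */
int
robots_meta_forbids_follow (const char *content)
{
  return strstr (content, "nofollow") != NULL;
}

/* Emit the warning proposed above when a downloaded file's robots
   meta tag suppresses recursion. */
void
warn_if_nofollow (const char *file, const char *content)
{
  if (robots_meta_forbids_follow (content))
    fprintf (stderr,
             "Warning: %s contains a robots meta tag with \"nofollow\"; "
             "its links will not be followed.\n", file);
}
```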

/a

On Fri, 17 Oct 2003, Philip Mateescu wrote:

 Hi,

 I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
 swear it once worked...

 I'm trying to download the php manual off the web using this command:

 $ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php

-- 
Consider supporting GNU Software and the Free Software Foundation
By Buying Stuff - http://www.gnu.org/gear/
  (GNU and FSF are not responsible for this promotion
   nor necessarily agree with the views of the author)


Re: wget downloading a single page when it should recurse

2003-10-17 Thread Philip Mateescu
Thanks!

A warning message would be nice when, for not-so-obvious reasons, wget 
doesn't behave as one would expect.

I don't know if there are other tags that could change wget's behavior 
(the way <meta name="robots"> does with -r), but if there are, it would be 
useful to have a message.

Thanks again!



Aaron S. Hawley wrote:

The HTML of those pages contains the meta-tag

<meta name="robots" content="noindex,nofollow" />

and Wget listened, and only downloaded the first page.

Perhaps Wget should give a warning message that the file contained a
meta-robots tag, so that people aren't quite so dumbfounded.
/a

On Fri, 17 Oct 2003, Philip Mateescu wrote:


Hi,

I'm having a problem with wget 1.8.2 cygwin and I'm almost ready to
swear it once worked...
I'm trying to download the php manual off the web using this command:

$ wget -nd -nH -r -np -p -k -S http://us4.php.net/manual/en/print/index.php


---
Don't belong. Never join. Think for yourself. Peace
---


Re: wget downloading a single page when it should recurse

2003-10-17 Thread Tony Lewis
Philip Mateescu wrote:

 A warning message would be nice when for not so obvious reasons wget
 doesn't behave as one would expect.

 I don't know if there are other tags that could change wget's behavior
 (the way <meta name="robots"> does with -r), but if there are, it would be
 useful to have a message.

I agree that this is worth a notable mention in the wget output. At the very
least, running with -d should provide more guidance on why the links it has
appended to urlpos are not being followed. Buried in the middle of hundreds
of lines of output is:

no-follow in index.php

On the other hand, if other rules prevent a URL from being followed, you
might see something like:

Deciding whether to enqueue "http://www.othersite.com/index.html".
This is not the same hostname as the parent's (www.othersite.com and
www.thissite.com).
Decided NOT to load it.

Tony



Re: wget downloading a single page when it should recurse

2003-10-17 Thread Hrvoje Niksic
Aaron S. Hawley [EMAIL PROTECTED] writes:

 The HTML of those pages contains the meta-tag

 <meta name="robots" content="noindex,nofollow" />

 and Wget listened, and only downloaded the first page.

 Perhaps Wget should give a warning message that the file contained a
 meta-robots tag, so that people aren't quite so dumbfounded.

Good point.  A message would be easy to add, and in this case
enormously useful.



Re: Wget 1.9 about to be released

2003-10-17 Thread Hrvoje Niksic
In case you're curious, I'm still waiting for a response from the GNU
people.

If I don't hear from them soon, I'll release 1.9 anyway and put it on
a private FTP site.  That way there will be a release for Noel to
package for Debian, and we can watch the fun as the bug reports start
pouring in.


P.S.
The question about the maintainership of the stable branch wasn't a
joke.  If someone wants to volunteer, please contact me.



wget syntax for extension reject

2003-10-17 Thread Patrick Robinson
Hello,


Would someone be so kind as to tell me the exact syntax to tell wget which
extensions it should reject, e.g. mp3 or mpg?
Somehow I can't figure it out by experimenting or from the manpage.

Thanks

Regards

Patrick Robinson