Re: Bug#261755: Control sequences injection patch

2004-08-22 Thread Jan Minar
tags 261755 +patch
thanks

On Sun, Aug 22, 2004 at 11:39:07AM +0200, Thomas Hood wrote:
> The changes contemplated look very invasive.  How quickly can this
> bug be fixed?

Here we go:  Hacky, non-portable, but pretty slick & non-invasive,
whatever that means.  Now I'm going to check whether it is going to
catch all the cases where malicious characters could possibly be
injected.

This patch (hopefully) solves the problem of a remote attacker (server or
otherwise) injecting malicious control sequences into the HTTP headers.  It
by no means solves the spoofing bug, which is by nature tricky to address
well.
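For reference, the idea reduces to the sketch below: keep printable and
whitespace bytes, and rewrite every other byte as a '\NNN' octal escape so a
hostile server cannot smuggle terminal control sequences through the log.
The function and the main() driver here are hypothetical illustrations, not
the code from the patch itself:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
escape_untrusted (const char *src)
{
  /* Worst case every byte becomes "\NNN" (4 bytes), plus the final '\0'. */
  char *dest = malloc (4 * strlen (src) + 1);
  size_t i, j = 0;

  if (!dest)
    abort ();
  for (i = 0; src[i] != '\0'; ++i)
    {
      unsigned char c = (unsigned char) src[i];
      if (isprint (c) || isspace (c))
        dest[j++] = (char) c;
      else
        {
          dest[j++] = '\\';
          dest[j++] = '0' + (c >> 6);
          dest[j++] = '0' + ((c >> 3) & 7);
          dest[j++] = '0' + (c & 7);
        }
    }
  dest[j] = '\0';
  return dest;
}

int
main (void)
{
  /* "\033[2J" is the ANSI clear-screen sequence a hostile server could embed
     in a header; after escaping it comes out as the literal text \033[2J.  */
  char *safe = escape_untrusted ("Server: evil\033[2J");
  puts (safe);
  free (safe);
  return 0;
}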

Cheers,
Jan.

-- 
   "To me, clowns aren't funny. In fact, they're kind of scary. I've wondered
 where this started and I think it goes back to the time I went to the circus,
  and a clown killed my dad."
--- wget-1.9.1.WORK/debian/changelog 2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/debian/changelog 2004-08-22 19:39:48.0 +0200
@@ -1,3 +1,12 @@
+wget (1.9.1-4.local-1) unstable; urgency=medium
+
+  * Local build
+  * Hopeless attempt to filter control chars in log output (see
+Bug#267393)
+  * This probably SHOULD make it in Sarge revision 0
+
+ -- Jan Minář <[EMAIL PROTECTED]>  Sun, 22 Aug 2004 19:39:02 +0200
+
 wget (1.9.1-4) unstable; urgency=low
 
   * made passive the default. sorry forgot again.:(
--- wget-1.9.1.WORK/src/log.c   2004-08-22 19:34:16.0 +0200
+++ wget-1.9.1-jan/src/log.c 2004-08-22 19:31:33.0 +0200
@@ -63,6 +63,12 @@
 #include "wget.h"
 #include "utils.h"
 
+/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdio.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +351,49 @@
   int expected_size;
   int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
 
+/*
+ * escape_buffer -- escape using '\NNN'.  To be used wherever we want to
+ * print untrusted data.
+ *
+ * Syntax: escape_buffer (&buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+   char *dest;
+   int i, j;
+
+   /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+   dest = xmalloc (4 * strlen (*src) + 1);
+
+   for (i = j = 0; (*src)[i] != '\0'; ++i) {
+   /*
+* We allow any non-control character, because LINE TABULATION
+* & friends can't do more harm than SPACE.  And someone
+* somewhere might be using these, so unless we actually can't
+* protect against spoofing attacks, we don't pretend we can.
+*
+* Note that '\n' is included both in the isspace() *and*
+* iscntrl() range.
+*/
+   if (isprint((*src)[i]) || isspace((*src)[i])) {
+   dest[j++] = (*src)[i];
+   } else {
+   dest[j++] = '\\';
+   dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
+   dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
+   dest[j++] = '0' + ((*src)[i] & 7);
+   }
+   }
+   dest[j] = '\0';
+
+   xfree (*src);
+   *src = dest;
+}
+
 /* Print a message to the log.  A copy of message will be saved to
saved_log, for later reusal by log_dump_context().
 
@@ -364,15 +412,28 @@
   int available_size = sizeof (smallmsg);
   int numwritten;
   FILE *fp = get_log_fp ();
+  char *buf;
+
+  /* int vasprintf(char **strp, const char *fmt, va_list ap); */
+  if (vasprintf (&buf, fmt, args) == -1) {
+perror (_("Error"));
+exit (1);
+  }
+
+  escape_buffer (&buf);
 
   if (!save_context_p)
 {
   /* In the simple case just call vfprintf(), to avoid needless
  allocation and games with vsnprintf(). */
-  vfprintf (fp, fmt, args);
-  goto flush;
-}
 
+  /* vfprintf() didn't check return value, neither will we */
+  (void) fprintf(fp, "%s", buf);
+}
+  else /* goto flush; */ /* There's no need to use goto here */
+/* This else-clause purposefully shifted 4 columns to the left, so that the
+ * diff is easy to read --Jan */
+{
   if (state->allocated != 0)
 {
   write_ptr = state->bigmsg;
@@ -384,8 +445,12 @@
  missing from legacy systems.  Therefore I consider it safe to
  assume that its return value is meaningful.  On the systems where
  vsnprintf() is not available, we use the implementation from
- snprintf.c which does return the correct value.  */
-  numwritten = vsnprintf (write_ptr, available_size, fmt, args);
+ snprintf.c which does return the correct value.
+ 
+ With snprintf(), this probably doesn't hold anymore.  But this is Debian,
+ so who cares. */
+
+  numwritten = snprintf (write_ptr, available_size, "%s", buf);
 
   /* vsnprintf() will not step over the limit given by available_size.
  If it fails, it will return either -1 (POSIX?) or the numb

Re: Bug in wget 1.9.1 documentation

2004-07-12 Thread Hrvoje Niksic
Tristan Miller <[EMAIL PROTECTED]> writes:

> There appears to be a bug in the documentation (man page, etc.) for
> wget 1.9.1.

I think this is a bug in the man page generation process.



Re: [BUG] wget 1.9.1 and below can't download >=2G file on 32bits system

2004-05-27 Thread Hrvoje Niksic
Yup; 1.9.1 cannot download large files.  I hope to fix this by the
next release.



Re: Bug report

2004-03-24 Thread Hrvoje Niksic
Juhana Sadeharju <[EMAIL PROTECTED]> writes:

> Command: "wgetdir http://liarliar.sourceforge.net";.
> Problem: Files are named as
>   content.php?content.2
>   content.php?content.3
>   content.php?content.4
> which are interpreted, e.g., by Nautilus as manual pages and are
> displayed as plain texts. Could the files and the links to them
> renamed as the following?
>   content.php?content.2.html
>   content.php?content.3.html
>   content.php?content.4.html

Use the option `--html-extension' (-E).

> After all, are those pages still php files or generated html files?
> If they are html files produced by the php files, then it could be a
> good idea to add a new extension to the files.

They're the latter -- HTML files produced by the server-side PHP code.

> Command: "wgetdir 
> http://www.newtek.com/products/lightwave/developer/lscript2.6/index.html"
> Problem: Images are not downloaded. Perhaps because the image links
> are the following:
>   

I've never seen this tag, but it seems to be the same as IMG.  Mozilla
seems to grok it and its DOM inspector thinks it has seen IMG.  Is
this tag documented anywhere?  Does IE understand it too?



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-05 Thread Hrvoje Niksic
D Richard Felker III <[EMAIL PROTECTED]> writes:

>> The request log shows that the slashes are apparently respected.
>
> I retried a test case and found the same thing -- the slashes were
> respected.

OK.

> Then I remembered that I was using -i. Wget seems to work fine with
> the url on the command line; the bug only happens when the url is
> passed in with:
>
> cat < http://...
> EOF

But I cannot repeat that, either.  As long as the consecutive slashes
are in the query string, they're not stripped.

> Using this method is necessary since it is the ONLY secure way I
> know of to do a password-protected http request from a shell script.

Yes, that is the best way to do it.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-04 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 07:25:52PM +0100, Hrvoje Niksic wrote:
> >> > Removing the offending code fixes the problem, but I'm not sure if
> >> > this is the correct solution. I expect it would be more correct to
> >> > remove multiple slashes only before the first occurrance of ?, but
> >> > not afterwards.
> >> 
> >> That's exactly what should happen.  Please give us more details, if
> >> possible accompanied by `-d' output.
> >
> > If you'd still like details now that you know the version I was
> > using, let me know and I'll be happy to do some tests.
> 
> Yes please.  For example, this is how it works for me:
> 
> $ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
> DEBUG output created by Wget 1.8.2 on linux-gnu.
> 
> --19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
>=> `something?redirect=http:%2F%2Fwww.cnn.com'
> Resolving www.xemacs.org... done.
> Caching www.xemacs.org => 199.184.165.136
> Connecting to www.xemacs.org[199.184.165.136]:80... connected.
> Created socket 3.
> Releasing 0x8080b40 (new refcount 1).
> ---request begin---
> GET /something?redirect=http://www.cnn.com HTTP/1.0
> User-Agent: Wget/1.8.2
> Host: www.xemacs.org
> Accept: */*
> Connection: Keep-Alive
> 
> ---request end---
> HTTP request sent, awaiting response...
> ...
> 
> The request log shows that the slashes are apparently respected.

I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <

Re: bug in use index.html

2004-03-04 Thread Dražen Kačar
Hrvoje Niksic wrote:
> The whole matter of conversion of "/" to "/index.html" on the file
> system is a hack.  But I really don't know how to better represent
> empty trailing file name on the file system.

Another, for now rather limited, hack: on file systems which support some
sort of file attributes you can mark index.html as an unwanted child of an
empty trailing file name. AFAIK, that should work at least on Solaris and
Linux. Others will join the club one day, I hope.

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Re: bug in use index.html

2004-03-04 Thread Hrvoje Niksic
The whole matter of conversion of "/" to "/index.html" on the file
system is a hack.  But I really don't know how to better represent
empty trailing file name on the file system.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III <[EMAIL PROTECTED]> writes:

>> > Think of something like http://foo/bar/redirect.cgi?http://...
>> > wget translates this into: [...]
>> 
>> Which version of Wget are you using?  I think even Wget 1.8.2 didn't
>> collapse multiple slashes in query strings, only in paths.
>
> I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
> and it persisted.

OK.

>> > Removing the offending code fixes the problem, but I'm not sure if
>> > this is the correct solution. I expect it would be more correct to
>> > remove multiple slashes only before the first occurrance of ?, but
>> > not afterwards.
>> 
>> That's exactly what should happen.  Please give us more details, if
>> possible accompanied by `-d' output.
>
> If you'd still like details now that you know the version I was
> using, let me know and I'll be happy to do some tests.

Yes please.  For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02--  http://www.xemacs.org/something?redirect=http://www.cnn.com
   => `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread D Richard Felker III
On Mon, Mar 01, 2004 at 03:36:55PM +0100, Hrvoje Niksic wrote:
> D Richard Felker III <[EMAIL PROTECTED]> writes:
> 
> > The following code in url.c makes it impossible to request urls that
> > contain multiple slashes in a row in their query string:
> [...]
> 
> That code is removed in CVS, so multiple slashes now work correctly.
> 
> > Think of something like http://foo/bar/redirect.cgi?http://...
> > wget translates this into: [...]
> 
> Which version of Wget are you using?  I think even Wget 1.8.2 didn't
> collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.

> > Removing the offending code fixes the problem, but I'm not sure if
> > this is the correct solution. I expect it would be more correct to
> > remove multiple slashes only before the first occurrance of ?, but
> > not afterwards.
> 
> That's exactly what should happen.  Please give us more details, if
> possible accompanied by `-d' output.

If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich



Re: Bug in wget: cannot request urls with double-slash in the query string

2004-03-01 Thread Hrvoje Niksic
D Richard Felker III <[EMAIL PROTECTED]> writes:

> The following code in url.c makes it impossible to request urls that
> contain multiple slashes in a row in their query string:
[...]

That code is removed in CVS, so multiple slashes now work correctly.

> Think of something like http://foo/bar/redirect.cgi?http://...
> wget translates this into: [...]

Which version of Wget are you using?  I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

> Removing the offending code fixes the problem, but I'm not sure if
> this is the correct solution. I expect it would be more correct to
> remove multiple slashes only before the first occurrance of ?, but
> not afterwards.

That's exactly what should happen.  Please give us more details, if
possible accompanied by `-d' output.



Re: bug in connect.c

2004-02-06 Thread Hrvoje Niksic
Manfred Schwarb <[EMAIL PROTECTED]> writes:

>> Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
>> before using it?  I see that some sources do it, and some don't.  I
>> was always under the impression that, as long as you fill the relevant
>> members (sin_family, sin_addr, sin_port), other initialization is not
>> necessary.  Was I mistaken, or is this something specific to FreeBSD?
>>
>> Do others have experience with this?
>
> e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c
>
> putty encountered the very same problem ...

Amazing.  This obviously doesn't show up when binding to remote
addresses, or it would have been noticed ages ago.

Thanks for the pointer.  This patch should fix the problem in the CVS
version:

2004-02-06  Hrvoje Niksic  <[EMAIL PROTECTED]>

* connect.c (sockaddr_set_data): Zero out
sockaddr_in/sockaddr_in6.  Apparently BSD-derived stacks need this
when binding a socket to local address.

Index: src/connect.c
===
RCS file: /pack/anoncvs/wget/src/connect.c,v
retrieving revision 1.62
diff -u -r1.62 connect.c
--- src/connect.c   2003/12/12 14:14:53 1.62
+++ src/connect.c   2004/02/06 16:59:01
@@ -87,6 +87,7 @@
 case IPV4_ADDRESS:
   {
struct sockaddr_in *sin = (struct sockaddr_in *)sa;
+   xzero (*sin);
sin->sin_family = AF_INET;
sin->sin_port = htons (port);
sin->sin_addr = ADDRESS_IPV4_IN_ADDR (ip);
@@ -96,6 +97,7 @@
 case IPV6_ADDRESS:
   {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sa;
+   xzero (*sin6);
sin6->sin6_family = AF_INET6;
sin6->sin6_port = htons (port);
sin6->sin6_addr = ADDRESS_IPV6_IN6_ADDR (ip);
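The portable idiom the patch reinstates is simply to clear the whole address
structure before filling in the members you care about.  A minimal sketch of
the pattern for plain IPv4 (an illustration only, not wget code):

#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

void
fill_ipv4_sockaddr (struct sockaddr_in *sin, const char *dotted_quad,
                    unsigned short port)
{
  memset (sin, 0, sizeof *sin);   /* BSD-derived stacks may otherwise
                                     reject bind() on a local address */
  sin->sin_family = AF_INET;
  sin->sin_port = htons (port);
  (void) inet_pton (AF_INET, dotted_quad, &sin->sin_addr);
}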


Re: bug in connect.c

2004-02-06 Thread Manfred Schwarb
Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
before using it?  I see that some sources do it, and some don't.  I
was always under the impression that, as long as you fill the relevant
members (sin_family, sin_addr, sin_port), other initialization is not
necessary.  Was I mistaken, or is this something specific to FreeBSD?
Do others have experience with this?


e.g. look at http://cvs.tartarus.org/putty/unix/uxnet.c

putty encountered the very same problem ...

regards
manfred


Re: bug in connect.c

2004-02-04 Thread Hrvoje Niksic
"francois eric" <[EMAIL PROTECTED]> writes:

> after some test:
> bug is when: ftp, with username and password, with bind address specifyed
> bug is not when: http, ftp without username and password
> looks like memory leaks. so i made some modification before bind:
> src/connect.c:
> --
> ...
>   /* Bind the client side to the requested address. */
>   wget_sockaddr bsa;
> //!
>   memset (&bsa,0,sizeof(bsa));
> /!!
>   wget_sockaddr_set_address (&bsa, ip_default_family, 0, &bind_address);
>   if (bind (sock, &bsa.sa, sockaddr_len ()))
> ..
> --
> after it all downloads become successful.
> i think it is better to do the memset in wget_sockaddr_set_address, but that
> is your choice.

Interesting.  Is it really necessary to zero out sockaddr/sockaddr_in
before using it?  I see that some sources do it, and some don't.  I
was always under the impression that, as long as you fill the relevant
members (sin_family, sin_addr, sin_port), other initialization is not
necessary.  Was I mistaken, or is this something specific to FreeBSD?

Do others have experience with this?



Re: bug report

2004-01-28 Thread Hrvoje Niksic
You are right, it's a bug.  -O is implemented in a weird way, which
makes it work strangely with features such as timestamping and link
conversion.  I plan to fix it when I get around to revamping the file
name generation support for grokking the Content-Disposition header.


Re: Bug: Support of charcters like '\', '?', '*', ':' in URLs

2003-10-21 Thread Hrvoje Niksic
"Frank Klemm" <[EMAIL PROTECTED]> writes:

> Wget don't work properly when the URL contains characters which are
> not allowed in file names on the file system which is currently
> used. These are often '\', '?', '*' and ':'.
>
> Affected are at least:
> - Windows and related OS
> - Linux when using FAT or Samba as file system
[...]

Thanks for the report.  This has been fixed in Wget 1.9-beta.  It
doesn't use characters that FAT can't handle by default, and if you
use a mounted FAT filesystem, you can tell Wget to assume behavior as
if it were under Windows.



Re: bug in 1.8.2 with

2003-10-14 Thread Hrvoje Niksic
You're right -- that code was broken.  Thanks for the patch; I've now
applied it to CVS with the following ChangeLog entry:

2003-10-15  Philip Stadermann  <[EMAIL PROTECTED]>

* ftp.c (ftp_retrieve_glob): Correctly loop through the list whose
elements might have been deleted.




RE: Bug in Windows binary?

2003-10-06 Thread Herold Heiko
> From: Gisle Vanem [mailto:[EMAIL PROTECTED]

> "Jens Rösner" <[EMAIL PROTECTED]> said:
> 
...
 
> I assume Heiko didn't notice it because he doesn't have that function
> in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?
> 
> --gv

Probably.
Currently I'm compiling and testing on NT 4.0 only.
Besides that, I'm VERY tight on time at the moment, so testing usually means
"does it run? Does it download one sample http and one https site? Yes?
Put it up for testing!".

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


Re: Bug in Windows binary?

2003-10-05 Thread Hrvoje Niksic
"Gisle Vanem" <[EMAIL PROTECTED]> writes:

> --- mswindows.c.org Mon Sep 29 11:46:06 2003
> +++ mswindows.c Sun Oct 05 17:34:48 2003
> @@ -306,7 +306,7 @@
>  DWORD set_sleep_mode (DWORD mode)
>  {
>HMODULE mod = LoadLibrary ("kernel32.dll");
> -  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
> +  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
>DWORD rc = (DWORD)-1;
>
> I assume Heiko didn't notice it because he doesn't have that
> function in his kernel32.dll. Heiko and Hrvoje, will you correct
> this ASAP?

I've now applied the patch, thanks.  I use the following ChangeLog
entry:

2003-10-05  Gisle Vanem  <[EMAIL PROTECTED]>

* mswindows.c (set_sleep_mode): Fix type of
_SetThreadExecutionState.



Re: Bug in Windows binary?

2003-10-05 Thread Gisle Vanem
"Jens Rösner" <[EMAIL PROTECTED]> said:

> I downloaded
> wget 1.9 beta 2003/09/29 from Heiko
> http://xoomer.virgilio.it/hherold/
...
> wget -d http://www.google.com
> DEBUG output created by Wget 1.9-beta on Windows.
>
> set_sleep_mode(): mode 0x8001, rc 0x8000
>
> I disabled my wgetrc as well and the output was exactly the same.
>
> I then tested
> wget 1.9 beta 2003/09/18 (earlier build!)
> from the same place and it works smoothly.
>
> Can anyone reproduce this bug?

Yes, but the MSVC version crashed on my machine.  But I've found
the cause: it was my recent change :(

A "simple" case of wrong calling-convention:

--- mswindows.c.org Mon Sep 29 11:46:06 2003
+++ mswindows.c Sun Oct 05 17:34:48 2003
@@ -306,7 +306,7 @@
 DWORD set_sleep_mode (DWORD mode)
 {
   HMODULE mod = LoadLibrary ("kernel32.dll");
-  DWORD (*_SetThreadExecutionState) (DWORD) = NULL;
+  DWORD (WINAPI *_SetThreadExecutionState) (DWORD) = NULL;
   DWORD rc = (DWORD)-1;

I assume Heiko didn't notice it because he doesn't have that function
in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?

--gv




Re: BUG in --timeout (exit status)

2003-10-02 Thread Manfred Schwarb
OK, I see.
But I do not agree.
And I don't think it is a good idea to treat the first download specially.

In my opinion, exit status 0 means "everything during the whole
retrieval went OK".
My preferred solution would be to set the final exit status to the highest
exit status of all individual downloads.  Of course, retries which are
triggered by "--tries" should erase the exit status of the previous attempt.
A non-zero exit status does not mean "nothing went OK" but "some individual
downloads failed somehow".
And setting a non-zero exit status does not mean wget has to stop
retrieval immediately; it is OK to continue.
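In code, the proposal amounts to something along these lines (a hypothetical
sketch, not an actual wget patch):

/* Exit status for the whole run; 0 until some download fails. */
static int final_status = 0;

/* Call once per URL, after --tries is exhausted or one attempt succeeds,
   so a successful retry erases the failed attempts before it. */
static void
note_download_result (int status)
{
  if (status > final_status)
    final_status = status;   /* remember the worst result seen so far */
}

/* ...and at the very end of the run:  exit (final_status);  */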

Again, wget's behaviour is not what the user expects.

And the user always has the possibility to combine
--accept, --reject, --domains, etc. so that in normal cases all
individual downloads succeed, if he needs an exit status of 0.
If he does not care about exit status, there is no problem at all,
of course...


regards
Manfred


Zitat von Hrvoje Niksic <[EMAIL PROTECTED]>:

> This problem is not specific to timeouts, but to recursive download (-r).
> 
> When downloading recursively, Wget expects some of the specified
> downloads to fail and does not propagate that failure to the code that
> sets the exit status.  This unfortunately includes the first download,
> which should probably be an exception.
> 




This message was sent using IMP, the Internet Messaging Program.


Re: BUG in --timeout (exit status)

2003-10-02 Thread Hrvoje Niksic
This problem is not specific to timeouts, but to recursive download (-r).

When downloading recursively, Wget expects some of the specified
downloads to fail and does not propagate that failure to the code that
sets the exit status.  This unfortunately includes the first download,
which should probably be an exception.


RE: bug maybe?

2003-09-23 Thread Matt Pease
how do I get off this list?   I tried a few times before & 
got no response from the server.

thank you-
Matt

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 23, 2003 8:53 PM
> To: Randy Paries
> Cc: [EMAIL PROTECTED]
> Subject: Re: bug maybe?
> 
> 
> "Randy Paries" <[EMAIL PROTECTED]> writes:
> 
> > Not sure if this is a bug or not.
> 
> I guess it could be called a bug, although it's no simple oversight.
> Wget currently doesn't support large files.
> 


Re: bug maybe?

2003-09-23 Thread Hrvoje Niksic
"Randy Paries" <[EMAIL PROTECTED]> writes:

> Not sure if this is a bug or not.

I guess it could be called a bug, although it's no simple oversight.
Wget currently doesn't support large files.



Re: bug in wget 1.8.1/1.8.2

2003-09-16 Thread Hrvoje Niksic
Dieter Drossmann <[EMAIL PROTECTED]> writes:

> I use a extra file with a long list of http entries. I included this
> file with the -i option.  After 154 downloads I got an error
> message: Segmentation fault.
>
> With wget 1.7.1 everything works well.
>
> Is there a new limit of lines?

No, there's no built-in line limit, what you're seeing is a bug.

I cannot see anything wrong inspecting the code, so you'll have to
help by providing a gdb backtrace.  You can get it by doing this:

* Compile Wget with `-g' by running `make CFLAGS=-g' in its source
  directory (after configure, of course.)

* Go to the src/ directory and run that version of Wget the same way
  you normally run it, e.g. ./wget -i FILE.

* When Wget crashes, run `gdb wget core', type `bt' and mail us the
  resulting stack trace.

Thanks for the report.



Re: bug in wget - wget break on time msec=0

2003-09-13 Thread Hrvoje Niksic
"Boehn, Gunnar von" <[EMAIL PROTECTED]> writes:

> I think I found a bug in wget.

You did.  But I believe your subject line is slightly incorrect.  Wget
handles 0 length time intervals (see the assert message), but what it
doesn't handle are negative amounts.  And indeed:

> gettimeofday({1063461157, 858103}, NULL) = 0
> gettimeofday({1063461157, 858783}, NULL) = 0
> gettimeofday({1063461157, 880833}, NULL) = 0
> gettimeofday({1063461157, 874729}, NULL) = 0

As you can see, the last gettimeofday returned time *preceding* the
one before it.  Your ntp daemon must have chosen that precise moment
to set back the system clock by ~6 milliseconds, to which Wget reacted
badly.

Even so, Wget shouldn't crash.  The correct fix is to disallow the
timer code from ever returning decreasing or negative time intervals.
Please let me know if this patch fixes the problem:


2003-09-14  Hrvoje Niksic  <[EMAIL PROTECTED]>

* utils.c (wtimer_sys_set): Extracted the code that sets the
current time here.
(wtimer_reset): Call it.
(wtimer_sys_diff): Extracted the code that calculates the
difference between two system times here.
(wtimer_elapsed): Call it.
(wtimer_elapsed): Don't return a value smaller than the previous
one, which could previously happen when system time is set back.
Instead, reset start time to current time and note the elapsed
offset for future calculations.  The returned times are now
guaranteed to be monotonically nondecreasing.
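In miniature, the nondecreasing guarantee described above looks like this
(a hypothetical simplification, not the code in the patch below):

static long last_elapsed;   /* most recent value handed out */

static long
monotonic_elapsed (long raw_elapsed)
{
  /* If the system clock was stepped backwards, raw_elapsed can shrink;
     never report a value smaller than the previous one.  */
  if (raw_elapsed < last_elapsed)
    raw_elapsed = last_elapsed;
  last_elapsed = raw_elapsed;
  return raw_elapsed;
}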

Index: src/utils.c
===
RCS file: /pack/anoncvs/wget/src/utils.c,v
retrieving revision 1.51
diff -u -r1.51 utils.c
--- src/utils.c 2002/05/18 02:16:25 1.51
+++ src/utils.c 2003/09/13 23:09:13
@@ -1532,19 +1532,30 @@
 # endif
 #endif /* not WINDOWS */
 
-struct wget_timer {
 #ifdef TIMER_GETTIMEOFDAY
-  long secs;
-  long usecs;
+typedef struct timeval wget_sys_time;
 #endif
 
 #ifdef TIMER_TIME
-  time_t secs;
+typedef time_t wget_sys_time;
 #endif
 
 #ifdef TIMER_WINDOWS
-  ULARGE_INTEGER wintime;
+typedef ULARGE_INTEGER wget_sys_time;
 #endif
+
+struct wget_timer {
+  /* The starting point in time which, subtracted from the current
+ time, yields elapsed time. */
+  wget_sys_time start;
+
+  /* The most recent elapsed time, calculated by wtimer_elapsed().
+ Measured in milliseconds.  */
+  long elapsed_last;
+
+  /* Approximately, the time elapsed between the true start of the
+ measurement and the time represented by START.  */
+  long elapsed_pre_start;
 };
 
 /* Allocate a timer.  It is not legal to do anything with a freshly
@@ -1577,22 +1588,17 @@
   xfree (wt);
 }
 
-/* Reset timer WT.  This establishes the starting point from which
-   wtimer_elapsed() will return the number of elapsed
-   milliseconds.  It is allowed to reset a previously used timer.  */
+/* Store system time to WST.  */
 
-void
-wtimer_reset (struct wget_timer *wt)
+static void
+wtimer_sys_set (wget_sys_time *wst)
 {
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  wt->secs  = t.tv_sec;
-  wt->usecs = t.tv_usec;
+  gettimeofday (wst, NULL);
 #endif
 
 #ifdef TIMER_TIME
-  wt->secs = time (NULL);
+  time (wst);
 #endif
 
 #ifdef TIMER_WINDOWS
@@ -1600,39 +1606,76 @@
   SYSTEMTIME st;
   GetSystemTime (&st);
   SystemTimeToFileTime (&st, &ft);
-  wt->wintime.HighPart = ft.dwHighDateTime;
-  wt->wintime.LowPart  = ft.dwLowDateTime;
+  wst->HighPart = ft.dwHighDateTime;
+  wst->LowPart  = ft.dwLowDateTime;
 #endif
 }
 
-/* Return the number of milliseconds elapsed since the timer was last
-   reset.  It is allowed to call this function more than once to get
-   increasingly higher elapsed values.  */
+/* Reset timer WT.  This establishes the starting point from which
+   wtimer_elapsed() will return the number of elapsed
+   milliseconds.  It is allowed to reset a previously used timer.  */
 
-long
-wtimer_elapsed (struct wget_timer *wt)
+void
+wtimer_reset (struct wget_timer *wt)
 {
+  /* Set the start time to the current time. */
+  wtimer_sys_set (&wt->start);
+  wt->elapsed_last = 0;
+  wt->elapsed_pre_start = 0;
+}
+
+static long
+wtimer_sys_diff (wget_sys_time *wst1, wget_sys_time *wst2)
+{
 #ifdef TIMER_GETTIMEOFDAY
-  struct timeval t;
-  gettimeofday (&t, NULL);
-  return (t.tv_sec - wt->secs) * 1000 + (t.tv_usec - wt->usecs) / 1000;
+  return ((wst1->tv_sec - wst2->tv_sec) * 1000
+ + (wst1->tv_usec - wst2->tv_usec) / 1000);
 #endif
 
 #ifdef TIMER_TIME
-  time_t now = time (NULL);
-  return 1000 * (now - wt->secs);
+  return 1000 * (*wst1 - *wst2);
 #endif
 
 #ifdef WINDOWS
-  FILETIME ft;
-  SYSTEMTIME st;
-  ULARGE_INTEGER uli;
-  GetSystemTime (&st);
-  SystemTimeToFileTime (&st, &ft);
-  uli.HighPart = ft.dwHighDateTime;
-  uli.LowPart = ft.dwLowDateTime;
-  return (long)((uli.QuadPart - wt->wintime.QuadPart) / 10000);
+  return (long)(wst1->QuadPart - wst2->QuadPart) / 10000;
 #endif
+}
+
+/* R

RE: Bug in total byte count for large downloads

2003-08-26 Thread Herold Heiko
Wget 1.5.3 is ancient.
You would be well advised to upgrade to the current stable version (1.8.2)
or better the latest development version (1.9beta), even if wget is currently
in development stasis due to lack of a maintainer.
You can find more information how to get the sources at
http://wget.sunsite.dk/
There are about 35 user visible changes mentioned in the "news" file after
1.5.3, so take a look at that before upgrading.
Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

> -Original Message-
> From: Stefan Recksiegel 
> [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 25, 2003 6:49 PM
> To: [EMAIL PROTECTED]
> Subject: Bug in total byte count for large downloads
> 
> 
> Hi,
> 
> this may be known, but
> 
> [EMAIL PROTECTED]:/scratch/suse82> wget --help
> GNU Wget 1.5.3, a non-interactive network retriever.
> 
> gave me
> 
> FINISHED --18:32:38--
> Downloaded: -1,713,241,830 bytes in 5879 files
> 
> while
> 
> [EMAIL PROTECTED]:/scratch/suse82> du -c
> 6762560 total
> 
> would be correct.
> 
> Best wishes,  Stefan
> 
> -- 
> 
> * Stefan Recksiegelstefan AT recksiegel.de *
> * Physikdepartment T31 office +49-89-289-14612 *
> * Technische Universität München home +49-89-9547 4277 *
> * D-85747 Garching, Germanymobile +49-179-750 2854 *
> 
> 
> 


Re: bug in --spider option

2003-08-14 Thread Aaron S. Hawley
On Mon, 11 Aug 2003, dEth wrote:

> Hi everyone!
>
> I'm using wget to check if some files are downloadable; I also use it to
> determine the size of the file. Yesterday I noticed that wget
> ignores --spider option for ftp addresses.
> It was supposed to show me the filesize and other parameters, but it began to
> download the file :( That's too bad. Can anyone fix it? My only idea
> was to shorten the time of work using supported options, so that the
> downloading would be aborted. That's a user's solution, now a
> programmer's one is needed.

http://www.google.com/search?q=wget+spider+ftp

> The other problem is that wget doesn't correctly replace hrefs in
> downloaded pages (it uses hostname of the local machine to replace
> remote hostname and there's no feature to give any other base-url; the
> -B option is for another purpose.) If anyone is interested, I can
> describe the problem more detailed. If it won't be fixed, I'll write
> a perl script to replace base urls after wget downloads pages I need,
> but that's not the best way).

would this option help?:

GNU Wget 1.8.1, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
..
  -k,  --convert-links  convert non-relative links to relative.
..


Re: Bug, feature or my fault?

2003-08-08 Thread Aaron S. Hawley
On Wed, 6 Aug 2003, DervishD wrote:

> Hi all :))
>
> After asking in the wget list (with no success), and after having
> a look at the sources (a *little* look), I think that this is a bug,
> so I've decided to report here.

note, the bug and the help lists are currently the same list.

[snip]


Re: Bug with specials characters : can't write output file

2003-02-08 Thread Kalin KOZHUHAROV
Hello!

I have found the following bug with wget 1.8.1 (windows) :



I try to download picture of CD audio from this URL :
wget could get this picture from the web server, but can't write the
output file :

-
http://www.aligastore.com/query.dll/img?gcdFab=8811803124&type=0
   => `img?gcdFab=8811803124&type=0'
Resolving www.aligastore.com... done.
Connecting to www.aligastore.com[217.167.112.169]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 509 [text/html]

Do you see what I see? I didn't know that pictures are of type text/html :-)
Apparently Microsoft IIS thinks so :-) If your IE is showing that binary
data as a picture it is wrong; it should give an error instead, or try to show
it as HTML. You can save it and view it later, though.

img?gcdFab=8811803124&type=0: Invalid argument
-

I think that "?" is a special character and is forbidden in the file
name.

Yes. In Windoze.

Solution :
Replace all special characters (under windows : ' " * ? : / \ | < > ')
by _ for example.
It may be OS dependant ?

Yes, it is. And FS dependent as well. (File System) Try to save the file 
'VeryveryLONG--_%fileewqewqewqewqe' to a FAT12. And you have no idea 
what FS are around.

Better solution:

1. RTFM.
2. Again if you don't find what you are looking for.
3. Search Google.
By any of 1..3 you should have understand that there is an option like this:

   --output-document=file
   The documents will not be written to the appropriate
   files, but all will be concatenated together and writ-
   ten to file.  If file already exists, it will be over-
   written.  If the file is -, the documents will be
   written to standard output.  Including this option
   automatically sets the number of tries to 1.

Of course, you will need to get your files one at a time, unless you use 
some kind of script. Live with it.

Suggestion :
Especially with the "--input-file" option, it would be useful to have
the possibility to provide the output file name.

>

For example, if the input file contains something like "input
filename;output filename", it will be fine!

Don't get you here...

I know I am being sarcastic, but it helps (sometimes).
So, your complete answer:
C:\windowze\MyDocu~2>wget -S -O YourFav~1.gif 
'http://www.aligastore.com/query.dll/img?gcdFab=0724349987128&type=0&default=imgdef.gif&TYPEPRODUIT=DISQUES'
--11:32:47-- 
http://www.aligastore.com/query.dll/img?gcdFab=0724349987128&type=0&default=imgdef.gif&TYPEPRODUIT=DISQUES
   => `YourFav~1.gif'
Resolving www.aligastore.com... done.
Connecting to www.aligastore.com[217.167.112.169]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.1 200 OK
 2 Server: Microsoft-IIS/5.0
 3 Date: Sun, 09 Feb 2003 02:31:18 GMT
 4 Connection: keep-alive
 5 Content-Type: text/html
 6 Content-Length: 1666
 7 Content:

100%[>] 1,666932.54K/s 
ETA 00:00

11:32:47 (32.54 KB/s) - `YourFav~1.gif' saved [1666/1666]


Best regards,
Kalin Kozhuharov.

P.S. ALWAYS USE --server-response (or -S for short) when diagnosing 
problems.

--
||///_ o  *
||//,_/> WWW: http://ThinRope.net/
|||\ <"
|||\\ '
^^^



Re: Bug in relative URL handling

2003-01-26 Thread Kalin KOZHUHAROV
Gary Hargrave wrote:

--- Kalin KOZHUHAROV <[EMAIL PROTECTED]> wrote:

Well, I am sure it is wrong URL, but took some time till I pinpoint
it in  RFC1808. Otherwise it would be very difficult to code URL
parser.

Ooops :-) It seems I was wrong...


BTW, did you try to click in your browser on that link?

Relative links beginning with "http:" work fine in Mozilla and
Internet Explorer.

My fault again. I didn't click on a link, but instead put that in the
location bar without thinking that it is completely different for
relative URLs.

Since Mozilla is designed for standards compliance they appear to
interpret rfc1808 the same way I do.

Yes! Viva Mozilla!

Kalin.

--
||///_ o  *
||//,_/> WWW: http://ThinRope.net/
|||\ <"
|||\\ '
^^^




Re: Bug in relative URL handling

2003-01-24 Thread Gary Hargrave
--- Kalin KOZHUHAROV <[EMAIL PROTECTED]> wrote:
>Well, I am sure it is wrong URL, but took some time till I pinpoint it 
>in  RFC1808. Otherwise it would be very difficult to code URL parser. 
>What you actually try to convince us is that you can omit the 
>net-location (i.e. usually comes in the middle) and still be able to 

>From the rfc:

|URL= ( absoluteURL | relativeURL ) [ "#" fragment ]
|
| absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ))
|
| generic-RL  = scheme ":" relativeURL
|
| relativeURL = net_path | abs_path | rel_path
|
| net_path= "//" net_loc [ abs_path ]
| abs_path= "/"  rel_path
| rel_path= [ path ] [ ";" params ] [ "?" query ]

It is clear that if the string after the ":" does not
begin with "//" or "/" then it is a relative path.

>tell the location. Then how do you interpret http:program.com ?
>Is it a site program in TLD com, or a .com (DOS executable) file served 
>who knows why via http?

It does not have a // before the program.com so it is not
a TLD.

>
>So one of the places this is discussed in RFC1808 is:
>
>4.  Resolving Relative URLs
>...
>
>Step 2b): If the embedded URL starts with a scheme name, it is 
> interpreted as an *absolute* URL and we are done.

The rfc states that this is an example algorithm. It does not
claim it is the definitive algorithm.


>BTW, did you try to click in your browser on that link?

Relative links beginning with "http:" work fine in
Mozilla and Internet Explorer. Since Mozilla is designed
for standards compliance they appear to interpret
rfc1808 the same way I do.

Gary


_
Get your FREE E-mail @Gibweb.net. Visit www.GibWeb.net for Gibraltar 
weather,news,lottery results,search and much more.



Re: Bug in relative URL handling

2003-01-23 Thread Kalin KOZHUHAROV
I just realized, I didn't send this and some other post to the list, but 
directly to the replier...

Gary Hargrave wrote:
wget does not seem to handle relative links in web pages
of the form

http:page3.html

According to my understanding of rfc1808 this is a valid
URL. When recursively retrieving html pages wget ignores
these links with out displaying an error or warning.


Well, I am sure it is wrong URL, but took some time till I pinpoint it
in  RFC1808. Otherwise it would be very difficult to code URL parser.
What you actually try to convince us is that you can omit the
net-location (i.e. usually comes in the middle) and still be able to
tell the location. Then how do you interpret http:program.com ?
Is it a site program in TLD com, or a .com (DOS executable) file served
who knows why via http?

So one of the places this is discussed in RFC1808 is:

4.  Resolving Relative URLs
...

Step 2b): If the embedded URL starts with a scheme name, it is
 interpreted as an *absolute* URL and we are done.

BTW, did you try to click in your browser on that link?

Kalin.

--
||///_ o  *
||//,_/> WWW: http://ThinRope.net/
|||\ <"   mobile: +81 (90) 6265-0856
|||\\ ' NetPager: [EMAIL PROTECTED]
^^^





Re: BUG on multiprocessor systems

2003-01-09 Thread Max Bowsher
Grzegorz Dzięgielewski wrote:
> Hello!
>
> While wget is used on dualcpu machine the assert(msecs>=0) from
> calc_rate() broke program execution with this:
> wget: retr.c:262: calc_rate: Warunek `msecs >= 0' nie został
> spełniony. (Polish locale - sorry; it means "the condition `msecs >= 0' was not satisfied")
>
> We think that bug is in wtimer_elapsed() function. Probably it's a
> problem with forking and timer. We have cutted of the measure of KB/s
> and everything is ok.
>
> My system: Debian woody 2.4.20+grsec1.9.8a, 2xCeleron433Mhz, 192RAM.

When I experienced this, it was because gettimeofday()'s subsecond value
sometimes went *backwards*.
To test for this, I made this code:

--- gtodtest.c --
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char* argv[]) {
struct timeval t,t1 = {0,0};
setvbuf(stdout, NULL, _IONBF, 0);
while (1) {
gettimeofday(&t,NULL);
//  printf("%12li : %7li\n", t.tv_sec, t.tv_usec);
if (t.tv_sec == t1.tv_sec)
{ if (t.tv_usec < t1.tv_usec) putc('!',stdout); }
else putc('.',stdout);
t1.tv_sec = t.tv_sec;
t1.tv_usec = t.tv_usec;
}
return 0;
}
-

Compile and run. It prints a '.' every second, and a '!' every time
gettimeofday misbehaves. If you get lots of '!'s then you know this is the
problem. A crude solution is to change the assert to "if (msecs < 0) msecs
= 0;".

Max.




Re: bug or limitation of wget used to access VMS servers

2003-01-09 Thread Ken Senior
Max,

The newer versions of wget support VMS ftp servers for sure.  As I
understand it, a separate piece of code was written to deal with them. 
The syntax for navigating a VMS ftp server is a bit strange, e.g., to
change directories:

cd [level1.level2]

To change disks:

cd disk1:[level1.level2]

However, the piece of code written in wget to allow for VMS ftp (with
the exception of changing disks) allows one to use the usual UNIX style:

wget ftp://vmssite.com/level1/level2/filename

BUT, evidently (as far as I know), it does not support changing of
disks.  Does anyone else know about this?

Ken


On Wed, 2003-01-08 at 16:53, Max Bowsher wrote:
> 
> - Original Message -
> From: "Ken Senior" <[EMAIL PROTECTED]>
> 
> 
> > There does not seem to be support to change disks when accessing a VMS
> > server via wget.  Is this a bug or just a limitation?
> 
> Wget does plain old HTTP and FTP. I know nothing about VMS. Does it have
> some strange syntax for discs?
> 
> Max.
> 
-- 
_  __   _____
   / |/ /  / _ \  / /   Ken Senior  (202)767-2043
  Naval Research Lab_   Space Applications Branch
/_/|_/  /_/|_| //   [EMAIL PROTECTED]

Code 8154 . 4555 Overlook AV SW . Washington, DC  20375





Re: bug or limitation of wget used to access VMS servers

2003-01-08 Thread Max Bowsher

- Original Message -
From: "Ken Senior" <[EMAIL PROTECTED]>


> There does not seem to be support to change disks when accessing a VMS
> server via wget.  Is this a bug or just a limitation?

Wget does plain old HTTP and FTP. I know nothing about VMS. Does it have
some strange syntax for discs?

Max.




Re: Bug with user:pass in URL

2002-09-16 Thread Daniel Stenberg

On Tue, 17 Sep 2002, Nikolay Kuzmin wrote:

> There is a bug in wget1.8.2 when username or password contains symbol '@'.
> I think you should change code in file src/url.c from

I disagree. The name and password fields must never contain a literal '@', as it
is a reserved character in URL strings. If your name or password contains '@', then
replace it with %40 in the URL.
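For example (with hypothetical credentials), a password of "p@ss" would be
written as

  ftp://user:p%40ss@ftp.example.com/file

so the '@' that separates the credentials from the host stays unambiguous.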

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol




Re: bug?

2002-09-11 Thread Thomas Lussnig

Mats Andrén wrote:

> I found this problem when fetching files recursively:
>
> What if the filenames of linked files from a www-page contains the 
> []-characters? They are treated as some kind of patterns, and not just 
> the way they are. Clearly not desirable! Since wget just fetches the 
> filenames from the www-page, it's beyond the wget-users control to 
> insert escapechars, and so on.

Hi,
are you sure that this is wget's fault?
Did you enclose the filename in '' like "wget
'http://domain/file[a-z].html'"?  Otherwise it is the bash and not wget's fault.
Under Linux it is a common mistake to think that the program does
the expansion.  But it is the bash that does it for the
parameters.

Cu thomas






Re: [BUG] assert test msecs

2002-08-04 Thread Colin 't Hart

> I have run across this problem too. It is because with Linux 2.4.18 (and other
> versions??) in certain circumstances, gettimeofday() is broken and will jump
> backwards. See http://kt.zork.net/kernel-traffic/kt20020708_174.html#1.
>
> Is there any particular reason for this assert? If there is, maybe:
> if (msecs < 0) msecs = 0;
> would be more suitable.

Seems like this is only used to calculate a rate to display on the screen.
Maybe we should just accept Linux's opinion that time is going backwards.
Eventually it should go forwards again. :-)

Cheers,

Colin





Re: [BUG] assert test msecs

2002-08-01 Thread Max Bowsher

Hartwig, Thomas wrote:
> I got a assert exit of wget in "retr.c" in the function "calc_rate"
> because "msecs" is 0 or lesser than 0 (in spare cases).
> I don't know how perhaps because I have a big line to the server or
> the wrong OS. To get worked with this I patched "retr.c" setting
> "msecs = 1" if equal or below zero.
>
> Some informations are added below, what else do you need?
>
> #: cat /proc/version
> Linux version 2.4.18 (root@netbrain) (gcc version 2.96 2731 (Red
> Hat Linux 7.3 2.96-110)) #4 Sun Jul 28 09:01:06 CEST 2002

I have run across this problem too. It is because with Linux 2.4.18 (and other
versions??) in certain circumstances, gettimeofday() is broken and will jump
backwards. See http://kt.zork.net/kernel-traffic/kt20020708_174.html#1.

Is there any particular reason for this assert? If there is, maybe:
if (msecs < 0) msecs = 0;
would be more suitable.

Max.




Re: Bug with wget ? I need help.

2002-06-21 Thread Cédric Rosa

thanks for your help :)
I'm installing version 1.9 to check. I think this update may solve my
problem.

Cedric Rosa.

- Original Message -
From: "Hack Kampbjørn" <[EMAIL PROTECTED]>
To: "Cédric Rosa" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, June 21, 2002 7:27 PM
Subject: Re: Bug with wget ? I need help.


> Cédric Rosa wrote:
> >
> > Hello,
> >
> > First, scuse my english but I'm french.
> >
> > When I try with wget (v 1.8.1) to download an url which is behind a
router,
> > the software wait for ever even if I've specified a timeout.
> >
> > With ethereal, I've seen that there is no response from the server (ACK
> > never appears).
> >
> This is a documented behavior: because of programming issues the timeout
> does not cover the connection, only the response after a connection has
> been established. For version 1.9 the timeout option will also cover the
> connection.
>
>
> http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/NEWS?rev=HEAD&content-type=text/plain
>
> > Here is the debug output:
> > rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr
> > --16:30:54-- http://www.sosi.cnrs.fr/
> > => `index.html'
> > Resolving www.sosi.cnrs.fr... done.
> > Connecting to www.sosi.cnrs.fr[193.55.87.37]:80...
> >
> > Thanks by advance for your help.
> > Cedric Rosa.
>
> --
> Med venlig hilsen / Kind regards
>
> Hack Kampbjørn




Re: Bug with wget ? I need help.

2002-06-21 Thread Hack Kampbjørn

Cédric Rosa wrote:
> 
> Hello,
> 
> First, scuse my english but I'm french.
> 
> When I try with wget (v 1.8.1) to download an url which is behind a router,
> the software wait for ever even if I've specified a timeout.
> 
> With ethereal, I've seen that there is no response from the server (ACK
> never appears).
> 
This is a documented behavior: because of programming issues the timeout
does not cover the connection, only the response after a connection has
been established. For version 1.9 the timeout option will also cover the
connection.

http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/NEWS?rev=HEAD&content-type=text/plain

> Here is the debug output:
> rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr
> --16:30:54-- http://www.sosi.cnrs.fr/
> => `index.html'
> Resolving www.sosi.cnrs.fr... done.
> Connecting to www.sosi.cnrs.fr[193.55.87.37]:80...
> 
> Thanks by advance for your help.
> Cedric Rosa.

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn



RE: Bug with wget ? I need help.

2002-06-21 Thread Herold Heiko

Try telnet www.sosi.cnrs.fr 80;
if it connects, type GET / HTTP/1.0 followed by two newlines. If you don't
get the output of the webserver, you probably have a routing problem or
something else.

Heiko 

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907472
-- ITALY

> -Original Message-
> From: Cédric Rosa [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 21, 2002 4:37 PM
> To: [EMAIL PROTECTED]
> Subject: Bug with wget ? I need help.
> 
> 
> Hello,
> 
> First, scuse my english but I'm french.
> 
> When I try with wget (v 1.8.1) to download an url which is 
> behind a router,
> the software wait for ever even if I've specified a timeout.
> 
> With ethereal, I've seen that there is no response from the 
> server (ACK
> never appears).
> 
> Here is the debug output:
> rosa@r1:~/htmlparser1.1/lib$ wget www.sosi.cnrs.fr
> --16:30:54-- http://www.sosi.cnrs.fr/
> => `index.html'
> Resolving www.sosi.cnrs.fr... done.
> Connecting to www.sosi.cnrs.fr[193.55.87.37]:80...
> 
> Thanks by advance for your help.
> Cedric Rosa.
> 



Re: Bug with specific URLs

2002-06-21 Thread Kai Schaetzl

Your message of Thu, 20 Jun 2002 15:49:52 +0200:

> I supposed people would read the index.html. Since this is becoming
> something of a FAQ I've now put a 00Readme.txt on the ftp server and a
> Readme.txt in the binary archives, we'll see if that helps.
>

It should :-)

Kai

--

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org






Re: Bug with specific URLs

2002-06-21 Thread Kai Schaetzl

Your message of Thu, 20 Jun 2002 17:41:06 +0200:

> Short answer use quotes.

Yeah, thanks. I thought it was the "&", but I wasn't aware that I could 
avoid this by quoting.

> > Cannot write to `foto.php4?id=148087' (Invalid argument).
> 
> And this is the question mark problem search the archives or use version
> 1.8.2

I see. As I said, I couldn't get it to work on that day and the NEWS file 
doesn't list this bug. I was able to test this now with 1.8.2 and see that 
it works. However, shouldn't it grab this header and change the file name, 
anyway?

Content-Disposition: inline; filename=147945-.jpg

(This is what I get over a Header-sniffer, f.i.:
http://webtools.mozilla.org/web-sniffer/view.cgi?url=http://217.115.140.10/picserver/online/foto.php4?id=147945&v=2 )



Kai

--

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org






Re: Bug ?

2002-06-02 Thread Hrvoje Niksic

I don't know why Wget dumps core on startup.  Perhaps a gettext
problem?  I have seen reports of failure on startup on Solaris, and it
strikes me that Wget could have picked up wrong or inconsistent
gettext.

Try unsetting the locale-related evnironment variables and seeing if
Wget works then.



Re: bug report and patch, HTTPS recursive get

2002-05-17 Thread Kiyotaka Doumae


In message "Re: bug report and patch, HTTPS recursive get",
Ian Abbott wrote...
> Thanks again for the bug report and the proposed patch.  I thought some
> of the scheme tests in recur.c were getting messy, so I propose the
> following patch that uses a function to check for similar schemes.

Thanks for your rewrite.
Your patch solved the problem.

Thank you

---
Doumae Kiyotaka
Internet Initiative Japan Inc.
Technical Planning Division



Re: bug report and patch, HTTPS recursive get

2002-05-15 Thread Ian Abbott

On Wed, 15 May 2002 18:44:19 +0900, Kiyotaka Doumae <[EMAIL PROTECTED]> wrote:

>We have following HTML document.
>
>https://www.example.com/index.html
>-
>
>
><a href="http://www.wget.org/">Another Website</a>
>
>
>-
>
>We run wget with -r option.
>
>> wget -r https://www.example.com/index.html
>
>wget gets http://www.wget.org/ and other urls that are
>linked from http://www.wget.org/.

Thanks again for the bug report and the proposed patch.  I thought some
of the scheme tests in recur.c were getting messy, so propose the
following patch that uses a function to check for similar schemes.

The patch incorporates your bug-fix in step 7 of download_child_p() and
makes a similar change in step 4 for consistency.

src/ChangeLog entry:

2002-05-15  Ian Abbott  <[EMAIL PROTECTED]>

* url.c (schemes_are_similar_p): New function to test enumerated
scheme codes for similarity.

* url.h: Declare it.

* recur.c (download_child_p): Use it to compare schemes.  This
also fixes a bug that allows hosts to be spanned (without the
-H option) when the parent scheme is https and the child's is
http or vice versa.

Index: src/recur.c
===
RCS file: /pack/anoncvs/wget/src/recur.c,v
retrieving revision 1.48
diff -u -r1.48 recur.c
--- src/recur.c 2002/04/21 04:25:07 1.48
+++ src/recur.c 2002/05/15 13:05:35
@@ -415,6 +415,7 @@
 {
   struct url *u = upos->url;
   const char *url = u->url;
+  int u_scheme_like_http;
 
   DEBUGP (("Deciding whether to enqueue \"%s\".\n", url));
 
@@ -445,12 +446,11 @@
  More time- and memory- consuming tests should be put later on
  the list.  */
 
+  /* Determine whether URL under consideration has a HTTP-like scheme. */
+  u_scheme_like_http = schemes_are_similar_p (u->scheme, SCHEME_HTTP);
+
   /* 1. Schemes other than HTTP are normally not recursed into. */
-  if (u->scheme != SCHEME_HTTP
-#ifdef HAVE_SSL
-  && u->scheme != SCHEME_HTTPS
-#endif
-  && !(u->scheme == SCHEME_FTP && opt.follow_ftp))
+  if (!u_scheme_like_http && !(u->scheme == SCHEME_FTP && opt.follow_ftp))
 {
   DEBUGP (("Not following non-HTTP schemes.\n"));
   goto out;
@@ -458,11 +458,7 @@
 
   /* 2. If it is an absolute link and they are not followed, throw it
  out.  */
-  if (u->scheme == SCHEME_HTTP
-#ifdef HAVE_SSL
-  || u->scheme == SCHEME_HTTPS
-#endif
-  )
+  if (schemes_are_similar_p (u->scheme, SCHEME_HTTP))
 if (opt.relative_only && !upos->link_relative_p)
   {
DEBUGP (("It doesn't really look like a relative link.\n"));
@@ -483,7 +479,7 @@
  opt.no_parent.  Also ignore it for documents needed to display
  the parent page when in -p mode.  */
   if (opt.no_parent
-  && u->scheme == start_url_parsed->scheme
+  && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
   && 0 == strcasecmp (u->host, start_url_parsed->host)
   && u->port == start_url_parsed->port
   && !(opt.page_requisites && upos->link_inline_p))
@@ -526,7 +522,7 @@
 }
 
   /* 7. */
-  if (u->scheme == parent->scheme)
+  if (schemes_are_similar_p (u->scheme, parent->scheme))
 if (!opt.spanhost && 0 != strcasecmp (parent->host, u->host))
   {
DEBUGP (("This is not the same hostname as the parent's (%s and %s).\n",
@@ -535,13 +531,7 @@
   }
 
   /* 8. */
-  if (opt.use_robots
-  && (u->scheme == SCHEME_HTTP
-#ifdef HAVE_SSL
- || u->scheme == SCHEME_HTTPS
-#endif
- )
-  )
+  if (opt.use_robots && u_scheme_like_http)
 {
   struct robot_specs *specs = res_get_specs (u->host, u->port);
   if (!specs)
Index: src/url.c
===
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.74
diff -u -r1.74 url.c
--- src/url.c   2002/04/13 03:04:47 1.74
+++ src/url.c   2002/05/15 13:05:36
@@ -2472,6 +2472,24 @@
   downloaded_files_hash = NULL;
 }
 }
+
+/* Return non-zero if scheme a is similar to scheme b.
+ 
+   Schemes are similar if they are equal.  If SSL is supported, schemes
+   are also similar if one is http (SCHEME_HTTP) and the other is https
+   (SCHEME_HTTPS).  */
+int
+schemes_are_similar_p (enum url_scheme a, enum url_scheme b)
+{
+  if (a == b)
+return 1;
+#ifdef HAVE_SSL
+  if ((a == SCHEME_HTTP && b == SCHEME_HTTPS)
+  || (a == SCHEME_HTTPS && b == SCHEME_HTTP))
+return 1;
+#endif
+  return 0;
+}
 
 #if 0
 /* Debugging and testing support for path_simplify. */
Index: src/url.h
===
RCS file: /pack/anoncvs/wget/src/url.h,v
retrieving revision 1.23
diff -u -r1.23 url.h
--- src/url.h   2002/04/13 03:04:47 1.23
+++ src/url.h   2002/05/15 13:05:36
@@ -158,4 +158,6 @@
 
 char *rewrite_shorthand_url PARAMS ((const char *));
 
+int schemes_are_similar_p PARAMS ((enum url_scheme a, enum url_scheme b));
+
 #endif /* URL_H */





Re: bug report and patch, HTTPS recursive get

2002-05-15 Thread Ian Abbott

On Wed, 15 May 2002 18:44:19 +0900, Kiyotaka Doumae <[EMAIL PROTECTED]>
wrote:

>I found a bug of wget with HTTPS resursive get, and proposal
>a patch.

Thanks for the bug report and the proposed patch.  The current scheme
comparison checks are getting messy, so I'll write a function to check
schemes for similarity (when I can spare the time later today).



Re: Bug report

2002-05-04 Thread Ian Abbott

On Fri, 3 May 2002 18:37:22 +0200, Emmanuel Jeandel
<[EMAIL PROTECTED]> wrote:

>ejeandel@yoknapatawpha:~$ wget -r a:b
>Segmentation fault

Patient: Doctor, it hurts when I do this
Doctor: Well don't do that then!

Seriously, this is already fixed in CVS.



Re: Bug#144242: is this progress bar problem [http://bugs.debian.org/144242] fixed in cvs (2002-04-09+10)?

2002-04-25 Thread Noel Koethe

On Don, 25 Apr 2002, Hrvoje Niksic wrote:

Hello,

> Judging by the provided `strace' output, it seems that your problem is
> caused by the network or perhaps even by the remote server.  Wget
> sleeps on `select', waiting for the connection to close.  That should
> not happen -- if the connection is not persistent (which in this case
> it isn't), it should close immediately after the data is received.

There is/was no problem with wget. Here is the solution/answer
from the bug reporter

--8<--quote--8<--
This bug is to do with `transparent' web proxying in our College (Abstract lives down
the hall from me).

I've suffered the same problem before from certain sites.

It is fixed by setting the http_proxy environment variable or with the
--no-http-keep-alive option.
--8<--quote--8<--

Thanks for your help!

-- 
Noèl Köthe



Re: Bug report for wget 1.8.1 / MacOSX : french translation

2002-04-11 Thread Hrvoje Niksic

Pascal Vuylsteker <[EMAIL PROTECTED]> writes:

> I've downloaded wget from http://macosx.forked.net/ as a port to
> MacOSX (package).

I'm not sure how internationalization works on MacOS X.  Perhaps you
should ask the people who did the porting?

If you want Wget to print English (original) messages, unset the LANG
environment variable.



Re: BUG: wget -r with https and robots

2002-02-18 Thread Hrvoje Niksic

"Mr.Fritz" <[EMAIL PROTECTED]> writes:

> When I retrieve recursively a directory using a site with https protocol,
> it searches for http://sitename/robots.txt but the site has only port
> 443 (https) open, so there is a connection refused error. Wget thinks
> the site is down and aborts the transfer.
> Wget should search for https://sitename/robots.txt !!

This is weird.  The code looks like it's doing the correct thing.

Questions: are you using the latest version, which is 1.8.1 (`wget
-V')?  If so, could you provide a URL I could use to repeat the
problem?



Re: bug

2002-02-18 Thread Hrvoje Niksic

Peteris Krumins <[EMAIL PROTECTED]> writes:

> GNU Wget 1.8
>
> wget: progress.c:673: create_image: Assertion `p - bp->buffer <= bp->width' failed.

This problem has been fixed in Wget 1.8.1.  Please upgrade.



Re: BUG https + index.html

2002-02-01 Thread csaba . raduly


On 01/02/2002 12:10:59 "Mr.Fritz" wrote:

>After the https/robots.txt bug, doing a recursive wget to an https-only server
>gives me this error: it searches for http://servername/index.html but there
>is no server on port 80, so wget receives a Connection refused error and
>quits.  It should search for https://servername/index.html
>

Are you sure this was an SSL-enabled wget ?
Please provide a debug log by running wget with the -d parameter.


--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]   http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




Re: bug when processing META tag.

2002-01-31 Thread Hrvoje Niksic

"An, Young Hun" <[EMAIL PROTECTED]> writes:

> if an HTML document contains code like this
> 
>     <meta http-equiv="refresh">
> 
> wget may crash. It has 'refresh' but
> does not have 'content'. Of course this is
> incorrect HTML. But I found some pages on the web :)
> 
> simply add a check routine to the 'tag_handle_meta' function.

Thanks for the report; this patch should fix the bug:

2002-02-01  Hrvoje Niksic  <[EMAIL PROTECTED]>

* html-url.c (tag_handle_meta): Don't crash on <meta http-equiv=refresh> where content is missing.

Index: src/html-url.c
===
RCS file: /pack/anoncvs/wget/src/html-url.c,v
retrieving revision 1.23
diff -u -r1.23 html-url.c
--- src/html-url.c  2001/12/19 01:15:34 1.23
+++ src/html-url.c  2002/02/01 03:32:55
@@ -521,10 +521,13 @@
 get to the URL.  */
 
   struct urlpos *entry;
-
   int attrind;
-  char *p, *refresh = find_attr (tag, "content", &attrind);
   int timeout = 0;
+  char *p;
+
+  char *refresh = find_attr (tag, "content", &attrind);
+  if (!refresh)
+   return;
 
   for (p = refresh; ISDIGIT (*p); p++)
timeout = 10 * timeout + *p - '0';



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 21 Jan 2002 at 14:56, Thomas Lussnig wrote:

> >Why not just open the wgetrc file in text mode using
> >fopen(name, "r") instead of "rb"? Does that introduce other
> >problems?
> I think it has to do with comments, because the definition is that
> starting with '#' the rest of the line is ignored. And a line ends
> with '\n' or the end of the file, not with a special character like
> '\0'; that means to me that aborting the reading of a text file when
> a zero is found is incorrect parsing.

(N.B. the control-Z character would be '\032', not '\0'.)

So maybe just mention in the documentation that the wgetrc file is
considered to be a plain text file, whatever that means for the
system Wget is running on. Maybe mention the peculiarities of
DOS/Windows, etc.

In general, it is more portable to read or write native text files
in text mode as it performs whatever local conversions are
necessary to make reads and writes of text files appear like UNIX
(i.e. each line of text terminated by a newline '\n'). In binary
mode, what you get depends on the system (Mac text files have lines
terminated by carriage return ('\r') for example, and some systems
(VMS?) don't even have line termination characters as such).

In the case of Wget, log files are already written in text mode. I
think wgetrc needs to be read in text mode and that's an easy
change.

In the case of the --input-file option, ideally the input file
should be read in text mode unless the --force-html option is used,
in which case it should be read in the same mode as when parsing
other locally-stored HTML files.

Wget stores retrieved files in binary mode but the mode used when
reading those locally-stored files is less precise (not that it
makes much difference for UNIX). It uses open() (not fopen()) and
read() to read those files into memory (or uses mmap() to map them
into memory space if supported). The DOS/Windows version of open()
allows you to specify text or binary mode, defaulting to text mode,
so it looks like the Windows version of Wget saves html files in
binary mode and reads them back in in text mode! Well whatever -
the HTML parser still seems to work okay on Windows, probably
because HTML isn't that fussy about line-endings anyway!

So to support --input-file portably (not the --force-html version),
the get_urls_file() function in url.c should probably call a new
function read_file_text() (or read_text_file()) instead of
read_file() as it does at the moment. For UNIX-type systems, that
could just fall back to calling read_file().
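
Something like the following could work -- just a sketch of the idea,
not actual wget code; read_text_file() and its buffer handling here
are made up:

/* Hypothetical sketch: read PATH into a malloc'd, NUL-terminated
   buffer using text mode ("r"), so the C runtime maps the platform's
   line endings to plain '\n'.  On UNIX this behaves just like a
   binary-mode read, so it could simply wrap the existing read_file(). */
#include <stdio.h>
#include <stdlib.h>

char *
read_text_file (const char *path, long *len)
{
  FILE *fp = fopen (path, "r");        /* text mode, not "rb" */
  char *buf, *tmp;
  long size = 256, used = 0;
  int c;

  if (!fp)
    return NULL;
  buf = malloc (size + 1);
  if (!buf)
    {
      fclose (fp);
      return NULL;
    }
  while ((c = getc (fp)) != EOF)
    {
      if (used == size)
        {
          size *= 2;
          tmp = realloc (buf, size + 1);
          if (!tmp)
            {
              free (buf);
              fclose (fp);
              return NULL;
            }
          buf = tmp;
        }
      buf[used++] = (char) c;
    }
  fclose (fp);
  buf[used] = '\0';
  if (len)
    *len = used;
  return buf;
}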

The local HTML file parsing stuff should probably be left well
alone but possibly add some #ifdef code for Windows to open the
file in binary mode, though there may be differences between
compilers for that.




Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Andre Majorel

On 2002-01-21 18:53 +0100, Hrvoje Niksic wrote:
> "Ian Abbott" <[EMAIL PROTECTED]> writes:
> 
> > Why not just open the wgetrc file in text mode using fopen(name,
> > "r") instead of "rb"? Does that introduce other problems?
> 
> Not that I'm aware of.  The reason we use "rb" now is the fact that we
> handle the EOL problem ourselves, and it seems "safer" to open the
> file in binary mode and get the real contents.

Back in my DOS days, my personal party line was to fopen all
text files in "r" mode, detect EOL by comparing with '\n' and
otherwise ignore anything that satisfies isspace(). It took care of
the ^Z problem, and the code worked well on both DOS and Unix
without any #ifdefs.

-- 
André Majorel http://www.teaser.fr/~amajorel/>
std::disclaimer ("Not speaking for my employer");



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Hrvoje Niksic

"Ian Abbott" <[EMAIL PROTECTED]> writes:

> Why not just open the wgetrc file in text mode using fopen(name,
> "r") instead of "rb"? Does that introduce other problems?

Not that I'm aware of.  The reason we use "rb" now is the fact that we
handle the EOL problem ourselves, and it seems "safer" to open the
file in binary mode and get the real contents.



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Thomas Lussnig

>
>
>>>WGet returns an error message when the .wgetrc file is terminated
>>>with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
>>>command-line language for all versions of Windows, so ignoring the
>>>end-of-file mark would make sense.
>>>
>>Ouch, I never thought of that.  Wget opens files in binary mode and
>>handles the line termination manually -- but I never thought to handle
>>^Z.
>>
>
>Why not just open the wgetrc file in text mode using
>fopen(name, "r") instead of "rb"? Does that introduce other
>problems?
>
>In the Windows C compilers I've tried (Microsoft and Borland ones),
>"r" causes the file to be opened in text mode by default (there are
>ways to override that at compile time and/or run time), and this
>causes the ^Z to be treated as an EOF (there might be ways to
>override that too).
>
I think it has to do with comments, because the definition is that
starting with '#' the rest of the line is ignored. And a line ends
with '\n' or the end of the file, not with a special character like
'\0'; that means to me that aborting the reading of a text file when
a zero is found is incorrect parsing.

Cu Thomas Lußnig






Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-21 Thread Ian Abbott

On 17 Jan 2002 at 2:15, Hrvoje Niksic wrote:

> Michael Jennings <[EMAIL PROTECTED]> writes:
> > WGet returns an error message when the .wgetrc file is terminated
> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> > command-line language for all versions of Windows, so ignoring the
> > end-of-file mark would make sense.
> 
> Ouch, I never thought of that.  Wget opens files in binary mode and
> handles the line termination manually -- but I never thought to handle
> ^Z.

Why not just open the wgetrc file in text mode using
fopen(name, "r") instead of "rb"? Does that introduce other
problems?

In the Windows C compilers I've tried (Microsoft and Borland ones),
"r" causes the file to be opened in text mode by default (there are
ways to override that at compile time and/or run time), and this
causes the ^Z to be treated as an EOF (there might be ways to
override that too).



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-18 Thread Hrvoje Niksic

Michael Jennings <[EMAIL PROTECTED]> writes:

> However, I have a comment: There is simple logic that would solve
> this problem. WGet, when it reads a line in the configuration file,
> probably now strips off trailing spaces (hex 20, decimal 32). I
> suggest that it strip off both trailing spaces and control
> characters (characters with hex values of 1F or less, decimal values
> of 31 or less). This is a simple change that would work in all
> cases.

The problem here is that I don't want Wget to randomly strip off
characters from its input.  Although the control characters are in
most cases a sign of corruption, I don't want Wget to be the judge of
that.

Wget currently has a clearly defined parsing process: strip whitespace
at the beginning and end of each line, and around the `=' token.  Stripping
all the control characters would IMHO be a very random thing to do.

If I implemented the support for ^Z, I'd only strip it if it occurred
at the end of file, and that's somewhat harder.
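
For what it's worth, stripping the ^Z only at the end of the file is
very little code once the whole file is already in memory -- a rough
sketch only, not actual wget source:

#include <stddef.h>

/* Drop a single trailing ^Z (0x1A), but only when it is the very last
   byte of the in-memory copy of the file.  BUF holds *LEN bytes.  */
static void
strip_trailing_ctrl_z (char *buf, size_t *len)
{
  if (*len > 0 && buf[*len - 1] == '\032')
    buf[--*len] = '\0';
}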



RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Herold Heiko

> From: Michael Jennings [mailto:[EMAIL PROTECTED]]
> Obviously, this is completely your decision. You are right, 
> only DOS editors make the mistake. (It should be noted that 
> DOS is MS Windows only command line language. It isn't going 
> away; even Microsoft supplies command line utilities with all 
> versions of its OSs. Yes, Windows will probably eventually go 

Please note the difference: all Windows versions include a command line.
However, that command line AFAIK is not DOS - it is able to run DOS
programs, either because it is based on DOS (Win 9x) or because it is
capable of telling w32 command-line programs apart from DOS programs
and starting the necessary DOS *emulation*. But it is not
DOS, and the behaviour is not like DOS.
As far as I know, Windows command-line programs do not use ^Z as an
end-of-file terminator (although some do honour it for
emulation/compatibility); only real DOS programs do (does anybody know if
there is a - MS - standard for this?). If this is true, should wget on
Windows really emulate the behaviour of DOS programs, of an environment
Windows originally was based on but where it (wget, I mean) is
*not* running *anymore*? From a purist's point of view, no. From an
end-user point of view, possibly, in order to facilitate the changeover.
On the other hand, your report is the first one I ever saw; considering
Hrvoje's reaction and the lack of support in the original Windows port,
I'd say this is not a problem generally felt to be important, so personally
I'm in favor of not cluttering up the port with more special
behaviour. But it is Hrvoje's decision, as always.
If you feel it is important, write a patch and submit it; it shouldn't be a
major piece of work.
 
Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Michael Jennings

-


Obviously, this is completely your decision. You are right, only DOS editors make the 
mistake. (It should be noted that DOS is MS Windows only command line language. It 
isn't going away; even Microsoft supplies command line utilities with all versions of 
its OSs. Yes, Windows will probably eventually go away, but not soon.)

However, I have a comment: There is simple logic that would solve this problem. WGet, 
when it reads a line in the configuration file, probably now strips off trailing 
spaces (hex 20, decimal 32). I suggest that it strip off both trailing spaces and 
control characters (characters with hex values of 1F or less, decimal values of 31 or 
less). This is a simple change that would work in all cases.
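
In code, that suggestion amounts to roughly the following sketch
(hypothetical -- the function name is made up and this is not how wget
parses its lines today):

#include <string.h>

/* Strip trailing spaces and control characters (bytes with values of
   0x20 or below, which covers ' ', '\n', '\r', '\t' and ^Z) from LINE,
   in place.  */
static void
strip_trailing_junk (char *line)
{
  size_t n = strlen (line);
  while (n > 0 && (unsigned char) line[n - 1] <= 0x20)
    line[--n] = '\0';
}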

Regards,

Michael


__


Hrvoje Niksic wrote:

> Herold Heiko <[EMAIL PROTECTED]> writes:
>
> > My personal idea is:
> > As a matter of fact no *windows* text editor I know of, even the
> > supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
> > end of file.txt. Wget is a *windows* program (although running in
> > console mode), not a *Dos* program (except for the real dos port I know
> > exists but never tried out).
> >
> > So personally I'd say it would not be really neccessary adding support
> > for the ^Z, even in the win32 port;
>
> That was my line of thinking too.




Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread Hrvoje Niksic

Herold Heiko <[EMAIL PROTECTED]> writes:

> My personal idea is:
> As a matter of fact no *windows* text editor I know of, even the
> supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
> end of file.txt. Wget is a *windows* program (although running in
> console mode), not a *Dos* program (except for the real dos port I know
> exists but never tried out).
> 
> So personally I'd say it would not be really neccessary adding support
> for the ^Z, even in the win32 port;

That was my line of thinking too.



RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-17 Thread csaba . raduly


On 17/01/2002 07:34:05 Herold Heiko wrote:
[proper order restored]
>> -Original Message-
>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
>> Sent: Thursday, January 17, 2002 2:15 AM
>> To: Michael Jennings
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: Bug report: 1) Small error 2) Improvement to Manual
>>
>>
>> Michael Jennings <[EMAIL PROTECTED]> writes:
>>
>> > 1) There is a very small bug in WGet version 1.8.1. The bug occurs
>> >when a .wgetrc file is edited using an MS-DOS text editor:
>> >
>> > WGet returns an error message when the .wgetrc file is terminated
>> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
>> > command-line language for all versions of Windows, so ignoring the
>> > end-of-file mark would make sense.
>>
>> Ouch, I never thought of that.  Wget opens files in binary mode and
>> handles the line termination manually -- but I never thought to handle
>> ^Z.
>>
>> As much as I'd like to be helpful, I must admit I'm loath to encumber
>> the code with support for this particular thing.  I have never seen it
>> before; is it only an artifact of DOS editors, or is it used on
>> Windows too?
>>


[snip "copy con file.txt"]
>
>However in this case (at least when I just tried) the file won't contain
>the ^Z. OTOH some DOS programs still will work on NT4, NT2k and XP, and
>could be used, and would create files ending with ^Z. But do they really
>belong here and should wget be bothered ?
>
>What we really need to know is:
>
>Is ^Z still a valid, recognized character indicating end-of-file (for
>textmode files) for command shell programs on windows NT 4/2k/Xp ?
>Somebody with access to the *windows standards* could shed more light on
>this question ?
>
>My personal idea is:
>As a matter of fact no *windows* text editor I know of, even the
>supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
>end of file.txt. Wget is a *windows* program (although running in
>console mode), not a *Dos* program (except for the real dos port I know
>exists but never tried out).
>

I don't think there's a distinction between DOS and Windows programs
in this regard. The C runtime library is most likely to play a
significant role here. For a file fopen-ed in "rt" mode, the RTL
would convert \r\n -> \n and silently eat the _first_ ^Z,
returning EOF at that point.

When writing, it goes the other way 'round WRT \n->\r\n.
I'm unsure about whether it writes ^Z at the end, though.

>So personally I'd say it would not be really necessary adding support
>for the ^Z, even in the win32 port; except possibly for the Dos port, if
>the porter of that beast thinks it would be useful.
>

The problem could be solved by opening .wgetrc in "rt".
However, the "t" is a non-standard extension.

However, this is not wget's problem IMO. Different editors may behave
differently. Example: on OS/2 (which isn't a DOS shell, but can run
DOS programs), the system editor (e.exe) *does* append a ^Z at the end
of every file it saves. People have patched the binary to remove this
feature :-) AFAIK no other OS/2 editor does this.


--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]   http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




RE: Bug report: 1) Small error 2) Improvement to Manual

2002-01-16 Thread Herold Heiko

Unfortunately every version of W9x can (with some kind of mind - please
don't start religious wars here) be considered a shell (nice, horrible,
choose what you prefer) around some kind of dos. From Win NT 4 upwards
that isn't true anymore, but for (some) compatibility's sake there are
many parallelisms which partly emulate the behaviour of the old dos
environment.

For example, in order to rapidly create a (small file) on nt4 I still
can

C:\tmp>copy con some.file
some garbage
^Z
1 file copiato(i).

which is just like cat >file ... ^D, the difference being that con is a special
not-exactly-a-file somewhat similar to /dev/tty on unix.

However in this case (at least when I just tried) the file won't contain
the ^Z. OTOH some dos programs still will work on NT4, NT2k and XP, and
could be used, and would create files ending with ^Z. But do they really
belong here and should wget be bothered ?

What we really need to know is:

Is ^Z still a valid, recognized character indicating end-of-file (for
textmode files) for command shell programs on windows NT 4/2k/Xp ?
Somebody with access to the *windows standards* could shed more light on
this question ?

My personal idea is:
As a matter of fact no *windows* text editor I know of, even the
supplied windows ones (notepad, wordpad) AFAIK will add the ^Z at the
end of file.txt. Wget is a *windows* program (although running in
console mode), not a *Dos* program (except for the real dos port I know
exists but never tried out).

So personally I'd say it would not really be necessary to add support
for the ^Z, even in the win32 port; except possibly for the DOS port, if
the porter of that beast thinks it would be useful.

Heiko

-- 
-- PREVINET S.p.A.[EMAIL PROTECTED]
-- Via Ferretto, 1ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV) fax x39-041-5907087
-- ITALY

> -Original Message-
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 17, 2002 2:15 AM
> To: Michael Jennings
> Cc: [EMAIL PROTECTED]
> Subject: Re: Bug report: 1) Small error 2) Improvement to Manual
> 
> 
> Michael Jennings <[EMAIL PROTECTED]> writes:
> 
> > 1) There is a very small bug in WGet version 1.8.1. The bug occurs
> >when a .wgetrc file is edited using an MS-DOS text editor:
> > 
> > WGet returns an error message when the .wgetrc file is terminated
> > with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> > command-line language for all versions of Windows, so ignoring the
> > end-of-file mark would make sense.
> 
> Ouch, I never thought of that.  Wget opens files in binary mode and
> handles the line termination manually -- but I never thought to handle
> ^Z.
> 
> As much as I'd like to be helpful, I must admit I'm loath to encumber
> the code with support for this particular thing.  I have never seen it
> before; is it only an artifact of DOS editors, or is it used on
> Windows too?
> 



Re: Bug report: 1) Small error 2) Improvement to Manual

2002-01-16 Thread Hrvoje Niksic

Michael Jennings <[EMAIL PROTECTED]> writes:

> 1) There is a very small bug in WGet version 1.8.1. The bug occurs
>when a .wgetrc file is edited using an MS-DOS text editor:
> 
> WGet returns an error message when the .wgetrc file is terminated
> with an MS-DOS end-of-file mark (Control-Z). MS-DOS is the
> command-line language for all versions of Windows, so ignoring the
> end-of-file mark would make sense.

Ouch, I never thought of that.  Wget opens files in binary mode and
handles the line termination manually -- but I never thought to handle
^Z.

As much as I'd like to be helpful, I must admit I'm loath to encumber
the code with support for this particular thing.  I have never seen it
before; is it only an artifact of DOS editors, or is it used on
Windows too?



Re: Bug? (not following 302's, or following them incorrectly)

2002-01-13 Thread Hrvoje Niksic

Brendan Ragan <[EMAIL PROTECTED]> writes:

> This is the problem i'm having with an older wget (1.5.3) when i
> enter the url
> 
> 'http://www.tranceaddict.com/cgi-bin/songout.php?id=1217-dirty_dirty&month=dec'
> 
> it goes
> 
> Connecting to www.tranceaddict.com:80... connected!
> HTTP request sent, awaiting response... 302 Found
> Location: 
>http://vid1.tranceaddict.com/mp3/singles/dec/Dirty-Dirty-(Original_Mix)-XP-www_tranceaddict_com.mp3
> [following]  
> 
> and dutifully retrieves the file.
> On a newer version (1.7) it goes
> 
> Connecting to www.tranceaddict.com:80... connected!
> HTTP request sent, awaiting response... 302 Moved Temporarily
> Location: http://www.yahoo.com [following]

www.tranceaddict.com does not appear to be available at the moment, so
I can't repeat this myself.  Here are several hints how to provide
more data:

* Try the same URL with the latest version of Wget, which is 1.8.1.
  Many bugs have been fixed; perhaps this is one of them.

* Use the `-d' switch to provide the debug output for both the
  successful 1.5.3 run and the unsuccessful 1.8.1 run, and mail both
  outputs here.  That way we'll have a chance to determine what 1.8.1
  is doing wrong.



Re: Bug if current folder don't existe

2001-12-29 Thread Hrvoje Niksic

Jean-Edouard BABIN <[EMAIL PROTECTED]> writes:

> I found a little bug when we download from a deleted directory:
[...]

Thanks for the report.

I wouldn't consider it a real bug.  Downloading things into a deleted
directory is bound to produce all kinds of problems.

The diagnostic message could perhaps be improved, but I don't consider
the case of downloading into deleted directories to be all that
frequent.  The IO code is always hard, and diagnostics will never be
completely in sync with reality.



Re: bug in wget 1.8

2001-12-17 Thread Hrvoje Niksic

Vladimir Volovich <[EMAIL PROTECTED]> writes:

> while downloading some file (via http) with wget 1.8, i got an error:
> 
> assertion failed: p - bp->buffer <= bp->width, file progress.c, line 673
> Abort (core dumped)

Thanks for the report.  It's a known problem in 1.8, fixed by this
patch.

Index: src/progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -r1.21 -r1.22
--- src/progress.c  2001/12/09 01:24:40 1.21
+++ src/progress.c  2001/12/09 04:51:40 1.22
@@ -647,7 +647,7 @@
/* Hours not printed: pad with three spaces (two digits and
   colon). */
APPEND_LITERAL ("   ");
-  else if (eta_hrs >= 10)
+  else if (eta_hrs < 10)
/* Hours printed with one digit: pad with one space. */
*p++ = ' ';
   else



Re: bug with cookies being filed using proxy host, not real host

2001-12-13 Thread Hrvoje Niksic

[EMAIL PROTECTED] writes:

> I use a proxy server, and have a line in my .wgetrc that says
> something like:

What version of Wget are you using?  I believe this bug has been fixed
in Wget 1.7.1 and later.

By the way, your analysis is correct.



Re: Bug report

2001-12-13 Thread Hrvoje Niksic

Pavel Stepchenko <[EMAIL PROTECTED]> writes:

> Hello bug-wget,
> 
> $ wget --version
> GNU Wget 1.8
> 
> $ wget 
>ftp://password:[EMAIL PROTECTED]:12345/Dir%20One/This.Is.Long.Name.Of.The.Directory/*
> Warning: wildcards not supported in HTTP.
> 
> Oooops! But this is FTP url, not HTTP!

Are you using a proxy?



Re: bug in wget rate limit feature

2001-12-10 Thread Hrvoje Niksic

<[EMAIL PROTECTED]> writes:

> Today I downloaded the new wget release (1.8) (I'm a huge fan of the
> util btw ;p ) and have been trying out the rate-limit feature.
[...]
> assertion "p - bp->buffer <= bp->width" failed: file "progress.c",
> line 673

Thanks for the report.  The bug shows with downloads whose ETA is 10
or more hours, and is trivially fixed by this patch, already applied
to the CVS:

Index: progress.c
===
RCS file: /pack/anoncvs/wget/src/progress.c,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -r1.21 -r1.22
--- progress.c  2001/12/09 01:24:40 1.21
+++ progress.c  2001/12/09 04:51:40 1.22
@@ -647,7 +647,7 @@
/* Hours not printed: pad with three spaces (two digits and
   colon). */
APPEND_LITERAL ("   ");
-  else if (eta_hrs >= 10)
+  else if (eta_hrs < 10)
/* Hours printed with one digit: pad with one space. */
*p++ = ' ';
   else



Re: Bug: Wget 1.7 (Red Hat 7.2 dist wget-1.7-3)

2001-12-05 Thread Hrvoje Niksic

"William H. Gilmore" <[EMAIL PROTECTED]> writes:

> I have recently tripped across a bug with the version of wget shipped
> with RedHat 7.2.  When I attempt to recursively retrieve a web tree
> starting with an html link that contains a base href, wget apparently
> limits all href to base href even if another absolute path is
> specified.  You can verify this with the following command.

You didn't provide the command.  And I'm not exactly sure what you
mean by "limits all href to base href".  The way base href works is,
every URL gets merged with the base href URL.  The merging process
should correctly handle absolute paths.

For instance if base href is "http://www.server.com/foo/", the URL
"/bar/index.html" will be merged as
"http://www.server.com/bar/index.html", i.e. the initial slash in the
URL overrode the "foo" part of the base URL.

> I cannot provide you with the site that I identified the problem
> with because of security reasons.

Understood.  It could still be possible to make a minimal Wget run
that demonstrates the problem with `-d -o log'.  After that, `log'
should contain a full debugging dump of the download.  Replace your
site name with "www.server.com", and the identity of your site should
be protected.



RE: bug?

2001-11-22 Thread Ian Abbott

On 22 Nov 2001, at 14:49, Tomas Hjelmberg wrote:

> Thanks!
> I see, but then, how to exclude from being downloaded per file-basis?

Put the following in the /robots.txt on your website

User-agent: *
Disallow: /tomas.html

See  for more 
info.




Re: bug?

2001-11-22 Thread Hrvoje Niksic

Tomas Hjelmberg <[EMAIL PROTECTED]> writes:

> Isn't it a good idea to have an option to forbid wget to download files with
> the tag:
> <meta name="robots" content="noindex">

But how will Wget know not to download those files unless it has
already downloaded and inspected them?  Or, do you mean that they
should be deleted afterwards?



RE: bug?

2001-11-22 Thread Tomas Hjelmberg

Isn't it a good idea to have an option to forbid wget to download files with
the tag:
<meta name="robots" content="noindex">

-Original Message-
From: Jens Roesner [mailto:[EMAIL PROTECTED]]
Sent: 23 November 2001 04:09
To: Tomas Hjelmberg
Cc: Wget List
Subject: Re: bug?


Hi Tomas!

> I see, but then, how to exclude from being downloaded per file-basis?
First, let me be a smartass:
Go to 
http://www.acronymfinder.com
and look up 
RTFM
Then, proceed to the docs of wget.
wget offers download restrictions on
host, directory and file name.
Search in the docs for
-H
-D
--exclude-domains
`-A ACCLIST' `--accept ACCLIST' `accept = ACCLIST'
`-R REJLIST' `--reject REJLIST' `reject = REJLIST'
`-I LIST' `--include LIST' `include_directories = LIST'
`-X LIST' `--exclude LIST' `exclude_directories = LIST'

CU
Jens

http://www.jensroesner.de/wgetgui



RE: bug?

2001-11-22 Thread Tomas Hjelmberg

Good idea, but how do you do that on a per file basis,
if the files has nothing in common and it is thousands of them?

-Original Message-
From: Jens Roesner [mailto:[EMAIL PROTECTED]]
Sent: 23 November 2001 04:09
To: Tomas Hjelmberg
Cc: Wget List
Subject: Re: bug?


Hi Tomas!

> I see, but then, how to exclude from being downloaded per file-basis?
First, let me be a smartass:
Go to 
http://www.acronymfinder.com
and look up 
RTFM
Then, proceed to the docs of wget.
wget offers download restrictions on
host, directory and file name.
Search in the docs for
-H
-D
--exclude-domains
`-A ACCLIST' `--accept ACCLIST' `accept = ACCLIST'
`-R REJLIST' `--reject REJLIST' `reject = REJLIST'
`-I LIST' `--include LIST' `include_directories = LIST'
`-X LIST' `--exclude LIST' `exclude_directories = LIST'

CU
Jens

http://www.jensroesner.de/wgetgui



Re: bug?

2001-11-22 Thread Jens Roesner

Hi Tomas!

> I see, but then, how to exclude from being downloaded per file-basis?
First, let me be a smartass:
Go to 
http://www.acronymfinder.com
and look up 
RTFM
Then, proceed to the docs of wget.
wget offers download restrictions on
host, directory and file name.
Search in the docs for
-H
-D
--exclude-domains
`-A ACCLIST' `--accept ACCLIST' `accept = ACCLIST'
`-R REJLIST' `--reject REJLIST' `reject = REJLIST'
`-I LIST' `--include LIST' `include_directories = LIST'
`-X LIST' `--exclude LIST' `exclude_directories = LIST'

CU
Jens

http://www.jensroesner.de/wgetgui



RE: bug?

2001-11-22 Thread Tomas Hjelmberg

Thanks!
I see, but then, how to exclude from being downloaded per file-basis?

-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
Sent: 22 November 2001 14:45
To: Wget List; Tomas Hjelmberg
Subject: Re: bug?


Tomas Hjelmberg <[EMAIL PROTECTED]> writes:

> I want to exclude /var/www/html/tomas.html from being indexed.
> It looks like:
[...]
>   
>   Tomas
[...]
> 
> I invoke wget with:
> wget -r http://localhost
> And tomas.html is unfortunately downloaded anyway...

Wget doesn't really "index" anything, so it pretty much ignores
`noindex'.  You can specify "nofollow" in which case Wget will refuse
to recurse into the document.



Re: bug?

2001-11-22 Thread Hrvoje Niksic

Tomas Hjelmberg <[EMAIL PROTECTED]> writes:

> I want to exclude /var/www/html/tomas.html from being indexed.
> It looks like:
[...]
>   
>   Tomas
[...]
> 
> I invoke wget with:
> wget -r http://localhost
> And tomas.html is unfortunately downloaded anyway...

Wget doesn't really "index" anything, so it pretty much ignores
`noindex'.  You can specify "nofollow" in which case Wget will refuse
to recurse into the document.



RE: bug?

2001-11-22 Thread Tomas Hjelmberg

Thanks for the answer!
What information?
I just gave some in my former mail.
I'm using wget 1.7.1 on Linux with Apache WS.

-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
Sent: 22 November 2001 14:37
To: Wget List
Subject: Re: bug?


Tomas Hjelmberg <[EMAIL PROTECTED]> writes:

> Sorry, but can't anybody say at least that I'm wrong when I state that the
> <meta name="robots" content="noindex"> tag doesn't work?
> Has anyone got it to work under any circumstances?

These are two different questions.

The answer to the first one is irrelevant, because even if it works
for everyone else, it might still fail to work for you due to a bug
that gets tripped by your case.

The answer to the second one is: yes, when I was testing the code
after having written it.

As I said before, and Daniel reiterated, you'll need to give us more
info about what goes wrong for us to be able to help you.



Re: bug?

2001-11-22 Thread Hrvoje Niksic

Tomas Hjelmberg <[EMAIL PROTECTED]> writes:

> Sorry, but can't anybody say at least that I'm wrong when I state that the 
> <meta name="robots" content="noindex"> tag doesn't work?
> Has anyone got it to work under any circumstances?

These are two different questions.

The answer to the first one is irrelevant, because even if it works
for everyone else, it might still fail to work for you due to a bug
that gets tripped by your case.

The answer to the second one is: yes, when I was testing the code
after having written it.

As I said before, and Daniel reiterated, you'll need to give us more
info about what goes wrong for us to be able to help you.



RE: bug?

2001-11-22 Thread Tomas Hjelmberg

I want to exclude /var/www/html/tomas.html from being indexed.
It looks like:

<html>
<head>
<meta name="robots" content="noindex">
<title>Tomas</title>
</head>
<body>
<a href="http://www.blowfish:8080">blowfish</a>
bu
</body>
</html>

I invoke wget with:
wget -r http://localhost
And tomas.html is unfortunately downloaded anyway...

-Original Message-
From: Daniel Stenberg [mailto:[EMAIL PROTECTED]]
Sent: 22 November 2001 13:15
To: Tomas Hjelmberg
Cc: Wget List
Subject: Re: bug?


On Thu, 22 Nov 2001, Tomas Hjelmberg wrote:

> Sorry, but can't anybody say at least that I'm wrong when I state that the
> <meta name="robots" content="noindex"> tag doesn't work?
> Has anyone got it to work under any circumstances?

Yes:

- State what you want to do.
- Describe how you do it.
- Describe what happens.
- Describe what you expected to happen.

Provide/show the HTML in the above.

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol



Re: bug?

2001-11-22 Thread Daniel Stenberg

On Thu, 22 Nov 2001, Tomas Hjelmberg wrote:

> Sorry, but can't anybody say at least that I'm wrong when I state that the
> <meta name="robots" content="noindex"> tag doesn't work?
> Has anyone got it to work under any circumstances?

Yes:

- State what you want to do.
- Describe how you do it.
- Describe what happens.
- Describe what you expected to happen.

Provide/show the HTML in the above.

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol




Re: Bug in wget 1.7

2001-10-04 Thread Ian Abbott

On 3 Oct 2001, at 20:07, Thomas Preymesser wrote:

> I have discovered a bug in wget 1.7
[snip]
> wget -d -r -l 1 www.lehele.de 
[snip]
> Loaded www.lehele.de/index.html (size 3830).
> Speicherzugriffsfehler (core dumped)
>
> 
> The file index.html is saved and complete in directory www.lehele.de.
> If I call wget without recursion then everything is ok, but when I try
> to go deeper wget is crashing.

This bug appears to have been fixed around the end of June in the CVS 
repository.



Re: Bug in wget 1.7

2001-10-03 Thread Daniel Stenberg

On Wed, 3 Oct 2001, Thomas Preymesser wrote:

> The file index.html is saved and complete in directory www.lehele.de. If I
> call wget without recursion then everything is ok, but when I try to go
> deeper wget is crashing.

It would probably help a lot if you could do

'gdb /path/to/wget core'

and then type

'where' to display the stack trace.

If you've built without the -g (debug) option, please rebuild it and try
again, as that will give lots more detail in the stack trace dump.

-- 
  Daniel Stenberg - http://daniel.haxx.se - +46-705-44 31 77
   ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol




Re: Bug repport: Stray and character constant too long

2001-09-10 Thread Ian Abbott

On 9 Sep 2001, at 14:33, Thomas Dyhr wrote:

> I was trying to do a "make" on a PowerPC cube 450MHz running Mac OS X 10.0.4
> (powerpc-apple-darwin1.3.7) when I received the following error message:

[snip, as we've seen this a few times already!]

In previous messages on this subject, it has been determined that the 
culprit is Apple's broken C pre-processor (cpp-precomp).

Considered opinion of this email list offers two different solutions:

1) Change line 435 of html-parse.c to:

assert (ch == '\'' || ch == '\"');

   (i.e. add a \ before the ")

2) Use the following command to configure the Makefile:

CPPFLAGS="-no-cpp-precomp" ./configure

The first fix avoids the bug in Apple's C pre-processor. The second 
fix avoids Apple's C pre-processor altogether in favor of the GNU 
one.




Re: bug in DO_REALLOC_FROM_ALLOCA

2001-06-26 Thread Hrvoje Niksic

"T. Bharath" <[EMAIL PROTECTED]> writes:

> void *drfa_new_basevar = xmalloc (do_realloc_newsize);  \
>memcpy (drfa_new_basevar, basevar, sizevar);\
> 
> we are trying to copy sizevar bytes of basevar to drfa_new_basevar.  One
> thing to notice here is that basevar is not of size
> sizevar now, since we have incremented sizevar to a newsize.

Thanks a lot for the report and the analysis.  The fix is relatively
straightforward: just make sure that SIZEVAR is incremented only after
we no longer need the old value.

2001-06-26  Hrvoje Niksic  <[EMAIL PROTECTED]>

* wget.h (DO_REALLOC_FROM_ALLOCA): Set SIZEVAR after the memcpy()
call because it needs the old value.

Index: src/wget.h
===
RCS file: /pack/anoncvs/wget/src/wget.h,v
retrieving revision 1.23
diff -u -r1.23 wget.h
--- src/wget.h  2001/05/27 19:35:15 1.23
+++ src/wget.h  2001/06/26 08:42:20
@@ -236,7 +236,6 @@
 do_realloc_newsize = 2*(sizevar);  \
 if (do_realloc_newsize < 16)   \
   do_realloc_newsize = 16; \
-(sizevar) = do_realloc_newsize;\
   }\
   if (do_realloc_newsize)  \
 {  \
@@ -249,6 +248,7 @@
  (basevar) = drfa_new_basevar; \
  allocap = 0;  \
}   \
+  (sizevar) = do_realloc_newsize;  \
 }  \
 } while (0)
 



Re: BUG: -nd still create directories

2001-06-18 Thread Hrvoje Niksic

Keh-Chen Lau <[EMAIL PROTECTED]> writes:

> I have installed the latest version of wget (1.7) but found
> a bug which does not exist in 1.5.3
> 
>If I download a file VIA this URL
> 
>http://www.foo.bar.com/server/download/abc.zip?fn=/public/abc.zip
> 
>with the '-nd' option, it still create the following folders
> 
>'abc.zip?fn=' and 'public' and then store abc.zip inside it.

Thanks for the report -- good to have it before the 1.7.1 release.
Does this patch fix the problem for you?

2001-06-18  Hrvoje Niksic  <[EMAIL PROTECTED]>

* url.c (url_filename): Make sure that slashes that sneak in to
u->file via query string get protected.
(file_name_protect_query_string): New function.

Index: src/url.c
===
RCS file: /pack/anoncvs/wget/src/url.c,v
retrieving revision 1.45
diff -u -r1.45 url.c
--- src/url.c   2001/05/27 19:35:10 1.45
+++ src/url.c   2001/06/18 08:11:51
@@ -1030,6 +1030,38 @@
   return res;
 }
 
+/* Return a malloced copy of S, but protect any '/' characters. */
+
+static char *
+file_name_protect_query_string (const char *s)
+{
+  const char *from;
+  char *to, *dest;
+  int destlen = 0;
+  for (from = s; *from; from++)
+{
+  ++destlen;
+  if (*from == '/')
+   destlen += 2;   /* each / gets replaced with %2F, so
+  it adds two more chars.  */
+}
+  dest = (char *)xmalloc (destlen + 1);
+  for (from = s, to = dest; *from; from++)
+{
+  if (*from != '/')
+   *to++ = *from;
+  else
+   {
+ *to++ = '%';
+ *to++ = '2';
+ *to++ = 'F';
+   }
+}
+  assert (to - dest == destlen);
+  *to = '\0';
+  return dest;
+}
+
 /* Create a unique filename, corresponding to a given URL.  Calls
mkstruct if necessary.  Does *not* actually create any directories.  */
 char *
@@ -1048,7 +1080,20 @@
   if (!*u->file)
file = xstrdup ("index.html");
   else
-   file = xstrdup (u->file);
+   {
+ /* If the URL came with a query string, u->file will contain
+a question mark followed by query string contents.  These
+contents can contain '/' which would make us create
+unwanted directories.  These slashes must be protected
+explicitly.  */
+ if (!strchr (u->file, '/'))
+   file = xstrdup (u->file);
+ else
+   {
+ /*assert (strchr (u->file, '?') != NULL);*/
+ file = file_name_protect_query_string (u->file);
+   }
+   }
 }
 
   if (!have_prefix)



Re: Bug when converting links in wget 1.6

2001-03-30 Thread Dan Harkless


Hrvoje Niksic <[EMAIL PROTECTED]> writes:
> To be sure that *all* HTML files are handled, I think the addition
> needs to be triggered from within retrieve_url, say by calling a
> "register_html_file_for_conversion()".  I think I'll provide such a
> fix tonight.

Sounds good.  Wonder if it should be more generically-worded, though?  I can
certainly envision other reasons besides conversion where we'd want a list
of all HTML files downloaded.

> On a side note, I think it might be more useful to remove the
> first-step conversion as it seems to cause more confusion than
> benefit.

I'd tend to agree.

---
Dan Harkless| To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.



Re: Bug when converting links in wget 1.6

2001-03-30 Thread Hrvoje Niksic

Dan Harkless <[EMAIL PROTECTED]> writes:

> > --- src/recur.c Sun Dec 17 20:28:20 2000
> > +++ src/recur.c.new Sun Mar 25 20:25:12 2001
> > @@ -165,7 +165,18 @@
> >first_time = 0;
> >  }
> >else
> > +{
> > +u = newurl ();
> > +err = parseurl (this_url, u, 0);
> > +if (err == URLOK)
> > +  {
> > +ulist = add_slist (ulist, u->url, 0); /* ??? */
> > +   urls_downloaded = add_url (urls_downloaded, u->url, file);
> > +   urls_html = add_slist (urls_html, file, NOSORT);
> > +  }
> > +freeurl(u, 1);  
> >  ++depth;
> > +}
> >  
> >if (opt.reclevel != INFINITE_RECURSION && depth > opt.reclevel)
> >  /* We've exceeded the maximum recursion depth specified by the user. */
> 
> Thanks for the patch, Carsten.  Hopefully someone will have time to review
> it soon and apply it if it's appropriate.  
> 
> Does this look like the right fix, Hrvoje?

I'm not completely sure.  I suspect that it doesn't handle the HTML
files that are downloaded but not followed recursively.

To be sure that *all* HTML files are handled, I think the addition
needs to be triggered from within retrieve_url, say by calling a
"register_html_file_for_conversion()".  I think I'll provide such a
fix tonight.

On a side note, I think it might be more useful to remove the
first-step conversion as it seems to cause more confusion than
benefit.



Re: Bug when converting links in wget 1.6

2001-03-26 Thread Dan Harkless


Hi, Carsten.  In the future please send such mail to [EMAIL PROTECTED], as is
documented, NOT to me directly.

Carsten Mackenroth <[EMAIL PROTECTED]> writes:
> Hi,
> 
> wget 1.6 has a problem when called with --convert-links and multiple
> URLs. Something like
>   wget --timestamping --convert-links --backup-converted \
>   --page-requisites http://localhost/ http://localhost/site-docs/
> correctly fixes the links in localhost/index.html, but not in
> localhost/site-docs/index.html (both are directory listings with icons).
> 
> It seems that only the first URL is added to urls_html and downloaded_urls
> in src/recur.c:recursive_retrieve(), because only then first_time is true.
> Here is a patch that fixes this. It is mainly copied from the if(first_time)
> section and propably not a very clean solution (basedir ? err != URLOK ?),
> but at least it works in my case and can give you a hint where the problem
> is.

Funny timing -- we were just discussing this issue on the wget list (or
perhaps you read that)?

> --- src/recur.c   Sun Dec 17 20:28:20 2000
> +++ src/recur.c.new   Sun Mar 25 20:25:12 2001
> @@ -165,7 +165,18 @@
>first_time = 0;
>  }
>else
> +{
> +u = newurl ();
> +err = parseurl (this_url, u, 0);
> +if (err == URLOK)
> +  {
> +ulist = add_slist (ulist, u->url, 0); /* ??? */
> + urls_downloaded = add_url (urls_downloaded, u->url, file);
> + urls_html = add_slist (urls_html, file, NOSORT);
> +  }
> +freeurl(u, 1);
>  ++depth;
> +}
>  
>if (opt.reclevel != INFINITE_RECURSION && depth > opt.reclevel)
>  /* We've exceeded the maximum recursion depth specified by the user. */

Thanks for the patch, Carsten.  Hopefully someone will have time to review
it soon and apply it if it's appropriate.  

Does this look like the right fix, Hrvoje?

---
Dan Harkless| To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.



Re: bug at page requisites?

2001-03-26 Thread Dan Harkless


"Jaakko Paakkonen" <[EMAIL PROTECTED]> writes:
>   Hi
> 
> First of all, thanks for great piece of software. And I am sorry to address
> you directly but I don't know where else to turn to.

The right place to turn to, as is documented, is [EMAIL PROTECTED]  I'm
cc'ing this reply there.

> There seems to be a bug in the -p option, or in recursive get, I don't know
> which. Given a situation where a web page consists of several frames, which
> have pictures in them. If you are not getting the page recursively, but only
> with the page requisites option, the pictures inside the frames are not
> retrieved. The example I have is a company intranet page I tried out wget
> on, so I cannot send you an example, sorry.

This is not so much a bug as an oversight.  I am not a big fan of frames
(they're usually misused) and don't think about them all that much.  It
didn't occur to me that the "one more hop" wouldn't be enough if the first
page were a <frameset> page.

Don't have time to fix this at the moment (and I think it'll take more than
a trivial amount of coding because at the place where we increment the
recursive level, I don't believe we know that the last tag we followed was a
<frame> tag).

In the meantime, you can get a frameset and all requisites for the frames by
simply adding "-r -l1".  For instance:

% wget -r -l1 -p http://www.site.tld/index_frames.html

The -r -l1 will go from the <frameset> page to the <frame> pages, and the -p
will get the requisites for the <frame> pages.

Anyway, thanks for bringing this up -- I'll update the documentation to note
that you need to do this when using -p on a <frameset> page.

---
Dan Harkless| To help prevent SPAM contamination,
GNU Wget co-maintainer  | please do not mention this email
http://sunsite.dk/wget/ | address in Usenet posts -- thank you.



Re: bug?

2001-03-08 Thread Hack Kampbjørn



Tobias Erle wrote:
> 
> Hi,
> 
> I'm using a 1.5.3 port for DOS and wget has a strange behavior to handle
> URLs with "&".

Note that version 1.5.3 is quite old. Version 1.6 was released December
31st last year. Look at the web-site (http://sunsite.dk/wget/) for more
information, and yes, there is a link to DOS binaries.

> 
> Every time when I use URLs like
> 
> 
>http://212.123.106.25/index.php3?partei_id=8&position=1900&sid=1b823ea0bf3c6f280f6ca7675b03caaa
> 
> wget answers:
> 
> |--16:43:50--  http://212.123.106.25:80/
> |index.php3?partei_id=8&position=1900&sid=1b823ea0bf3c6f280f6ca7675b03caaa
> |   => 
>`212.123.106.25/index.php3?partei_id=8&position=1900&sid=1b823ea0bf3c6f280f6ca7675b03caaa'
> |Connecting to 212.123.106.25:80... connected!
> |HTTP request sent, awaiting response... 302 Found
> |2 Date: Thu, 08 Mar 2001 15:47:29 GMT
> |3 Server: Apache/1.3.17 (Unix) PHP/4.0.4pl1
> |4 X-Powered-By: PHP/4.0.4pl1
> |5 Set-Cookie: PHPSESSID=5ebece8a8f9327ff64c3876466a6019c; path=/
> |6 Expires: Thu, 19 Nov 1981 08:52:00 GMT
> |7 Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
> |8 Pragma: no-cache
> |9 Set-Cookie: Cookies_ok=1; expires=Sat, 07-Apr-01 15:47:31 GMT
> |10 Location: 
>index.php3?cookie_ok=1&sid=1b823ea0bf3c6f280f6ca7675b03caaa&position=1900

NON standard redirect location 8-(

> |11 Connection: close
> |12 Content-Type: text/html
> |13
> |Location: index.php3?cookie_ok=1&sid=1b823ea0bf3c6f280f6ca7675b03caaa&position=1900 
[following]
> |index.php3?cookie_ok=1&sid=1b823ea0bf3c6f280f6ca7675b03caaa&position=1900: 
>Unknown/unsupported protocol.
> |
> |FINISHED --16:43:53--
> |Downloaded: 0 bytes in 0 files
> 
> Why can't I download this? What does "Unknown/unsupported protocol" mean?
> Doesn't wget download URLs with "&"?

This has nothing to do with the '&' in the URL. This is a "problem" with
the web-site itself, which isn't complying with the HTTP standard. Now
since this way of doing redirects is quite common (maybe because most
browsers support it), support for this was added in wget version 1.6.
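
Handling such a redirect boils down to resolving the relative Location
value against the URL that was just requested. A rough illustration
(not wget's actual code; it assumes the Location value is relative and
the base URL already contains a path):

#include <stdlib.h>
#include <string.h>

/* Merge a relative Location value against the request URL by replacing
   everything after the last '/' of the base.  Returns a malloc'd
   string, or NULL on allocation failure.  */
char *
merge_location (const char *base, const char *location)
{
  const char *slash = strrchr (base, '/');
  size_t dirlen = slash ? (size_t) (slash - base + 1) : strlen (base);
  char *res = malloc (dirlen + strlen (location) + 1);

  if (!res)
    return NULL;
  memcpy (res, base, dirlen);
  strcpy (res + dirlen, location);
  return res;
}

/* For example,
   merge_location ("http://212.123.106.25/index.php3?partei_id=8",
                   "index.php3?cookie_ok=1")
   yields "http://212.123.106.25/index.php3?cookie_ok=1".  */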

> 
> --
> Rule of Acquisition 33: It never hurts to oust the boss

-- 
Med venlig hilsen / Kind regards

Hack Kampbjørn   [EMAIL PROTECTED]
HackLine +45 2031 7799



Re: Bug in locale handling when using glibc-2.2.1

2001-01-21 Thread Hrvoje Niksic

"Jan D." <[EMAIL PROTECTED]> writes:

> If I change main.c to do setlocale(LC_ALL, "") instead of
> setlocale(LC_MESSAGES, ""), the problem goes away.

Not using LC_ALL is intentional.  The problem with LC_ALL was that it
affected other things (time and number representations, is* macros,
etc.)  Ulrich's terse reply doesn't give a hint *why* Wget should be
required to use LC_ALL.


