Re: FAQ needed (was wget: relative link to non-relative)

2002-10-18 Thread Andre Majorel
On 2002-10-17 12:16 -0600, Daniel Webb wrote:

 Also, concerning the mailing list, I am not interested in using a kludgy
 web-based interface to an email archive.  Where are the mbox download
 links?

Amen.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Possible bug : hosts spanned by default

2002-09-27 Thread Andre Majorel

I've just had a recursive wget do something unexpected : it
spanned hosts even though I didn't give the -H option. The command
was :

  wget -r -l20 http://www.modcan.com/page2.html

http://www.modcan.com/pg2_main.html contains a link to
www.paypal.com, and that link was followed.

That was Wget 1.8.2 (the 1.8.2-5 Debian package).

Have you ever seen this behaviour ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: wget tries to print the file prn.html

2002-09-24 Thread Andre Majorel

On 2002-09-20 08:15 +0200, Dominic Chambers wrote:

 I am using wget 1.82 on Win2K SP2, and wget froze on the fifth

1.8.2.

 downloaded file 'prn.html' using the command line:
 
 wget -r -l0 -A htm,html,png,gif,jpg,jpeg --no-parent
 http://java.sun.com/products/jlf/at/book
 
 About twenty seconds after it stops, I get Windows complaining that
 there is no available printer (I don't have one), and canceling the
 job does not cause wget to resume processing.

If I remember correctly (it's been a long time), DOS knows when
you're trying to access a device by looking at the *basename*
(minus path and extension) of the file. As of MS-DOS 6.22, the
list of reserved names was AUX, COM{1,2,3,4}, CON, LPT{1,2,3}, NUL
and PRN. I'm not sure what the list is in the various incarnations
of Windows, nor if it's set in stone (could new reserved names be
added by loading drivers ?).

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Apology for absence

2002-07-25 Thread Andre Majorel

On 2002-07-26 01:59 +0200, Hrvoje Niksic wrote:

 Only the bare minimum of characters should be encoded.  The ones that
 come to mind are '/' (illegal), '~' (rm -r ~foo dangerous), '*' and
 '?' (used in wildcards), control characters 0-31 (controls), and chars
 128-159 (non-printable).

<lobbying>
While quoting / is mandatory, I'm not sure it's a good idea to
quote ~ * ? and control or non-ASCII characters. In fact, the
more I think of it, the more I'm convinced we shouldn't...
</lobbying>

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



HTML served over FTP

2002-07-18 Thread Andre Majorel

I'm trying to snarf a web site that is served over FTP. wget -r
doesn't work probably because Wget doesn't parse HTML documents
retrieved with FTP (which is reasonable).

Is there a sort of --follow-html option to force Wget to parse
HTML documents served over FTP and follow the links, as if they
came from HTTP ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Feature Request: Stop on error from input url file list.

2002-07-01 Thread Andre Majorel

On 2002-06-29 21:09 -0400, Dang P. Tran wrote:

 I use the -i option to download files from an url list. The
 server I use have a password that change often. When I have a
 large list if the password change while I'm downloading and give
 401 error, I want wget stop to prevent hammering the site with
 bad password.

A workaround :

  $ echo '#!/bin/sh' > wrapper
  $ echo 'wget $@ || kill $PPID' >> wrapper
  $ chmod +x wrapper
  $ xargs -n10 ./wrapper < urllist

If for whatever reason Wget exits with a non-zero status, xargs
is killed. Thus the server will be hit at most 9 too many times.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: wget and javascript links

2002-05-14 Thread Andre Majorel

On 2002-05-14 13:01 -0400, Kevin Murphy wrote:

 However, I am trying to suck a particular site which relies excessively 
 on javascript'ed links, e.g. via window.open, sometimes wrapped in 
 function calls.
 
 I realize that in general this an intractable problem, but is anybody 
 aware of a partial solution?

Some people expressed interest in having a Javascript
interpreter included in Wget but AFAIK no one actually did it.

Someone pointed out that Javascript code is often simple enough
that one could write a script to parse it and extract the links.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: ScanMail Message: To Recipient virus found or matched file blocking setting.

2002-04-19 Thread Andre Majorel

On 2002-04-19 11:21 +0200, Hrvoje Niksic wrote:

 There are now fewer spams than there were (I know because I get the
 ones that get caught in the net), but we're not quite there yet.  We
 will be, though.

In case this is of any use to you, these procmail recipes block
at least 3/4 of the asian language spam I get :

:0
* ^Subject: .*±¤[-_ :]*°í
spam

:0
* ^Subject: =\?euc-kr\?
spam

:0
* ^Subject: =\?ks_c_5601-1987\?
spam

:0
* ^Subject: .*[æÃÁÆÇÏÎÑõÚýÝÞ±¹³º¼¾¥¶®·µ].*[æÃÁÆÇÏÎÑõÚýÝÞ±¹³º¼¾¥¶®·µ].*[æÃÁÆÇÏÎÑõÚýÝÞ±¹³º¼¾¥¶®·µ]
spam

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Proposal for despamming the list

2002-04-14 Thread Andre Majorel

On 2002-04-14 05:00 +0200, Hrvoje Niksic wrote:

The moderators are informed about each message that awaits
moderation; that alert would contain a URL they can visit and
approve or reject the mail, at their discretion.

The web interface is not necessary. Listar, for instance, just
forwards the dubious mails to the moderator. Approving the
message is done by replying to listar (actually forwarding to
somelist-repost@somedomain, but you get the idea).

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Current download speed in progress bar

2002-04-10 Thread Andre Majorel

On 2002-04-10 01:14 +0200, Hrvoje Niksic wrote:
 Andre Majorel [EMAIL PROTECTED] writes:
 
  I find it very annoying when a downloader plays yoyo with the
  remaining time. IMHO, remaining time is by nature a long term thing
  and short term jitter should not cause it to go up and down.
 
 Agreed wholeheartedly, but how would you *implement* a non-jittering
 ETA?

I'm not sure you can, but using the average speed will at least
low pass filter out most of the jittering.

 Do you think it makes sense the way 1.8.1 does it, i.e. to
 calculate the ETA from the average speed?

Yes.

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Current download speed in progress bar

2002-04-09 Thread Andre Majorel

On 2002-04-09 20:51 +0200, Hrvoje Niksic wrote:

 The one remaining problem is the ETA.  Based on the current speed, it
 changes value wildly.  Of course, over time it is generally
 decreasing, but one can hardly follow it.  I removed the flushing by
 making sure that it's not shown more than once per second, but this
 didn't fix the problem of unreliable values.
 
 Should we revert to the average speed for ETA, or is there a smarter
 way to handle it?  What are other downloaders doing?

I find it very annoying when a downloader plays yoyo with the
remaining time. IMHO, remaining time is by nature a long term
thing and short term jitter should not cause it to go up and
down.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Referrer Faking and other nifty features

2002-04-03 Thread Andre Majorel

On 2002-04-03 08:50 -0500, Dan Mahoney, System Admin wrote:

   1) referrer faking (i.e., wget automatically supplies a referrer
   based on the, well, referring page)
 
  It is the --referer option, see (wget)HTTP Options, from the Info
  documentation.
 
 Yes, that allows me to specify _A_ referrer, like www.aol.com.  When I'm
 trying to help my users mirror their old angelfire pages or something like
 that, very often the link has to come from the same directory.  I'd like
 to see something where when wget follows a link to another page, or
 another image, it automatically supplies the URL of the page it followed
 to get there.  Is there a way to do this?

Somebody already asked for this and AFAICT, there's no way to do
that.

   3) Multi-threading.
 
  I suppose you mean downloading several URIs in parallel.  No, wget
  doesn't support that.  Sometimes, however, one may start several wget
  in parallel, thanks to the shell (the & operator on Bourne shells).
 
 No, I mean downloading multiple files from the SAME uri in parallel,
 instead of downloading files one-by-one-by-one (thus saving time on a fast
 pipe).

This doesn't make sense to me. When downloading from a single
server, the bottleneck is generally either the server or the link
; in either case, there's nothing to gain by attempting several
simultaneous transfers. Unless there are several servers at the
same IP and the bottleneck is the server, not the link ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: OK, time to moderate this list

2002-03-22 Thread Andre Majorel

On 2002-03-22 04:08 +0100, Hrvoje Niksic wrote:

  May I suggest that you set a filter that prevents postings to the
  list unless the poster is a subscriber. That filter should forward
  the mail to the admins to allow them the pass the mail through if
  suitable.
 
 Do you volunteer to do the work?  I don't mean to be flippant here --
 I often don't have time to do maintenance for weeks, and I would like
 the list to be alive even when the admin is not available.

A simple rule to reject any message whose subject contains more
than, say, fifty percent of non-ASCII characters would
effortlessly block most of the spam.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Incorrect 'beautification' of URL?

2002-03-05 Thread Andre Majorel

On 2002-03-05 11:41 +0100, Philipp Thomas wrote:

 When requesting a URL like http://tmp.logix.cz/slash.xp , wget shortens
 this to http://tmp.logix.cz/slash.xp/. All Browsers I tested (Opera 6b1,
 Mozilla 0.9.8, Konqueror 2.9.2) pass this URL as given.
 
 So the question is, why wget (1.8.1) does what it does

Presumably because the author thought that both URLs are
equivalent. To my surprise, RFC 1945 seems to agree with you. It
says :

   URI         = ( absoluteURI | relativeURI ) [ "#" fragment ]

   absoluteURI = scheme ":" *( uchar | reserved )

   relativeURI = net_path | abs_path | rel_path

   net_path    = "//" net_loc [ abs_path ]
   abs_path    = "/" rel_path
   rel_path    = [ path ] [ ";" params ] [ "?" query ]

   path        = fsegment *( "/" segment )
   fsegment    = 1*pchar
   segment     = *pchar

Which I understand to mean that a segment can be empty, which
in turn could be interpreted as stating that the trailing
slashes in slash.xp are significant.

That said, setting up a web site to rely on empty path segments
strikes me as a creative way of looking for problems. :-) Why is
it important to you ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: KB or kB

2002-02-08 Thread Andre Majorel

On 2002-02-08 08:54 +0100, Hrvoje Niksic wrote:

 Wget currently uses KB as abbreviation for kilobyte.  In a Debian
 bug report someone suggested that kB should be used because it is
 more correct.  The reporter however failed to cite the reference for
 this, and a search of the web has proven inconclusive.
 
 Does someone understand the spelling issues involved enough to point
 out the correct spelling and back it up with arguments?

The applicable standard is the SI (Système International)
established by the CGPM (Conférence Générale des Poids et
Mesures). It defines the metric system units (s, m, V, g, etc.)
and the following prefixes for multiples and submultiples :

  yocto  y  10**-24
  zepto  z  10**-21
  atto   a  10**-18
  femto  f  10**-15
  pico   p  10**-12
  nano   n  10**-9
  micro  µ  10**-6
  milli  m  10**-3
  centi  c  10**-2
  deci   d  10**-1
  deca   da 10**1
  hecto  h  10**2
  kilo   k  10**3
  mega   M  10**6
  giga   G  10**9
  tera   T  10**12
  peta   P  10**15
  exa    E  10**18
  zetta  Z  10**21
  yotta  Y  10**24

Capital K is not a prefix, it's the SI abbreviation for the
temperature unit, the kelvin (note : lower case k) named after
Lord Kelvin.

So it's definitely kB for kilobyte.

Whether that means 1000 bytes or 1024 bytes is another issue.
Regardless, KB is incorrect. As are mb, mB, gb and gB, by the
way.

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Noise ratio getting a bit high?

2002-01-29 Thread Andre Majorel

On 2002-01-29 22:02 +0100, Hrvoje Niksic wrote:

 But that was just an example.  The actual reasoning for allowing
 non-subscriber posting boils down to three reasons:
 
 1. I believe it is the right thing to do.  I personally hate allegedly
supportive mailing lists that require me to subscribe before
asking a question.  I don't want to subscribe, dammit, I just want
to ask something.

I respectfully disagree. If we can spend the time to read and
answer the poster's question, the poster can spend five minutes
to subscribe/unsubscribe.

For reference, see the netiquette item on posting to newsgroups
and asking for replies by email.

 2. It allows the discussion to extend to non-subscribers.  You can
simply Cc a person to a discussion pertinent to him, and he will be
able to respond to the list.
 
 3. It allows the mails from [EMAIL PROTECTED] to be rerouted to this
list.

Yup.

 I am aware that in this matter, as well as in the infamous `Reply-To'
 debate, this list lies in the minority.  But that is not a sufficient
 reason to back down and let the spammers win.

Right now, [EMAIL PROTECTED] is providing free relaying for
spammers to all its subscribers. <sarcasm>If this is not
letting the spammers win, I wonder what is.</sarcasm>

 If you have a spam-fighting suggestion that does *not* include
 disallowing non-subscriber postings, I am more than willing to listen.

Mmm... What would you think of having the list software
automatically add a special header (say X-Non-Subscriber) to
every mail sent by a non-subscriber ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Noise ratio getting a bit high?

2002-01-28 Thread Andre Majorel

On 2002-01-28 14:33 -0500, Thomas Reinke wrote:

 Is anyone else not finding the noise ratio (i.e. spam)
 a bit high here?

A bit *low* you mean ? You bet.

 I sympathize with the effort required
 to lightly moderate, but might I recommend that
 _something_ be done to rid us all of this spam? It's
 getting to be irritating enough that I'm tempted to
 drop off the list, which I'd just as soon not do - wget
 is a fantastic little tool that I'd just as soon stay
 involved with actively, if possible.

Setting up a spam filter requires some effort on the part of the
list master. If the list master is too busy, a quick fix is
preventing non-subscribers from posting. That can usually be done
by flipping a bit in the config of the list software.

But what about [EMAIL PROTECTED], then ?

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: stdout

2002-01-25 Thread Andre Majorel

On 2002-01-25 14:01 +0100, Jens Röder wrote:

 for wget I would suggest a switch that allows to send the output directly
 to stdout. It would be easier to use it in pipes.

Does

  wget ... 2>&1 | command

solve your problem ?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Can not build wget-1.8 under SunOS-4.1.4

2001-12-16 Thread Andre Majorel

On 2001-12-16 19:02 +0100, Hrvoje Niksic wrote:
 Andre Majorel [EMAIL PROTECTED] writes:
 
  On 2001-12-15 07:37 +0100, Hrvoje Niksic wrote:
  
  Is there a good fallback value of RAND_MAX for systems that don't
  bother to define it?
  
  The standard (SUS2) says :
  
The value of the {RAND_MAX} macro will be at least 32767.
 
 c9x says the same, but there is a subtle difference between statement
 and the information I actually need.  A SUS-conformant system will not
 present a problem because it will define RAND_MAX anyway.  The
 information I need is what RAND_MAX should fall back to on the
 traditional Unix systems that have rand(), but don't bother to
 define RAND_MAX.
 
 Online SunOS manuals are not very helpful -- the one at
 http://www.freebsd.org/cgi/man.cgi?query=rand&sektion=3&manpath=SunOS+4.1.3
 can't even seem to decide whether RAND_MAX is 2^31-1 or 2^15-1, and
 there is no mention of RAND_MAX or of an include file that might
 define it.

5th edition, 6th edition, 7th edition and System III all
returned 0-32767. As RAND_MAX didn't exist at the time, plenty
of code must have been written that assumed 0-32767. For that
reason I think it unlikely that anybody ever wrote an
implementation of rand() that returned less than 0-32767.

I believe that a default value of 32767 is safe. Not optimal,
but safe.

Apparently, not all 32-bit systems use 2**31 - 1 : according to
one clcm-er, MSVC defines RAND_MAX as 32767.

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/



Re: Can not build wget-1.8 under SunOS-4.1.4

2001-12-15 Thread Andre Majorel

On 2001-12-15 07:37 +0100, Hrvoje Niksic wrote:

 Is there a good fallback value of RAND_MAX for systems that don't
 bother to define it?

The standard (SUS2) says :

  The value of the {RAND_MAX} macro will be at least 32767.

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/



Re: wget 1.8beta - handling of non-ascii characters in URL

2001-12-07 Thread Andre Majorel

On 2001-12-07 15:10 +0100, Hrvoje Niksic wrote:

 But: a character being unsafe for URL doesn't mean that the same
 character must be unsafe for the file name.  Wget currently contains
 the two, and that's a bug.  I'll try to fix that bug by adding another
 bitflag to the table, e.g. F which means reserved for file name,
 i.e. the character is unsafe, but don't touch it when encoding for
 file names.

Which of course is another can of worms because it's heavily
platform dependant. Apparently FAT and VFAT do not have the same
set of forbidden characters.

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
(Not speaking for my employer, etc.)



Re: Wget 1.8-beta2 now available

2001-12-03 Thread Andre Majorel

On 2001-12-01 23:30 +0100, Hrvoje Niksic wrote:
 Here is the next 1.8 beta.  Please test it if you can -- try compiling
 it on your granma's Ultrix box, run it on your niece's flashy web
 site, see if cookies work, etc.
 
 Get it from:
 
 ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta2.tar.gz

Success:
- Debian GNU/Linux woody, 80x86, GCC 2.95.4
- Solaris 7, SPARC, GCC 2.95.2

Failure:
- HP-UX 10.0, PA-RISC, GCC 3.0.1

  Problem #1 :

gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c connect.c
connect.c: In function `test_socket_open':
connect.c:190: warning: passing arg 2 of `select' from incompatible pointer type
connect.c: In function `select_fd':
connect.c:283: warning: passing arg 2 of `select' from incompatible pointer type
connect.c:283: warning: passing arg 3 of `select' from incompatible pointer type
connect.c:283: warning: passing arg 4 of `select' from incompatible pointer type

(These are just warnings.)

  Problem #2 :

gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c host.c
host.c: In function `lookup_host':
host.c:258: `h_errno' undeclared (first use in this function)
host.c:258: (Each undeclared identifier is reported only once
host.c:258: for each function it appears in.)

Apparently, h_errno is not declared at all under HP-UX (ie.
find /usr/include -follow -type f | xargs grep h_errno turns
up nothing). Declaring h_errno (extern int h_errno;) fixes
the problem. I suppose we need something like :

  #if HPUX
  extern int h_errno;
  #endif

  Problem #3 :

gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c snprintf.c
snprintf.c: In function `dopr':
snprintf.c:311: `short int' is promoted to `int' when passed through `...'
snprintf.c:311: (so you should pass `int' not `short int' to `va_arg')
snprintf.c:323: `short unsigned int' is promoted to `int' when passed through `...'
snprintf.c:335: `short unsigned int' is promoted to `int' when passed through `...'
snprintf.c:349: `short unsigned int' is promoted to `int' when passed through `...'

GCC has become very annoying with that sort of things... I did
the suggested changes and the error messages vanished.

- OSF/1 4.0, alpha, DEC C 5.6

  Problem #1 :

cc -std1 -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O -Olimit 2000 -c host.c
cc: Error: host.c, line 221: In the initializer for lst[0],
  tmpstore does not have a constant address, but occurs in a
  context that requires an address constant.  This is an
  extension of the language.
  char *lst[] = { tmpstore, NULL };
--^

The error message is misleading, IMO. The real problem is that
we're initialising an auto array, which is something C does
not support, at least not C89/C90. The following patch
silences the compiler :

diff -ur wget-1.8-beta2/src/host.c wget-1.8-beta2_aym/src/host.c
--- wget-1.8-beta2/src/host.c   Fri Nov 30 11:50:29 2001
+++ wget-1.8-beta2_aym/src/host.c   Mon Dec  3 16:30:58 2001
@@ -218,7 +218,7 @@
   if ((int)addr != -1)
 {
   char tmpstore[IP4_ADDRESS_LENGTH];
-  char *lst[] = { tmpstore, NULL };
+  char *lst[2];
 
   /* ADDR is defined to be in network byte order, which is what
 this returns, so we can just copy it to STORE_IP.  However,
@@ -232,6 +232,8 @@
   offset = 0;
 #endif
   memcpy (tmpstore, (char *)addr + offset, IP4_ADDRESS_LENGTH);
+  lst[0] = tmpstore;
+  lst[1] = NULL;
   return address_list_new (lst);
 }
 
  Problem #2 :

There is also this shit. Take a deep breath :

  cc: Warning: snprintf.c, line 128: In this declaration, type signed long long is a language extension.
 LLONG value, int base, int min, int max, int flags);
  ---^
  cc: Warning: snprintf.c, line 170: In this declaration, type signed long long is a language extension.
    LLONG value;
  --^
  cc: Warning: snprintf.c, line 315: In this statement, type signed long long is a language extension.
    value = va_arg (args, LLONG);
  --^
  cc: Warning: snprintf.c, line 315: In this statement, type signed long long is a language extension.
    value = va_arg (args, LLONG);
  --^
  cc: Warning: snprintf.c, line 315: In this statement, type signed long long is a language extension.
    value = va_arg (args, LLONG);
  --^
  cc: Warning: snprintf.c, line 315: In this statement, type signed long long is a language extension.
  

Re: Wget 1.8-beta2 now available

2001-12-03 Thread Andre Majorel

On 2001-12-03 18:30 +0100, Hrvoje Niksic wrote:
 Andre Majorel [EMAIL PROTECTED] writes:
 
  gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O2 -Wall -Wno-implicit -c connect.c
  connect.c: In function `test_socket_open':
  connect.c:190: warning: passing arg 2 of `select' from incompatible pointer type
  connect.c: In function `select_fd':
  connect.c:283: warning: passing arg 2 of `select' from incompatible pointer type
  connect.c:283: warning: passing arg 3 of `select' from incompatible pointer type
  connect.c:283: warning: passing arg 4 of `select' from incompatible pointer type
  
  (These are just warnings.)
 
 And weird ones, too.  These arguments are of type pointer to
 fd_set.  What would HPUX like to see there?

HP-UX 10 wants (int *). However it defines fd_set as

  struct
  {
long[];
  }

so it works anyway.

HP-UX 10 is wrong. SUS2 (and POSIX ?) say (fd_set *). HP-UX 11
has it right.

I suppose the best thing to do is to ignore those warnings.

 I think I'll use something like:
 
 #ifndef h_errno
 extern int h_errno;
 #endif

h_errno is not necessarily a macro ! What do you think of
Maciej's proposal ?

 Two questions here:
 
 * Does HPUX really not have snprintf()?  It sounds weird that a modern
   OS wouldn't have it.

I find describing HP-UX 10 as a modern OS mildly amusing. :-) I
completely disagree with your perception that snprintf() is to
be taken for granted. It's only since C99 that it's part of C.

But to answer your question, no HP-UX doesn't have it (neither
in the headers nor in libc).

 * short int is promoted to int, ok.  Does that go for all the
   architectures, or just some?  Should I simply replace short int
   with int to get it to compile?

Yes, replace short int and unsigned short by int. It's not
architecture specific, the same thing happened to me on x86. GCC
2.95 doesn't care, GCC 2.96 and 3.0 complain.

  The error message is misleading, IMO. The real problem is that
  we're initialising an auto array, which is something C does
  not support, at least not C89/C90.
 
 Indeed.  I wonder why I thought that was legal C.  Ok, I'll apply your
 patch.

Not enough caffeine. :-)

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
(Not speaking for my employer, etc.)



Re: Wget 1.8-beta2 now available

2001-12-03 Thread Andre Majorel

On 2001-12-01 23:30 +0100, Hrvoje Niksic wrote:
 Here is the next 1.8 beta.  Please test it if you can -- try compiling
 it on your granma's Ultrix box, run it on your niece's flashy web
 site, see if cookies work, etc.
 
 Get it from:
 
 ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta2.tar.gz

Success:
- NCR MP-RAS 3.0, x86, NCR High Performance C Compiler R3.0c
- FreeBSD 4.0, x86, GCC 2.95.2

Thanks !

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
(Not speaking for my employer, etc.)



Re: Wget 1.8-beta2 now available

2001-12-03 Thread Andre Majorel

On 2001-12-03 19:16 +0100, Hrvoje Niksic wrote:

  I find describing HP-UX 10 as a modern OS mildly amusing. :-)
 
 How old is it?  I used to work on HPUX 9, and I'm not old by most
 definitions of the word.

Around 1995.

  I completely disagree with your perception that snprintf() is to be
  taken for granted. It's only since C99 that it's part of C.
 
 It's been a part of C since C99, that's true.  But Wget relies on a
 lot of functionality not strictly in C, from alloca to the socket
 interface.
 
 Also, snprintf has become a big security thing recently, when a number
 of exploits was based on overflowing a buffer written to by sprintf.
 The pressure on vendors might be responsible for some of them being
 unusually swift in providing the function.
 
 But yes, I know I can't take it for granted, hence the provided
 replacement.

Yes, I'm with you on that. We have exactly the same problems as
you here and I for one wish snprintf() had been there from the
start.

  But to answer your question, no HP-UX doesn't have it (neither in
  the headers nor in libc).
 
 HPUX 11 doesn't have it either?  Interesting.

HP-UX 10 doesn't but HP-UX 11 has it, according to docs.hp.com.

The work you did on the list of already downloaded URLs seems to
have been effective ; Wget's long-standing tendency to forget
files in recursive downloads appears to be gone. A million
thanks to Hrvoje, the contributors and the testers.

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
(Not speaking for my employer, etc.)



Re: Wget 1.8-beta3 now available

2001-12-03 Thread Andre Majorel

On 2001-12-03 21:55 +0100, Hrvoje Niksic wrote:
 Bugfixes since 1.8-beta2.  Please test it from clean compilation on
 Unix (Windows and MacOS are known not to compile without modifications
 when SSL is used.)
 
 Get it from:
 
 ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta3.tar.gz

This one compiles on all platforms.

Solaris 7   OK
FreeBSD 4.0 OK
HP-UX 10OK
MP-RAS 3.0  OK
Debian Linux woody  OK
OSF/1 4.0   OK

Beautiful. :-)

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
(Not speaking for my employer, etc.)



Re: Multithreading wget

2001-06-16 Thread Andre Majorel

On 2001-06-15 16:30 -0500, Bazuka wrote:

 So what would be the advantage of multithreading this application ?

Multithreading might be an advantage when retrieving files from
several hosts because gethostbyname() is blocking (and often
takes a while to complete).

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



1.7.1-pre1 on NCR MP-RAS: success

2001-06-15 Thread Andre Majorel

Executive summary: complete success.

On NCR MP-RAS, Wget 1.7.1-pre1 configured and compiled fine, and
passed a few simple tests. The -lnsl/-lsocket and MAP_FAILED
problems seen with previous versions did not occur.

No SSL library is installed on the system. ./configure
--with-ssl detected that correctly. The resulting executable
worked fine with HTTP. For https: URLs, it prints
"Unknown/unsupported protocol" and exits.

A binary made with plain ./configure without --with-ssl exhibits
the same exact behaviour.

Should you need the logs, they're at

 http://www.teaser.fr/~amajorel/mpras/jp/wget-1.7.1-pre1.config.log.gz
 http://www.teaser.fr/~amajorel/mpras/jp/wget-1.7.1-pre1.config.log.with-ssl.gz

Thanks to everyone involved.

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/



Re: wget-1.7 does not compile with glibc1 (libc5)

2001-06-08 Thread Andre Majorel

On 2001-06-08 17:57 -0400, Parsons, Donald wrote:
 Previous versions up to 1.6 compiled fine.
 
 cd src && make CC='gcc' CPPFLAGS='' DEFS='-DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/etc/wgetrc\" -DLOCALEDIR=\"/usr/share/locale\"' CFLAGS='-O2 -fomit-frame-pointer -march=pentium -mcpu=pentium -pipe' LDFLAGS='-s' LIBS='' prefix='/usr' exec_prefix='/usr' bindir='/usr/bin' infodir='/usr/info' mandir='/usr/man' manext='1'
 make[1]: Entering directory `/usr/src/wget-1.7/src'
 gcc -I. -I. -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/etc/wgetrc\" -DLOCALEDIR=\"/usr/share/locale\" -O2 -fomit-frame-pointer -march=pentium -mcpu=pentium -pipe -c utils.c
 utils.c: In function `read_file':
 utils.c:980: `MAP_FAILED' undeclared (first use in this function)
 utils.c:980: (Each undeclared identifier is reported only once
 utils.c:980: for each function it appears in.)
 make[1]: *** [utils.o] Error 1
 make[1]: Leaving directory `/usr/src/wget-1.7/src'
 make: *** [src] Error 2

Quick and dirty fix : insert the following in utils.c before the
reference to MAP_FAILED :

#ifndef MAP_FAILED
#  define MAP_FAILED -1
#endif

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Re: Wget 1.7-pre1 available for testing

2001-06-06 Thread Andre Majorel

On 2001-06-06 12:47 +0200, Jan Prikryl wrote:
  Jan Prikryl [EMAIL PROTECTED] writes:
  
   It seems that -lsocket is not found as it requires -lnsl for
   linking. -lnsl is not detected as it does not contain
   `gethostbyname()' function.
  
  That's weird.  What does libnsl contain if not gethostbyname()?
 
 It seems to contain `gethostname()' ... see the config.log submitted
 in one of the previous emails. But it's a very long distance shot: if,
 after adding -lsocket -lnsl everything works correctly and if with
 -lsocket only the linker complains about missing 'yp_*()' functions
 and also missing `gethostname()' and `getdomainname()', I think it's
 likely that these functions are defined in -lnsl. Of course, if -lnsl
 has built in dependency on some other library, the situation might be
 completely different.

I've put the output of nm for libsocket and libnsl at

  http://www.teaser.fr/~amajorel/mpras/libnsl.so.nm.gz
  http://www.teaser.fr/~amajorel/mpras/libsocket.so.nm.gz

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/



Re: Wget 1.7-pre1 available for testing

2001-06-05 Thread Andre Majorel

On 2001-06-02 20:50 +0200, Andre Majorel wrote:
 On 2001-06-02 17:30 +0200, Hrvoje Niksic wrote:

   - The empty LIBS problem remains (add -lsocket -lnsl).
  
  Do you have a config.log for this?  Wget's configure tries hard to
  determine whether `-lsocket' and `-lnsl' are needed, and this seems to
  work on Solaris.  Can you see why it fails on your machine?
 
 The problem seems to be in autoconf. From my attempts at
 compiling v1.6 on the same system :
 
   checking for gethostbyname in -lnsl... no
   checking for socket in -lsocket... no
 
 when in fact they are there. I don't have access to the machine
 until tuesday. I'll post the config.log then. Sorry.

Tuesday is today. config.log for 1.6 and 1.7-pre1 attached. 1.7
is identical to 1.7-pre1.

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/

 config.log.gz
 config.log.gz


Re: SVR4 compile error

2001-06-01 Thread Andre Majorel

On 2001-05-26 11:10 +0200, Hrvoje Niksic wrote:
 Andre Majorel [EMAIL PROTECTED] writes:
 
  Compiling Wget 1.6 on an SVR4 derivative (NCR MP-RAS 3.0), I got
  this strange error:
 
 I think the problem is that Wget 1.6 tried to force strict ANSI mode
 out of the compiler.
 
 Try running make like this:
 
 make CC=cc CFLAGS=-g
 
 See if it compiles then.

After removing -Xc from $(CC) and adding -lsocket -lnsl to
$(LIBS), it compiled. I guess autoconf has not been given much
testing on this platform. :-) The binary seems fine.

Is there a central repository for wget binaries ?

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/



SVR4 compile error

2001-05-26 Thread Andre Majorel

Compiling Wget 1.6 on an SVR4 derivative (NCR MP-RAS 3.0), I got
this strange error:

  # make
  CONFIG_FILES= CONFIG_HEADERS=src/config.h ./config.status
  creating src/config.h
  src/config.h is unchanged
  generating po/POTFILES from ./po/POTFILES.in
  creating po/Makefile
  cd src && make CC='cc -Xc -D__EXTENSIONS__' CPPFLAGS='' 
DEFS='-DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" 
-DLOCALEDIR=\"/usr/local/share/locale\"'  CFLAGS='-O' LDFLAGS='' LIBS=''  
prefix='/usr/local' exec_prefix='/usr/local' bindir='/usr/local/bin'  
infodir='/usr/local/info' mandir='/usr/local/man' manext='1'
  cc -Xc -D__EXTENSIONS__ -I. -I.   -DHAVE_CONFIG_H 
-DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O 
-c cmpt.c
  NCR High Performance C Compiler R3.0c 
  (c) Copyright 1994-98, NCR Corporation
  (c) Copyright 1987-98, MetaWare Incorporated
  cc -Xc -D__EXTENSIONS__ -I. -I.   -DHAVE_CONFIG_H 
-DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O 
-c connect.c
  NCR High Performance C Compiler R3.0c 
  (c) Copyright 1994-98, NCR Corporation
  (c) Copyright 1987-98, MetaWare Incorporated
  E /usr/include/arpa/inet.h,L66/C19(#164): in_addr_t
  |Symbol declaration is inconsistent with a previous declaration
  |at /usr/include/netinet/in.h,L47/C27.
  E /usr/include/arpa/inet.h,L67/C19(#164): in_port_t
  |Symbol declaration is inconsistent with a previous declaration
  |at /usr/include/netinet/in.h,L46/C28.
  E /usr/include/arpa/inet.h,L68/C1(#164):  in_addr_t
  |Symbol declaration is inconsistent with a previous declaration
  |at /usr/include/netinet/in.h,L47/C27.
  E /usr/include/arpa/inet.h,L69/C1(#164):  in_port_t
  |Symbol declaration is inconsistent with a previous declaration
  |at /usr/include/netinet/in.h,L46/C28.
  w (#657):   (info) How referenced files were included:
  |File /usr/include/netinet/in.h from connect.c.
  |File /usr/include/arpa/inet.h from connect.c.
  4 user errors   1 warning   
  *** Error code 4 (bu21)

  make: fatal error.
  *** Error code 1 (bu21)

  make: fatal error.

I find it strange that there would be more than one definition
for in_addr_t and in_port_t. Does someone understand what's
going on and how to fix it ?

The output of configure:

  # ./configure
  creating cache ./config.cache
  configuring for GNU Wget 1.6
  checking host system type... i586-ncr-sysv4.3.03
  checking whether make sets ${MAKE}... yes
  checking for a BSD compatible install... ./install-sh -c
  checking for gcc... no
  checking for cc... cc
  checking whether the C compiler (cc  ) works... yes
  checking whether the C compiler (cc  ) is a cross-compiler... no
  checking whether we are using GNU C... no
  checking whether cc accepts -g... no
  checking how to run the C preprocessor... /lib/cpp
  checking for AIX... no
  checking for cc option to accept ANSI C... -Xc -D__EXTENSIONS__
  checking for function prototypes... yes
  checking for working const... yes
  checking for size_t... yes
  checking for pid_t... yes
  checking whether byte ordering is bigendian... no
  checking size of long... 4
  checking size of long long... 8
  checking for string.h... yes
  checking for stdarg.h... yes
  checking for unistd.h... yes
  checking for sys/time.h... yes
  checking for utime.h... yes
  checking for sys/utime.h... yes
  checking for sys/select.h... yes
  checking for sys/utsname.h... yes
  checking for pwd.h... yes
  checking for signal.h... yes
  checking whether time.h and sys/time.h may both be included... yes
  checking return type of signal handlers... void
  checking for struct utimbuf... yes
  checking for working alloca.h... yes
  checking for alloca... yes
  checking for strdup... yes
  checking for strstr... yes
  checking for strcasecmp... no
  checking for strncasecmp... no
  checking for gettimeofday... yes
  checking for mktime... yes
  checking for strptime... yes
  checking for strerror... yes
  checking for snprintf... yes
  checking for vsnprintf... yes
  checking for select... yes
  checking for signal... yes
  checking for symlink... yes
  checking for access... yes
  checking for isatty... yes
  checking for uname... yes
  checking for gethostname... no
  checking for gethostbyname... no
  checking for gethostbyname in -lnsl... no
  checking for socket in -lsocket... no
  checking whether NLS is requested... yes
  language catalogs: cs da de el et fr gl hr it ja nl no pl pt_BR ru sk sl sv zh
  checking for msgfmt... msgfmt
  checking for xgettext... :
  checking for gmsgfmt... msgfmt
  checking for locale.h... yes
  checking for libintl.h... no
  checking for gettext... no
  checking for gettext in -lintl... no
  gettext not found; disabling NLS
  checking for makeinfo... no
  checking for emacs... no
  checking for xemacs... no
  updating cache ./config.cache
  creating ./config.status
  creating Makefile
  creating src/Makefile
  creating 

Re: output to standard error?

2001-03-19 Thread Andre Majorel

On 2001-03-20 00:25 +0100, Hrvoje Niksic wrote:
 "Eddy Thilleman" [EMAIL PROTECTED] writes:
 
  Wget sends its output to standard error. Why is that?
 
 "It seemed like a good idea."
 
 The rationale behind it is that Wget's "output" is not real output,
 more a progress indication thingie.  The real output is when you
 specify `-O -', and that goes to stdout.
 
 Francois Pinard once suggested that Wget prints its progress output to
 stdout, except when `-O -' is specified, when progress should go to
 stderr.

Shrug. Anyone who wants to capture the output of a program for
unattended operation (which is what I think Eddy wants)
generally has to catch both stdout and stderr anyway. So does it
matter much how much of it goes to stdout vs. stderr ?

If you're doing wget 2>&1, there's no surprise.

If your shell is command.com, you might see things differently.
;-)

-- 
André Majorel [EMAIL PROTECTED]
http://www.teaser.fr/~amajorel/



Patch: new option --ignore-size

2001-02-26 Thread Andre Majorel

I'm mirroring a very large tree locally. As the tree is larger
than the local filesystem, I periodically stop wget, save what
I've downloaded on CD-ROM, truncate the saved files to 0 and
then start wget -N -r again to get more files.

Unfortunately, wget checks not only the mtime but also the size
of the local files and starts downloading them again.

This patch adds the --ignore-size option which prevents this.
When this option is present, wget will not retrieve the remote
file again as long as the local file exists and is more recent,
even if its size is not the same as the remote file.
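The decision the option changes can be sketched as follows. This is a
minimal illustration only; should_download and its parameters are
hypothetical names, not wget's actual internals:

```c
/* Sketch of the local-vs-remote freshness test with an "ignore
 * size" switch.  All names here are illustrative, not wget's
 * real code. */
#include <stdbool.h>
#include <time.h>
#include <sys/types.h>

bool should_download(bool local_exists, time_t local_mtime, off_t local_size,
                     time_t remote_mtime, off_t remote_size,
                     bool ignore_size)
{
    if (!local_exists)
        return true;                      /* nothing saved yet        */
    if (local_mtime < remote_mtime)
        return true;                      /* remote file is newer     */
    if (!ignore_size && local_size != remote_size)
        return true;                      /* default -N behaviour     */
    return false;                         /* --ignore-size keeps stub */
}
```

With --ignore-size, a truncated local file that is more recent than the
remote one passes the test and is left alone, which is exactly the
archive-to-CD-ROM workflow above.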

The patch has been posted to wget-patches. It's also available
at URL:http://www.teaser.fr/~amajorel/wget/.

I will write a documentation patch if you think the patch worth
including in the distribution.

-- 
André Majorel
Work: [EMAIL PROTECTED]
Home: [EMAIL PROTECTED] http://www.teaser.fr/~amajorel/