Re: wget and ASCII mode

2005-06-25 Thread Steven M. Schweda
> [...]  (The new code does make one potentially risky assumption,
> but it's explained in the comments.)

   The latest code in my patches and in my new 1.9.1d kit (for VMS,
primarily, but not exclusively) removes the potentially risky assumption
(that a CR and its matching LF arrive in the same buffer), so it should
be swell.  I've left it for someone else to activate the conditional
code which would restore CR-LF line endings on systems where that's
preferred.
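   For the curious, here is roughly what removing that assumption
involves.  This is a sketch only (illustrative, not the actual patch
code): strip CRs from ASCII-mode data even when a CR ends one buffer
and its LF begins the next, by carrying one byte of state between
calls:

#include <stddef.h>

/* Sketch: convert CR-LF to LF across buffer boundaries.  pending_cr
   remembers a CR seen at the very end of the previous buffer, so out
   must hold len + 1 bytes.  (A lone CR at end of stream would still
   need flushing at EOF.)  */
static int pending_cr = 0;

static size_t
strip_crlf (const char *in, size_t len, char *out)
{
  size_t i, o = 0;
  for (i = 0; i < len; i++)
    {
      if (pending_cr)
        {
          pending_cr = 0;
          if (in[i] != '\n')
            out[o++] = '\r';    /* lone CR: keep it */
        }
      if (in[i] == '\r')
        pending_cr = 1;         /* decide when the next byte arrives */
      else
        out[o++] = in[i];
    }
  return o;
}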

   It does seem a bit odd that no one has noticed this fundamental
problem until now, but then I missed it, too.



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street    [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: Wget and Secure Pages

2005-06-25 Thread Hrvoje Niksic
"John Haymaker" <[EMAIL PROTECTED]> writes:

> I am trying to download all pages in my site except secure pages that
> require login.
>  
> Problem:  when wget encounters a secure page requiring the user to log in,
> it hangs there for up to an hour.  Then miraculously, it moves on.

By "secure pages" do you mean https: pages?

Normally Wget has a timeout mechanism that prevents it from hanging
for so long (the default timeout is 15 minutes, but it can be
shortened to 10 seconds or to whatever works for you), but it
sometimes doesn't work for OpenSSL.
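For reference, the knob is the --timeout option: e.g. "wget
--timeout=10 URL" limits the DNS lookup, connect, and read phases to
10 seconds each (it is shorthand for setting --dns-timeout,
--connect-timeout, and --read-timeout together).  Whether that helps
here depends on where inside the SSL handshake the hang occurs.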


Re: No more Libtool (long)

2005-06-25 Thread Hrvoje Niksic
"Post, Mark K" <[EMAIL PROTECTED]> writes:

> I read the entire message, but I probably didn't have to.  My
> experience with libtool in packages that really are building
> libraries has been pretty painful.  Since wget doesn't build any,
> getting rid of it is one less thing to kill my builds in the future.

Good to know that I'm not the only one harboring -- doubts -- towards
Libtool.  Google doesn't show even nearly enough hits when you search
for "libtool sucks".

> Congratulations.

Thanks.


Wget and Secure Pages

2005-06-25 Thread John Haymaker
I am trying to download all pages in my site except secure pages that
require login.

Problem:  when wget encounters a secure page requiring the user to log
in, it hangs there for up to an hour.  Then miraculously, it moves on.

I do not want to download these pages, so I'm not using a password.
When a user encounters links to these pages and clicks a link, the
user is redirected to a login page.

These secure pages are not in a specific directory, and there is
nothing in the link that indicates it is secure.  So I can't use
pattern matching or directories to avoid them.

Anyone have any ideas how I should get Wget to quickly move on after
encountering one of these secure pages?


RE: No more Libtool (long)

2005-06-25 Thread Post, Mark K
I read the entire message, but I probably didn't have to.  My experience
with libtool in packages that really are building libraries has been
pretty painful.  Since wget doesn't build any, getting rid of it is one
less thing to kill my builds in the future.  Congratulations.


Mark Post

-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 24, 2005 8:11 PM
To: wget@sunsite.dk
Subject: No more Libtool (long)


Thanks to the effort of Mauro Tortonesi and the prior work of Bruno
Haible, Wget has been modified to no longer use Libtool for linking in
external libraries.  If you are interested in why that might be a cause
for celebration, read on.



Re: ftp bug in 1.10

2005-06-25 Thread Hrvoje Niksic
David Fritz <[EMAIL PROTECTED]> writes:

> "I64" is a size prefix akin to "ll". One still needs to specify the
> argument type as in "%I64d" as with "%lld".

That makes sense, thanks for the explanation!


Re: ftp bug in 1.10

2005-06-25 Thread David Fritz
"I64" is a size prefix akin to "ll". One still needs to specify the argument 
type as in "%I64d" as with "%lld".
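A minimal illustration (assuming MSVC, where __int64 and the I64 size
prefix are available):

#include <stdio.h>

int
main (void)
{
  __int64 n = 10000000000I64;  /* ten billion */
  printf ("%I64d\n", n);       /* "I64" plus "d", as "ll" plus "d" */
  return 0;
}

With "%I64" alone the conversion letter is missing, so the behavior is
undefined.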




Re: ftp bug in 1.10

2005-06-25 Thread Hrvoje Niksic
Gisle Vanem <[EMAIL PROTECTED]> writes:

> "Hrvoje Niksic" <[EMAIL PROTECTED]> wrote:
>
>> It should print a line containing "100".  If it does, it means
>> we're applying the wrong format.  If it doesn't, then we must find
>> another way of printing LARGE_INT quantities on Windows.
>
> I don't know what compiler OP used, but Wget only uses
> "%I64" for MSVC on Windows.

All Heiko's builds are done using MSVC.  The question remains why the
code misbehaved.

I've now simplified this code path, and removed the %I64 usage from
all cases but the one in utils.c, which I don't know how to easily get
rid of.
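One portable way to sidestep printf formats entirely (a sketch, not
necessarily what utils.c does) is to render the number by hand:

/* Sketch: convert a 64-bit value to decimal without printf.
   buf must hold at least 21 bytes; assumes division truncates
   toward zero, as it does on the platforms in question.  */
static char *
large_int_to_string (__int64 n, char *buf)
{
  char *p = buf + 20;
  int negative = n < 0;
  *p = '\0';
  do
    {
      int d = (int) (n % 10);
      *--p = (char) ('0' + (negative ? -d : d));
      n /= 10;
    }
  while (n != 0);
  if (negative)
    *--p = '-';
  return p;
}

Negating each digit rather than the whole value keeps even the most
negative number in range.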


Re: ftp bug in 1.10

2005-06-25 Thread Gisle Vanem

"Hrvoje Niksic" <[EMAIL PROTECTED]> wrote:


> It should print a line containing "10000000000".  If it does, it means
> we're applying the wrong format.  If it doesn't, then we must find
> another way of printing LARGE_INT quantities on Windows.


I don't know what compiler OP used, but Wget only uses
"%I64" for MSVC on Windows. Ref sysdep.h line 111-114.

--gv


Re: ftp bug in 1.10

2005-06-25 Thread Hrvoje Niksic
Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> This would indicate that the "%I64" format, which Wget uses to print
> the 64-bit "download sum", doesn't work for you.

For what it's worth, MSDN documents it: http://tinyurl.com/ysrh/.
Could you be compiling Wget with an older C runtime that doesn't
support the %I64 format?


Re: ftp bug in 1.10

2005-06-25 Thread Hrvoje Niksic
Herold Heiko <[EMAIL PROTECTED]> writes:

> Downloaded:  bytes in 2 files
>
> Note missing number of bytes.

This would indicate that the "%I64" format, which Wget uses to print
the 64-bit "download sum", doesn't work for you.  What does this
program print?

#include <stdio.h>
int
main (void)
{
  __int64 n = 10000000000I64;  // ten billion, doesn't fit in 32 bits
  printf("%I64\n", n);
  return 0;
}

It should print a line containing "10000000000".  If it does, it means
we're applying the wrong format.  If it doesn't, then we must find
another way of printing LARGE_INT quantities on Windows.


Re: Removing thousand separators from file size output

2005-06-25 Thread Hrvoje Niksic
Alain Bench <[EMAIL PROTECTED]> writes:

> Removing separators will break existing apps parsing wget's output.
> Such apps exist?

They do exist, but *any* change in Wget's output will break them.
Since they probably do the equivalent of sed s/,//g anyway, the
removal of separators is likely to be the least of their problems.

Maybe I was not clear enough about the "pasting" requirement from my
first bullet point: by that I didn't mean programmatic processing of
Wget's whole output, but hand-picking parts of it (such as a file size
or file name) and manually copy+pasting them into the shell or into bc.
In that case there is no easy place to interpose sed, and yet the
thousand separators *always* have to be removed.

>> omitting the thousand separators merely removes redundancy, not useful
>> information.
>
> That's true only if you assume the user analyses the /unit-size/ and
> /kmt-size/ as a whole, as a unique info. But that's not always the case.
> One may well look only at /unit-size/. Without seps, this user is forced
> to count digits, or to look additionally to /kmt-size/, and do some
> brainwork to find corresponding order of magnitude. For this user, sep
> removal removes readability.

Here you seem to assume that the typical user cares about and first
looks at exact, to-the-byte figures.  In my experience that is rarely
the case -- in most cases, the user cares about the order of
magnitude, such as "640K" or "42M", rather than the byte size.  In
fact, when I do need the exact size, it is exactly in order to be able
to paste it to another program, such as emacs or bc, which Wget makes
harder by inserting those separators!

With the order of magnitude information being readily available in the
form of the unit, Wget (at least for some uses) does me a disservice
by adding that same information in the form of separators.

> Unless a bigger unavoidable danger interferes. That's my humble
> opinion, but I believe it's also some more general ergonomic
> principle.

If so, I have yet to see this principle in writing, or use an
application that abides by it by default, the single exception being
-- Wget.  (And Wget doesn't accept grouped digits in numeric input, so
it's inconsistent to boot.)

Even number-oriented applications touted as user-friendly such as
oocalc (and presumably Excel, but I don't have it around to verify)
don't group digits by default.

>> As for localization, I'm not against it. The argument was that, where
>> possible, I prefer the output of applications to remain parsable.
>
> So we disagree only on the balance. I'd say output to humans should
> be localized as much as possible, unless this creates a really serious
> problem for the machine parsing secondary usage.

You're right, my choice of balance leans more to the parsing side,
although actual parsing is only part of the picture.  For example, the
ISO 8601 dates have the nice property that the simple textual sort
orders them chronologically.  This is useful for file names (e.g. log
files), but also for easy sorting of textual date columns in
spreadsheets and databases!  In this case the computer didn't even try
to make sense of the data, but its regularity helped make it more
useful.

(Of course, ISO 8601 dates also have the property of being easily
parsable with either straightforward regexps or trivial C code,
neither being the case for localized dates -- see GNU getdate.y.)
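For instance, the parsing really is trivial (a fragment assuming the
zero-padded YYYY-MM-DD form):

#include <stdio.h>

/* Sketch: parse an ISO 8601 date; returns nonzero on success. */
static int
parse_iso_date (const char *s, int *y, int *m, int *d)
{
  return sscanf (s, "%4d-%2d-%2d", y, m, d) == 3;
}

And the sorting property is simply that strcmp ranks "2005-06-24"
before "2005-06-25", so a plain textual sort is also a chronological
one.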

> Where incompatible, human and machine output may be separated.

An important point of the Unix philosophy is that, with some care, the
same output can be served to humans and machines.  (Piping the output
of `du' or `wc' to sort is an example of doing both.)  While that
principle may be misguided and doesn't directly apply to Wget's more
human-oriented output, it can be applied in moderation.  I find it
self-evident that it is better to at least be able to paste parts of
the output into other programs than not to be able to do so.


Re: Removing thousand separators from file size output

2005-06-25 Thread Alain Bench
 On Friday, June 24, 2005 at 6:45:44 PM +0200, Hrvoje Niksic wrote:

> input for other applications, which is very hard with the thousand
> separators.

Pasting is very hard, parsing is not.  An app running wget can easily
parse its output, whatever it is; if not directly, then through a
wrapper.  The problem is only with "side-apps" where the user must
copy/paste.  How frequently is that used?

Removing separators will break existing apps parsing wget's output.
Such apps exist?


> Alain Bench <[EMAIL PROTECTED]> writes:
>> Humans can have habit to look at exact unit size, or rounded
>> kilo/mega/tera size, or both.
> omitting the thousand separators merely removes redundancy, not useful
> information.

That's true only if you assume the user analyses the /unit-size/ and
/kmt-size/ as a whole, as a unique info. But that's not always the case.
One may well look only at /unit-size/. Without seps, this user is forced
to count digits, or to look additionally to /kmt-size/, and do some
brainwork to find corresponding order of magnitude. For this user, sep
removal removes readability.


> If the users were so used to separators, they would surely request
> them in other programs, such as `ls', `du', or `df'?

Those 3 commands print numbers in right-aligned columns: the
ergonomic need for seps is a little lower.  And the "ls -l" filename
truncation on 80-column terminals might be seen as a bigger annoyance:
3 seps added to the size would mean 3 chars less for the filename.  And
legacy behaviour *MUST* absolutely be retained for such old, widely
used, and frequently machine-parsed commands.

But anyway I would personally love to see separators here too.


[localization]
> You can make a case that the correct character and layout should be
> used for digit grouping when it is deployed, but I don't see how you
> can argue that grouping *must* be used in all applications!

I agree.  There are cases where localized grouping, and even grouping
alone, are useless or harmful: each time the only or primary
destination of a number is another app.

But when the intended reader is human, localized grouping *should*
be used. Unless a bigger unavoidable danger interferes. That's my humble
opinion, but I believe it's also some more general ergonomic principle.

I can buy the argument you once explained, that the advantage is
small relative to the code complexity.  But I somewhat regret having
to buy it.

BTW my "locale thousands_sep" gives a " " non-breaking space, and
"locale decimal_point" gives a "," comma.


> As for localization, I'm not against it. The argument was that, where
> possible, I prefer the output of applications to remain parsable.

So we disagree only on the balance. I'd say output to humans should
be localized as much as possible, unless this creates a really serious
problem for the machine parsing secondary usage.

Where incompatible, human and machine output may be separated: say,
via an option, or simultaneously as with GnuPG's --status-fd, where the
human reads stdout/stderr while the machine parses another fd.  That's
material for the present debate, not my wish for wget.


> I consider the ISO 8601 date format a clear advantage over the
> asctime() format.

;-) Good example: I *hate* having to read 8601 dates. Nearly as much
as having to read those other dates, localized or not, with month/day
ambiguity. MHO only, here: I know some people love 8601.


Bye!Alain.
-- 
« if you believe subversive history books, I've got a bridge to sell you. »