Re: wget and ASCII mode
> [...]  (The new code does make one potentially risky assumption,
> but it's explained in the comments.)

The latest code in my patches and in my new 1.9.1d kit (for VMS,
primarily, but not exclusively) removes the potentially risky
assumption (CR and LF in the same buffer), so it should be swell.
I've left it for someone else to activate the conditional code which
would restore CR-LF line endings on systems where that's preferred.

It does seem a bit odd that no one has noticed this fundamental
problem until now, but then I missed it, too.

   Steven M. Schweda               (+1) 651-699-9818
   382 South Warwick Street        [EMAIL PROTECTED]
   Saint Paul  MN  55105-2547
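The "CR and LF in the same buffer" pitfall mentioned above is easy to
sketch: when CR-LF conversion is done buffer by buffer, a CR can land
at the end of one read and its LF at the start of the next, so state
must be carried across calls.  This is only an illustrative sketch of
that idea, not Wget's actual code; all names here are made up.

```c
#include <stddef.h>

/* Carries over between buffers: did the previous buffer end in CR? */
static int pending_cr = 0;

/* Convert CR-LF to LF in `in', writing to `out' (which may need up to
   len + 1 bytes).  Returns the number of bytes written.  A trailing
   CR is held back until the next call decides whether an LF follows;
   a real implementation would also need a flush at end of stream.  */
size_t
strip_crlf (const char *in, size_t len, char *out)
{
  size_t o = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (pending_cr)
        {
          pending_cr = 0;
          if (in[i] == '\n')          /* CR-LF pair, possibly split
                                         across buffers: drop the CR */
            {
              out[o++] = '\n';
              continue;
            }
          out[o++] = '\r';            /* lone CR: keep it */
        }
      if (in[i] == '\r')
        pending_cr = 1;               /* decide on the next byte */
      else
        out[o++] = in[i];
    }
  return o;
}
```

With this scheme, feeding "ab\r" and then "\ncd" as two buffers yields
the same "ab\ncd" as feeding "ab\r\ncd" at once, which is exactly the
case the single-buffer assumption got wrong.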
Re: Wget and Secure Pages
"John Haymaker" <[EMAIL PROTECTED]> writes:

> I am trying to download all pages in my site except secure pages
> that require login.
>
> Problem: when wget encounters a secure page requiring the user to
> log in, it hangs there for up to an hour.  Then miraculously, it
> moves on.

By "secure pages" do you mean https: pages?  Normally Wget has a
timeout mechanism that prevents it from hanging for so long (the
default timeout is 15 minutes, but it can be shortened to 10 seconds
or to whatever works for you), but it sometimes doesn't work for
OpenSSL.
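The timeouts mentioned above can be shortened on the command line
(--timeout) or in a startup file.  A minimal .wgetrc sketch, with
values picked only as an example:

```
# Shorten every timeout (DNS, connect, read) from the 900-second
# default to 10 seconds, and don't retry URLs that keep timing out.
timeout = 10
tries = 1
```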
Re: No more Libtool (long)
"Post, Mark K" <[EMAIL PROTECTED]> writes:

> I read the entire message, but I probably didn't have to.  My
> experience with libtool in packages that really are building
> libraries has been pretty painful.  Since wget doesn't build any,
> getting rid of it is one less thing to kill my builds in the future.

Good to know that I'm not the only one harboring -- doubts -- towards
Libtool.  Google doesn't show even nearly enough hits when you search
for "libtool sucks".

> Congratulations.

Thanks.
Wget and Secure Pages
I am trying to download all pages in my site except secure pages that
require login.

Problem: when wget encounters a secure page requiring the user to log
in, it hangs there for up to an hour.  Then miraculously, it moves on.

I do not want to download these pages, so I'm not using a password.
When a user encounters links to these pages and clicks a link, the
user is redirected to a login page.  These secure pages are not in a
specific directory, and there is nothing in the link that indicates a
page is secure, so I can't use pattern matching or directories to
avoid them.

Anyone have any ideas how I should get Wget to quickly move on after
encountering one of these secure pages?
RE: No more Libtool (long)
I read the entire message, but I probably didn't have to.  My
experience with libtool in packages that really are building libraries
has been pretty painful.  Since wget doesn't build any, getting rid of
it is one less thing to kill my builds in the future.

Congratulations.


Mark Post

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 24, 2005 8:11 PM
To: wget@sunsite.dk
Subject: No more Libtool (long)

Thanks to the effort of Mauro Tortonesi and the prior work of Bruno
Haible, Wget has been modified to no longer use Libtool for linking in
external libraries.  If you are interested in why that might be a
cause for celebration, read on.
Re: ftp bug in 1.10
David Fritz <[EMAIL PROTECTED]> writes:

> "I64" is a size prefix akin to "ll".  One still needs to specify the
> argument type as in "%I64d" as with "%lld".

That makes sense, thanks for the explanation!
Re: ftp bug in 1.10
"I64" is a size prefix akin to "ll". One still needs to specify the argument type as in "%I64d" as with "%lld".
Re: ftp bug in 1.10
Gisle Vanem <[EMAIL PROTECTED]> writes:

> "Hrvoje Niksic" <[EMAIL PROTECTED]> wrote:
>
>> It should print a line containing "10000000000".  If it does, it
>> means we're applying the wrong format.  If it doesn't, then we must
>> find another way of printing LARGE_INT quantities on Windows.
>
> I don't know what compiler OP used, but Wget only uses "%I64" for
> MSVC on Windows.

All Heiko's builds are done using MSVC.  The question remains why the
code misbehaved.  I've now simplified this code path, and removed the
%I64 usage from all cases but the one in utils.c, which I don't know
how to easily get rid of.
Re: ftp bug in 1.10
"Hrvoje Niksic" <[EMAIL PROTECTED]> wrote:

> It should print a line containing "10000000000".  If it does, it
> means we're applying the wrong format.  If it doesn't, then we must
> find another way of printing LARGE_INT quantities on Windows.

I don't know what compiler OP used, but Wget only uses "%I64" for MSVC
on Windows.  Ref sysdep.h lines 111-114.

--gv
Re: ftp bug in 1.10
Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> This would indicate that the "%I64" format, which Wget uses to print
> the 64-bit "download sum", doesn't work for you.

For what it's worth, MSDN documents it: http://tinyurl.com/ysrh/.
Could you be compiling Wget with an older C runtime that doesn't
support the %I64 format?
Re: ftp bug in 1.10
Herold Heiko <[EMAIL PROTECTED]> writes:

> Downloaded: bytes in 2 files
>
> Note missing number of bytes.

This would indicate that the "%I64" format, which Wget uses to print
the 64-bit "download sum", doesn't work for you.  What does this
program print?

#include <stdio.h>

int
main (void)
{
  __int64 n = 10000000000I64;  /* ten billion, doesn't fit in 32 bits */
  printf ("%I64\n", n);
  return 0;
}

It should print a line containing "10000000000".  If it does, it means
we're applying the wrong format.  If it doesn't, then we must find
another way of printing LARGE_INT quantities on Windows.
Re: Removing thousand separators from file size output
Alain Bench <[EMAIL PROTECTED]> writes:

> Removing separators will break existing apps parsing wget's output.
> Such apps exist?

They do exist, but *any* change in Wget's output will break them.
Since they probably do the equivalent of sed s/,//g anyway, the
removal of separators is likely to be the least of their problems.

Maybe I was not clear enough in the "pasting" requirement from my
first bullet point: by that I didn't refer to programmatic processing
of Wget's whole output, but to hand-picking parts of it (such as file
size or file name) and manually copy+pasting them to the shell or to
bc.  In that case sed is not trivially involved, and yet the thousand
separators *always* have to be removed.

>> omitting the thousand separators merely removes redundancy, not
>> useful information.
>
> That's true only if you assume the user analyses the /unit-size/ and
> /kmt-size/ as a whole, as a unique info.  But that's not always the
> case.  One may well look only at /unit-size/.  Without seps, this
> user is forced to count digits, or to look additionally to
> /kmt-size/, and do some brainwork to find corresponding order of
> magnitude.  For this user, sep removal removes readability.

Here you seem to assume that the typical user cares about and first
looks at exact, to-the-byte figures.  In my experience that is rarely
the case -- in most cases, the user cares about the order of
magnitude, such as "640K" or "42M", rather than the byte size.  In
fact, when I do need the exact size, it is exactly in order to be able
to paste it to another program, such as emacs or bc, which Wget makes
harder by inserting those separators!  With the order-of-magnitude
information being readily available in the form of the unit, Wget (at
least for some uses) does me a disservice by adding that same
information in the form of separators.

> Unless a bigger unavoidable danger interferes.  That's my humble
> opinion, but I believe it's also some more general ergonomic
> principle.

If so, I have yet to see this principle in writing, or use an
application that abides by it by default, the single exception being
-- Wget.  (And Wget doesn't accept grouped digits in numeric input, so
it's inconsistent to boot.)  Even number-oriented applications touted
as user-friendly, such as oocalc (and presumably Excel, but I don't
have it around to verify), don't group digits by default.

>> As for localization, I'm not against it.  The argument was that,
>> where possible, I prefer the output of applications to remain
>> parsable.
>
> So we disagree only on the balance.  I'd say output to humans should
> be localized as much as possible, unless this creates a really
> serious problem for the machine parsing secondary usage.

You're right, my choice of balance leans more to the parsing side,
although actual parsing is only part of the picture.  For example, ISO
8601 dates have the nice property that a simple textual sort orders
them chronologically.  This is useful for file names (e.g. log files),
but also for easy sorting of textual date columns in spreadsheets and
databases!  In this case the computer didn't even try to make sense of
the data, but its regularity helped make it more useful.  (Of course,
ISO 8601 dates also have the property of being easily parsable with
either straightforward regexps or trivial C code, neither being the
case for localized dates -- see GNU getdate.y.)

> Where incompatible, human and machine output may be separated.

An important point of the Unix philosophy is that, with some care, the
same output can be served to humans and machines.  (Piping the output
of `du' or `wc' to sort is an example of doing both.)  While that
principle may be misguided and doesn't directly apply to Wget's more
human-oriented output, it can be applied with measure.  I find it
self-evident that it is better to at least be able to paste parts of
output into other programs than not to be able to do so.
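The sorting property claimed above for ISO 8601 dates is easy to
check from a shell; the dates here are made up for illustration:

```shell
# A plain lexicographic sort puts ISO 8601 dates in chronological
# order, with no date parsing involved.
printf '2005-06-24\n1999-12-31\n2005-01-05\n' | sort
# prints 1999-12-31, then 2005-01-05, then 2005-06-24
```

The same `sort` invocation would scramble asctime()-style or
localized dates, which is the regularity being argued for.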
Re: Removing thousand separators from file size output
On Friday, June 24, 2005 at 6:45:44 PM +0200, Hrvoje Niksic wrote:

> input for other applications, which is very hard with the thousand
> separators.

Pasting is very hard, parsing is not.  An app running wget can easily
parse its output, whatever it is.  If not directly, then through a
wrapper.  The problem is only with "side-apps" where the user must
copy/paste.  How frequently is that used?

Removing separators will break existing apps parsing wget's output.
Such apps exist?

> Alain Bench <[EMAIL PROTECTED]> writes:
>> Humans can have habit to look at exact unit size, or rounded
>> kilo/mega/tera size, or both.
> omitting the thousand separators merely removes redundancy, not
> useful information.

That's true only if you assume the user analyses the /unit-size/ and
/kmt-size/ as a whole, as a unique info.  But that's not always the
case.  One may well look only at /unit-size/.  Without seps, this user
is forced to count digits, or to look additionally to /kmt-size/, and
do some brainwork to find the corresponding order of magnitude.  For
this user, sep removal removes readability.

> If the users were so used to separators, they would surely request
> them in other programs, such as `ls', `du', or `df'?

Those 3 commands print numbers in right-aligned columns: the ergonomic
need for seps is a little lower.  And the "ls -l" filename truncation
on 80-wide terms might be seen as a bigger annoyance: 3 seps added in
size would mean 3 chars less in filename.  And legacy behaviour *MUST*
absolutely be retained for such old, widely used, and frequently
machine-parsed commands.  But anyway I would personally love to see
separators here too.

[localization]
> You can make a case that the correct character and layout should be
> used for digit grouping when it is deployed, but I don't see how you
> can argue that grouping *must* be used in all applications!

I agree.  There are cases where localized grouping, and even grouping
alone, are useless or harmful: each time the only or primary
destination of a number is another app.  But when the intended reader
is human, localized grouping *should* be used.  Unless a bigger
unavoidable danger interferes.  That's my humble opinion, but I
believe it's also some more general ergonomic principle.

I am able to buy the small-advantage-over-code-complexity ratio
argument you once explained.  But I somewhat regret having to buy it.
BTW my "locale thousands_sep" gives a " " non-breaking space, and
"locale decimal_point" gives a "," comma.

> As for localization, I'm not against it.  The argument was that,
> where possible, I prefer the output of applications to remain
> parsable.

So we disagree only on the balance.  I'd say output to humans should
be localized as much as possible, unless this creates a really serious
problem for the machine-parsing secondary usage.  Where incompatible,
human and machine output may be separated: say on option, or like
GnuPG --status-fd simultaneously, where the human reads stdout/err
while the machine parses another fd.  That's material for present
debate, not my wish for wget.

> I consider the ISO 8601 date format a clear advantage over the
> asctime() format.  ;-)

Good example: I *hate* having to read 8601 dates.  Nearly as much as
having to read those other dates, localized or not, with month/day
ambiguity.  MHO only, here: I know some people love 8601.

Bye!    Alain.
-- 
 « if you believe subversive history books, I've got a bridge to sell
   you. »