Re: Strange character in file length

2006-03-09 Thread Alain Bench
 On Tuesday, March 7, 2006 at 8:56:15 +0100, Hrvoje Niksic wrote:

 Alain Bench [EMAIL PROTECTED] writes:
 unusable nonsense
 totally weird.

So far Google hasn't helped me much. Or rather, it discouraged me with
"Borland supports only C locale"-like statements. But I didn't find any
official doc. Does anybody have info on Borland's own libc?


 Either there is a magic option I missed, or I'd recommend to treat
 Borland as C locale (forcing comma separator and grouping by 3).
 I suggest we do the latter

The alternative seemingly would be: Change "3;0" to "\003\000", and
transcode the separator from GetACP() to GetConsoleOutputCP(). Would it
work in all cases? Which transcoding function? And all this in
Borland-specific code...
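
The first half of that alternative, turning Borland's ASCII grouping string into the byte form that localeconv() normally returns, could be sketched roughly as below. This is only an illustration: the helper name convert_ascii_grouping is hypothetical, and the semicolon-separated format is assumed from the two samples reported in this thread ("3;0" and "3;2;0"). The transcoding half is not attempted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of Wget or of any Borland API).
   Converts an ASCII grouping string such as "3;2;0" into the usual
   byte form "\003\002"; the trailing "0" becomes the NUL terminator.
   Returns 0 on success, -1 on malformed input or a too-small buffer. */
int convert_ascii_grouping(const char *ascii, char *out, size_t outsz)
{
    size_t n = 0;

    assert(ascii != NULL && out != NULL);
    while (*ascii != '\0') {
        int value = 0;

        if (!isdigit((unsigned char)*ascii))
            return -1;                  /* expected a number */
        while (isdigit((unsigned char)*ascii)) {
            value = value * 10 + (*ascii - '0');
            if (value > 126)
                return -1;              /* stay below CHAR_MAX, which is special */
            ascii++;
        }
        if (value == 0)
            break;                      /* "0" ends the list */
        if (n + 1 >= outsz)
            return -1;                  /* no room for value plus NUL */
        out[n++] = (char)value;
        if (*ascii == ';')
            ascii++;
        else if (*ascii != '\0')
            return -1;                  /* unexpected separator */
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return 0;
}
```

A caller would pass the string obtained from localeconv()->grouping on Borland and use the converted buffer in place of it.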


 against 1.10.x? It's always a better idea to patch the trunk code

Sure: I installed Subversion, and began learning it, just to put my
hands on the said trunk (found no tar.gz snapshots?). But then the
conflicting types for `uintptr_t' stopped me.


Bye!Alain.
-- 
When you post a new message, beginning a new topic, use the mail or
post or new message functions.
When you reply or followup, use the reply or followup functions.
Do not do the one for the other, this breaks or hijacks threads.


Re: Strange character in file length

2006-03-09 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

 Sure: I installed Subversion, and began learning it, just to put my
 hands on the said trunk (found no tar.gz snapshots?).

There are no tar.gz snapshots yet.  They would be easy to
autogenerate, it's just that no one has volunteered to set up such a
script, nor was there much interest.

 But then the conflicting types for `uintptr_t' stopped me.

Simply #define HAVE_UINTPTR_T 1 in the MinGW section of
windows/config-compiler.h and you're done.  I'll now check in that fix
myself.

(Under Unix, that would be checked by configure and generated in the
config.h.in -> config.h translation.  Under Windows, config.h is split
into windows/config.h, which contains settings common to all Win32
compilation environments, and windows/config-compiler.h, which
contains settings specific to different compilers.)
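
To illustrate why a single define is enough: a portability header usually supplies its own fallback typedef only when the feature macro is absent. The fragment below is a sketch of that pattern, not the actual contents of sysdep.h or windows/config-compiler.h; the fallback type and the pointer_round_trips function are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>   /* stands in for MinGW's stdint.h, which declares uintptr_t */

/* Compiler-specific config section (sketch): the compiler's own
   headers already provide uintptr_t, so say so. */
#define HAVE_UINTPTR_T 1

/* Portability header (sketch): supply a fallback only when the
   platform lacks the type. Without the define above, this typedef
   could clash with the declaration from <stdint.h>. */
#ifndef HAVE_UINTPTR_T
typedef unsigned long uintptr_t;
#endif

/* Round-trips a pointer through uintptr_t to show the type is usable. */
int pointer_round_trips(void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    return (void *)(uintptr_t)p == p;
}
```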


Re: Strange character in file length

2006-03-06 Thread Alain Bench
 On Thursday, March 2, 2006 at 22:28:28 +0100, Hrvoje Niksic wrote:

 you can get a free compiler from here:
 http://www.borland.com/downloads/download_cbuilder.html

Nice tip, thank you!


Bad news: Borland 5.5.1 seems to do locales in its own way. Not at
all as I explained here, about msvcrt.dll. Seems that:

 · setlocale() always returns a composite string with newlines (as if
categories were not identical). Example: setlocale(LC_ALL, ".852")
returns:

| LC_MONETARY=French_France.852
| LC_TIME=French_France.852
| LC_NUMERIC=French_France.852
| LC_COLLATE=French_France.852
| LC_CTYPE=French_France.852

 · setlocale() does French_France.850, not following chcp.

 · It doesn't know about ".OCP" nor ".ACP", but still accepts "C",
".1252" and such.

 · Whatever setlocale charset, localeconv() only outputs CP-1252.

 · localeconv() grouping gives an ASCII string "3;0" (or "3;2;0" for
Indian). Wget of course groups by... 51 digits (ASCII code of "3").

Seems a little bit like unusable nonsense to me. Either there is a
magic option I missed, or I'd recommend to treat Borland as C locale
(forcing comma separator and grouping by 3).


The test code works well on MinGW. Wget itself doesn't like the
unixish ./configure and make procedure under Msys 1.0.10, but I found
the "configure.bat --mingw" way to use MinGW 3.1.0 directly. Wget 1.10.2
compiles that way, and grouping, charset, and decimal point all seem to
work well with the attached patch.

BTW I had a problem compiling straight subversion trunk:

| F:\wget-r2129\src>mingw32-make.exe
| gcc -DWINDOWS -DHAVE_CONFIG_H -O3 -Wall -I.   -c -o cmpt.o cmpt.c
| In file included from wget.h:89,
|  from cmpt.c:43:
| sysdep.h:199: warning: redefinition of `uint32_t'
| c:/MinGW/include/stdint.h:32: warning: `uint32_t' previously declared here
| sysdep.h:215: conflicting types for `uintptr_t'
| c:/MinGW/include/stdint.h:61: previous declaration of `uintptr_t'
| mingw32-make.exe: *** [cmpt.o] Error 1


Bye!Alain.
-- 
How To Ask Questions The Smart Way
URL:http://www.catb.org/~esr/faqs/smart-questions.html


wget-1.10.2.win32-setlocale.1.patch.gz
Description: application/gunzip


Re: Strange character in file length

2006-03-06 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

 Seems a little bit like unusable nonsense to me. Either there is a
 magic option I missed, or I'd recommend to treat Borland as C locale
 (forcing comma separator and grouping by 3).

That's totally weird.  I suggest we do the latter, as I don't think
all that many people use the Borland environment.

By the way, why does your patch work against 1.10.x?  It's always a
better idea to patch the trunk code, as that's the one new features
get added to.


Re: Strange character in file length

2006-03-02 Thread Alain Bench
 On Thursday, March 2, 2006 at 7:51:43 +0100, Hrvoje Niksic wrote:

 Then the code could look like this:

Seems good to me. I can help testing, if someone compiles.


Bye!Alain.
-- 
Give your computer's unused idle processor cycles to a scientific goal:
The Folding@home project at URL:http://folding.stanford.edu/.


Re: Strange character in file length

2006-03-02 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

  On Thursday, March 2, 2006 at 7:51:43 +0100, Hrvoje Niksic wrote:

 Then the code could look like this:

 Seems good to me. I can help testing, if someone compiles.

Note that you can get a free compiler from here:

http://www.borland.com/downloads/download_cbuilder.html

That's what I use to test Windows builds.  I imagine MinGW would be as
useful, but it seemed a larger download and a more complex setup.


Re: Strange character in file length

2006-03-01 Thread Alain Bench
 On Saturday, February 25, 2006 at 21:06:19 +0100, Hrvoje Niksic wrote:

 Is the current charset of the console ever really different than the
 default OEM charset?

They are identical by default. But the first can be changed in each
console window, while the latter is fixed on a given Windows install.


 How does one change the console charset, anyway?

Thru chcp command in a cmd.exe session, or thru a call to
SetConsoleCP() or SetConsoleOutputCP() in an app.


 the setlocale invocation should look like this:

Hum... You dropped the fallback to ANSI when GetConsoleOutputCP()
returns 0. That's fine, if it's considered useless. But it could lead to
a setlocale(LC_ALL, ".0"), with unknown behaviour. Hopefully it then
fails, returning NULL, as it does in my setup. But I'm not sure it does
that in all setups.


 Wget is calling setlocale(LC_ALL, "") only if HAVE_NLS is defined,
 which is typically not the case on Windows, as HAVE_NLS implies
 existence of gettext, textdomain, and bindtextdomain.

Also utils.c:get_grouping_data() does setlocale(LC_NUMERIC, "")
temporarily. On some platforms, including Windows, setting LC_NUMERIC
alone is not guaranteed to have the desired effect. For example, on
Windows the call with ".850" succeeds, but selects the default ANSI
locale (or whatever was set by LC_ALL).

In fact it seems to me that all manipulations of a category alone,
outside of LC_ALL, are calling for problems on this or that platform.
Especially when the charsets are incompatible between categories (up to
segfaults, when cooperating with some buggy Glibc versions).

Now, if the main setlocale(LC_ALL) is always called, I believe that
get_grouping_data() can be greatly simplified, dropping the
#ifdef LC_NUMERIC setlocale(LC_NUMERIC). And just calling localeconv()
if it exists.
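
A simplified routine along those lines might look like the sketch below. It is not the code from Wget's repository: the name get_grouping_data_simplified is hypothetical, and the fallback to "," with groups of 3 when the locale defines no separator is an assumption based on the behaviour described elsewhere in this thread. It relies on the program having called setlocale(LC_ALL, ...) once at startup.

```c
#include <assert.h>
#include <locale.h>

/* Hypothetical simplified variant: no temporary setlocale(LC_NUMERIC),
   just read whatever the process-wide locale provides.
   The returned pointers may be invalidated by a later setlocale()
   or localeconv() call, so callers should copy them if needed. */
void get_grouping_data_simplified(const char **sep, const char **grouping)
{
    struct lconv *lc = localeconv();

    assert(sep != NULL && grouping != NULL);
    *sep = lc->thousands_sep;
    *grouping = lc->grouping;

    /* The "C" locale (and some others) define no thousands separator;
       fall back to a comma and groups of three digits (assumption). */
    if (**sep == '\0') {
        *sep = ",";
        *grouping = "\003";
    }
}
```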


 Testers able to compile Wget and reproduce this problem would be much
 appreciated.

Beware: The default console font "Terminal", having only an OEM/DOS
script, is unable to correctly follow chcp commands. A smarter font
like "Lucida Console" is better suited for testing.


Bye!Alain.
-- 
When you want to reply to a mailing list, please avoid doing so with
Lotus Notes 5. This lacks necessary references and breaks threads.


Re: Strange character in file length

2006-03-01 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

 the setlocale invocation should look like this:

 Hum... You dropped the fallback to ANSI when GetConsoleOutputCP()
 returns 0.

Ah, I didn't know it could return 0.  The code was based on your
description, which said "Call GetConsoleOutputCP(), get [for] example
850, build a string ...".

 That's fine, if it's considered useless. But it could lead to
 a setlocale(LC_ALL, ".0"), with unknown behaviour.

It's easy enough to handle a 0 return code.  What should we use then?
Would "" be appropriate in that case?

 Wget is calling setlocale(LC_ALL, "") only if HAVE_NLS is defined,
 which is typically not the case on Windows, as HAVE_NLS implies
 existence of gettext, textdomain, and bindtextdomain.

 Also utils.c:get_grouping_data() does setlocale(LC_NUMERIC, "")
 temporarily.

Only in Wget 1.10.  This code has been removed from the repository --
take a look at http://svn.dotsrc.org/repo/wget/trunk/src/utils.c .

 In fact it seems to me that all manipulations of a category alone,
 outside of LC_ALL, are calling for problems on this or that
 platform.

I agree, which is why I removed such fiddlings for the next release.
The original reasons for doing that (suspect in their own right) are
far outweighed by the potential problems.


Re: Strange character in file length

2006-03-01 Thread Alain Bench
 On Wednesday, March 1, 2006 at 16:13:17 +0100, Hrvoje Niksic wrote:

 Alain Bench [EMAIL PROTECTED] writes:
 fallback to ANSI when GetConsoleOutputCP() returns 0.
 I didn't know it could return 0.

I don't know exactly how, but it can. Apparently a graphic frontend
starting a text mode command without a console can arrange
GetConsoleOutputCP() to return 0 to the text command. The command should
then output ANSI text, probably not for direct display, but for
processing by the graphic app. The only example I heard about is
The Bat!™ mailer calling GnuPG as crypto tool.


 Would "" be appropriate in that case?

Yes: setlocale(LC_ALL, "") should always select the ANSI charset,
suitable for graphic mode apps. So in the end GetACP() is not needed, as
"" implicitly does the same.
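
The fallback order settled on in this exchange can be sketched as follows. The code page is passed in as a parameter so the logic can be exercised off Windows; on Windows the caller would pass the result of GetConsoleOutputCP(). Both function names are hypothetical, and this is an illustration of the idea rather than the patch itself.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fills `candidates` with the locale names to try,
   in order, and returns how many there are (1 or 2).
   cp_buf must hold "." plus up to 10 digits plus NUL. */
size_t locale_candidates(unsigned console_cp, char *cp_buf, size_t cp_buf_size,
                         const char *candidates[2])
{
    assert(cp_buf != NULL && cp_buf_size >= 12);
    if (console_cp == 0) {
        candidates[0] = "";         /* no console: ANSI charset */
        return 1;
    }
    snprintf(cp_buf, cp_buf_size, ".%u", console_cp);
    candidates[0] = cp_buf;         /* current console charset, e.g. ".850" */
    candidates[1] = ".OCP";         /* system default OEM charset */
    return 2;
}

/* Tries each candidate until setlocale() accepts one.
   Returns setlocale()'s result, or NULL if every candidate failed. */
const char *set_first_working_locale(unsigned console_cp)
{
    char cp_buf[16];
    const char *candidates[2];
    size_t n = locale_candidates(console_cp, cp_buf, sizeof cp_buf, candidates);
    size_t i;

    for (i = 0; i < n; i++) {
        const char *result = setlocale(LC_ALL, candidates[i]);
        if (result != NULL)
            return result;
    }
    return NULL;
}
```

Names such as ".850" and ".OCP" are understood by the Microsoft C runtime; other C libraries will most likely reject them, in which case set_first_working_locale() simply returns NULL.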


Bye!Alain.
-- 
When you want to reply to a mailing list, please avoid doing so with
Hushmail. This lacks necessary references and breaks threads.


Re: Strange character in file length

2006-02-25 Thread Alain Bench
Hi Hrvoje,

 On Tuesday, February 21, 2006 at 21:35:24 +0100, Hrvoje Niksic wrote:

 Valery Kondakoff [EMAIL PROTECTED] writes:
 wrong ANSI/OEM character encoding
 What are the steps a Windows console program needs to do to perform
 this conversion correctly?

Call setlocale(LC_ALL, ".OCP"), which will select the default OEM
charset of the current Windows language. OCP means OEM Code Page, and
console apps by default need to use this OEM charset: Probably CP-852
for you, CP-850 for me, and so on. Here this setlocale ".OCP" returns
"French_France.850".

Another, possibly better way, able to follow the current charset of
the console (not only the default): Call GetConsoleOutputCP(), get for
example 850, build a string ".850" with the dot, and call
setlocale(LC_ALL, ".850"). Problem: Not every combination of language,
country, and charset is possible. So deal with errors (setlocale returns
NULL), and fall back to ".OCP".

Finally, if GetConsoleOutputCP() fails, returning 0, call GetACP()
instead, as a fallback. This might possibly suit graphic frontends,
which would need an ANSI codepage output.


I don't have what's needed to compile wget on Windows, otherwise I
would have done a patch. MinGW32 and MSYS can't build wget, right?
Anyway I attach a demo program:

| C:\home\ab>chcp
| Page de codes active : 850    # French console default
|
| C:\home\ab>win32-console-locale.exe
| locale=French_France.850
| codepage=850
| thousands_sep=" " (code FF)   # no-break space in CP-850
|
| C:\home\ab>chcp 28591         # that's Latin-1 code page
| Page de codes activeá: 28591
|
| C:\home\ab>win32-console-locale.exe
| locale=French_France.28591
| codepage=28591
| thousands_sep="á" (code A0)   # no-break space in Latin-1


Bye!Alain.
-- 
When you post a new message, beginning a new topic, use the mail or
post or new message functions.
When you reply or followup, use the reply or followup functions.
Do not do the one for the other, this breaks or hijacks threads.
#include <stdio.h>
#include <locale.h>
#include <windows.h>

/* Select a locale whose charset matches what the console displays. */
static void Set_the_locale_for_the_fine_win32_console (void) {
  char *locale;
  unsigned int codepage;
  char param[42];

  codepage = GetConsoleOutputCP();
  if (codepage) {
    sprintf(param, ".%u", codepage);
    locale = setlocale(LC_ALL, param);    /* use current console OEM charset */
    if (locale == NULL) {
      locale = setlocale(LC_ALL, ".OCP"); /* use system default OEM charset */
    }
  }
  else {
    locale = setlocale(LC_ALL, "");       /* use ANSI charset (for graphic apps) */
  }

  /* setlocale may have failed; never pass NULL to %s */
  printf("locale=%s\ncodepage=%u\n", locale ? locale : "(null)",
         codepage ? codepage : GetACP());
}

int main (void) {
  struct lconv *lconv;

  Set_the_locale_for_the_fine_win32_console();

  /* Show the thousands separator and the code of its first byte */
  lconv = localeconv();
  printf("thousands_sep=\"%s\" (code %02X)\n",
         lconv->thousands_sep,
         (unsigned char)lconv->thousands_sep[0]);
  return 0;
}


Re: Strange character in file length

2006-02-25 Thread Hrvoje Niksic
Alain Bench [EMAIL PROTECTED] writes:

 Call setlocale(LC_ALL, ".OCP") which will select the default OEM
 charset of the current Windows language. OCP means OEM Code Page,
 and console apps by default need to use this OEM charset: Probably
 CP-852 for you, CP-850 for me, and so on. Here this setlocale ".OCP"
 returns "French_France.850".

 Another possibly better way, able to follow the current charset of
 the console (not only the default):

Is the current charset of the console ever really different than the
default OEM charset?  How does one change the console charset,
anyway?

 Call GetConsoleOutputCP(), get example 850, build a string ".850"
 with the dot, and call setlocale(LC_ALL, ".850"). Problem: Not every
 combination of language, country, and charset is possible. So deal
 with errors (setlocale returns NULL), and fallback to ".OCP".

Thanks for the detailed description.  In that case, the setlocale
invocation should look like this:

#ifdef WINDOWS
  {
    char console_code_page[32];
    snprintf (console_code_page, sizeof console_code_page,
              ".%u", GetConsoleOutputCP ());
    if (!setlocale (LC_ALL, console_code_page))
      setlocale (LC_ALL, ".OCP");
  }
#else
  setlocale (LC_ALL, "");
#endif

But I've now noticed another issue: Wget is calling setlocale(LC_ALL,
"") only if HAVE_NLS is defined, which is typically not the case on
Windows, as HAVE_NLS implies existence of gettext, textdomain, and
bindtextdomain.

Along with the above patch, someone should also try to move the call
to setlocale() outside #ifdef HAVE_NLS -- maybe that would be enough
to fix the problem.  Testers able to compile Wget and reproduce this
problem would be much appreciated.


Re: Strange character in file length

2006-02-21 Thread Hrvoje Niksic
Valery Kondakoff [EMAIL PROTECTED] writes:

 I'm not a programmer, so I may be wrong, but I'm pretty sure the
 problem lies in wrong ANSI/OEM character encoding conversion.

I've seen that mentioned before, but I don't know what it refers to.
What are the steps a Windows console program needs to do to perform
this conversion correctly?

 At least I see the '.' character hardcoded as the thousand separator
 in the download speed counter.

I don't understand that -- the download speed counter is printed
exactly the same way as the length parameter.  And the '.' character
is not hardcoded, ',' is (and only so for locales that don't define a
thousand separator).


Re: Strange character in file length

2006-02-20 Thread Hrvoje Niksic
Valery Kondakoff [EMAIL PROTECTED] writes:

 When downloading wget displays 'a' character instead of '.' (dot) in
 a file length.  Here is a screenshot
 http://www.nncron.ru/temp/wget.jpg (GNU Wget 1.10.1 under WinXP
 SP2). Is this a bug or this is intentional behaviour? Am I doing
 smth wrong?

It's a bug.  The "a" character is probably the thousand separator, but
I don't understand why it is displayed in that way.  If you understand
Windows C programming, take a look at the function `add_thousand_seps'
in utils.c (http://svn.dotsrc.org/repo/wget/branches/1.10/src/utils.c)
and see if you can spot anything wrong.  The thousand grouping data
comes directly from the call to localeconv(), which is ANSI C.
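
For readers unfamiliar with how localeconv() grouping data is meant to be consumed, here is a small sketch. It is not Wget's add_thousand_seps; the function name group_digits and its signature are made up for the example. Each byte of the grouping string is a group size counted from the right, the terminating NUL means "repeat the last size", and CHAR_MAX means "no further grouping".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts `sep` into the digit string `digits`
   according to `grouping` (same format as lconv.grouping).
   Returns 0 on success, -1 if the result does not fit. */
int group_digits(const char *digits, const char *sep, const char *grouping,
                 char *out, size_t outsz)
{
    char tmp[128];                      /* filled from right to left */
    size_t pos = sizeof tmp;            /* index of first used byte */
    size_t i = strlen(digits);
    size_t seplen = strlen(sep);
    const char *g = grouping;
    int group = (*g != '\0' && *g != CHAR_MAX) ? *g : 0;  /* 0: no grouping */
    int in_group = 0;
    size_t len;

    assert(out != NULL);
    while (i > 0) {
        if (group > 0 && in_group == group) {
            if (pos < seplen)
                return -1;
            pos -= seplen;
            memcpy(tmp + pos, sep, seplen);
            in_group = 0;
            if (g[1] == CHAR_MAX)
                group = 0;              /* stop grouping from here on */
            else if (g[1] != '\0')
                group = *++g;           /* next group size */
            /* else: NUL terminator, keep repeating the current size */
        }
        if (pos == 0)
            return -1;
        tmp[--pos] = digits[--i];
        in_group++;
    }
    len = sizeof tmp - pos;
    if (len + 1 > outsz)
        return -1;
    memcpy(out, tmp + pos, len);
    out[len] = '\0';
    return 0;
}
```

With grouping "\003" the string "1234567" becomes "1,234,567". If a C library hands back the ASCII text "3;0" instead, as reported for Borland above, the first byte is the character '3' (code 51), so the same routine would wait for 51 digits before inserting a separator.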