Re: default character encoding for everything in debian

2009-08-14 Thread Osamu Aoki

Hi,

(I want to see as much UTF-8 support as possible.  These days, it is not
bad.  Try using sed with UTF-8.  It works!  Of course, with some
understandable glitches.)

On Mon, Aug 10, 2009 at 08:55:27PM +0200, Norbert Preining wrote:
 On Mon, 10 Aug 2009, Roger Leigh wrote:
  Of course there's a penalty for certain operations.  But UTF-8 is about
  as compact as an extended encoding is going to get.
 
 Rubbish. You know why in Japan and other Asian countries UTF8 is not
 so common? Because many of their glyphs need 4 (four!) bytes, while
 for example jis-2022 (AFAIR) is much more compact.

Hmmm... not the best example here, technically, if you are talking about
size.  We have too many encodings for Japanese, and you see too many ESC
codes (escape sequences) in jis-2022 (ISO-2022-JP).
 
 We are not living in an ASCII world anymore.

True.

Our choice of encoding does not have much to do with size.  It is
inertia and backward compatibility.

FACTS:

Many Japanese e-mails use jis-2022 (ISO-2022-JP) for compatibility (in
the old days, e-mail was safe only for 7-bit data).

As far as data size goes, the compact popular ones are EUC (Unix) and
Shift_JIS (MS systems), which are still used in web pages etc.  They are
as small as UTF-16/UCS-2, which is used internally for much Unicode data
(typically two bytes per kanji).
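
The size comparison above can be checked with a small sketch. It assumes Python 3's standard codecs (`utf_8`, `euc_jp`, `shift_jis`, `iso2022_jp`, `utf_16_le`) are representative of those encodings; the sample string and the helper name `encoded_sizes` are illustrative only.

```python
# Compare the encoded size of a short Japanese string (three common kanji).
SAMPLE = "日本語"

def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes `text` takes in each listed encoding."""
    codecs = ["utf_8", "euc_jp", "shift_jis", "iso2022_jp", "utf_16_le"]
    return {name: len(text.encode(name)) for name in codecs}

if __name__ == "__main__":
    for name, size in encoded_sizes(SAMPLE).items():
        print(f"{name:>10}: {size} bytes")
```

For these kanji, UTF-8 needs three bytes each, against two for EUC-JP, Shift_JIS and UTF-16, while ISO-2022-JP adds escape sequences (ESC $ B before the run and ESC ( B after it) on top of its two bytes per character.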

But please note that new Macs and XP/Vista/... use Unicode, and I see
that many files can be in UTF-8.  So things are changing.

Osamu


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Re: default character encoding for everything in debian

2009-08-12 Thread Giacomo A. Catenazzi

Bastian Blank wrote:
 On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
 In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
 Not necessarily.  Any sane implementation should just use wchar_t
 Which could be UTF16 and therefore still has complicated length semantics.
 
 No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
 Windows).

No, wchar_t is locale-dependent (per POSIX).

BTW on gcc:

-fwide-exec-charset=charset
Set the wide execution character set, used for wide string and
character constants. The default is UTF-32 or UTF-16, whichever
corresponds to the width of wchar_t. As with -fexec-charset, charset can
be any encoding supported by the system's iconv library routine;
however, you will have problems with encodings that do not fit exactly
in wchar_t.

Note that the default encoding is UTF-8, thus giving a UTF-32 wchar_t
on most developers' machines.

ciao
cate



Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault

Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 07:54:33 +0200:
 Samuel Thibault wrote:
  Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
  while length(str) in any language up to the 1990s was a mere
  subtraction, now we must go through the string checking each byte to
  see if it is a Unicode marker and subtract the appropriate number of
  bytes.
  
  Not necessarily.  Any sane implementation should just use wchar_t and
  subtraction is back.
 
 An implementation that uses wchar_t is usually not sane, and usually
 it is (also) buggy.

Why? It's just about using wide functions instead of usual functions.

 PS: note that the binary encoding depends on the compiler environment
 (but such info is not exported).

See my other mail.  A lot of things can be made to depend on the
compiler environment.

Samuel



Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault

Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200:
 Bastian Blank wrote:
  On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
  In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
  Not necessarily.  Any sane implementation should just use wchar_t
  Which could be UTF16 and therefore still has complicated length semantics.
  
  No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
  Windows).
 
 No, wchar_t is locale-dependent (per POSIX).

What do you mean?  The compiler can't know the locale in advance for
the width and endianness.  The value might depend on the locale, yes,
but that's not a problem as long as you convert into UTF-8 before
communicating with other applications.

On sane systems (Debian systems are), it's just always UCS-4.

 BTW on gcc:
 
 -fwide-exec-charset=charset
 Set the wide execution character set, used for wide string and
 character constants.

It hurts when I shoot myself in the foot.

 The default is UTF-32 or UTF-16, whichever corresponds to the width of
 wchar_t.

This documentation is bogus BTW.  It should read UCS-4 or UCS-2.

 Note that the default encoding is UTF-8, thus giving a UTF-32 wchar_t
 on most developers' machines.

I don't understand this sentence.

Samuel



Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh

On Wed, Aug 12, 2009 at 09:56:49AM +0200, Samuel Thibault wrote:
 Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200:
  Bastian Blank wrote:
   On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
   In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
   Not necessarily.  Any sane implementation should just use wchar_t
   Which could be UTF16 and therefore still has complicated length
   semantics. 
   
   No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
   Windows).
  
  No, wchar_t is locale-dependent (per POSIX).
 
 What do you mean?  The compiler can't know the locale in advance for
 the width and endianness.  The value might depend on the locale, yes,
 but that's not a problem as long as you convert into UTF-8 before
 communicating with other applications.
 
 On sane systems (Debian systems are), it's just always UCS-4.

Specifically, __STDC_ISO_10646__ is defined to indicate that wchar_t
is always UCS-4 in all locales.

  BTW on gcc:
  
  -fwide-exec-charset=charset
  Set the wide execution character set, used for wide string and
  character constants.
 
 It hurts when I shoot myself in the foot.

This feature of GCC is one of the more obscure areas of locale
handling.  How does the encoding of strings at the level of
individual translation units work with a single per-process
global locale and C formatted I/O?  Curious minds would like to
know!

  The default is UTF-32 or UTF-16, whichever corresponds to the width of
  wchar_t.
 
 This documentation is bogus BTW.  It should read UCS-4 or UCS-2.

It's strictly correct according to the standard.  See
http://en.wikipedia.org/wiki/UTF-32/UCS-4 for an overview.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh

On Wed, Aug 12, 2009 at 07:54:33AM +0200, Giacomo A. Catenazzi wrote:
 Samuel Thibault wrote:
  Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
  while length(str) in any language up to the 1990s was a mere
  subtraction, now we must go through the string checking each byte to
  see if it is a Unicode marker and subtract the appropriate number of
  bytes.
  
  Not necessarily.  Any sane implementation should just use wchar_t and
  subtraction is back.
 
 An implementation that uses wchar_t is usually not sane, and usually
 it is (also) buggy. It is very difficult (AFAIK not impossible,
 but I'm not so sure) to write portable (POSIX way, so with changing
 locales) programs using wchar_t.

Do you have any concrete examples to back up these assertions?

They worked perfectly well for me last time I checked.  There were
bugs in the distant past, but I don't see any issues with current
GCC/libc.

BTW, since POSIX/SUS are a superset of the standard C library, they
contain all of the same wide character handling functionality.  I'm
not sure what you're getting at with the changing locales; SUS
locale functionality like setlocale() comes directly from C with no
changes.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-12 Thread Thomas Koch

It's impressive how quickly threads on this list grow big. :-)

I'm not sure whether a conclusion has already been reached.

1. apt-get install mysql
2. enter mysql client
3. create database test; create table test( test char(10) );

Replace mysql with whatever application you like.

What should be the encoding of database and table test in cases like the 
above?

Currently it's iso-something, discriminating against everybody from other countries.
If it were utf-8 instead, it would have at least two advantages:

- The clueless user would get a sane default
- utf-8 isn't as discriminating as iso-8859-1

Best regards,

Thomas Koch

 Hi,

 I have an issue: I forgot to set the character encoding of tomcat to
 utf-8 after reinstalling a server.
 Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite
 to discuss) shouldn't utf8 be the default character set everywhere? So when
 installing a package from Debian I can assume that where a character
 encoding can be set, it's set to utf8.
 MySQL would be another example, which to my knowledge uses isoXYZ as
 default character encoding.

 Best regards,

 Thomas Koch, http://www.koch.ro

Thomas Koch, http://www.koch.ro



Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh

On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote:
 I'm not sure whether a conclusion has already been reached.
 
 1. apt-get install mysql
 2. enter mysql client
 3. create database test; create table test( test char(10) );
 
 Replace mysql with whatever application you like.
 
 What should be the encoding of database and table test in cases like the 
 above?
 
 Currently it's iso-something, discriminating against everybody from other countries.
 If it were utf-8 instead, it would have at least two advantages:
 
 - The clueless user would get a sane default
 - utf-8 isn't as discriminating as iso-8859-1

UTF-8 is the sane default choice in this situation, so long as MySQL
is capable of handling it.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault

Roger Leigh wrote on Wed 12 Aug 2009 11:30:50 +0100:
   The default is UTF-32 or UTF-16, whichever corresponds to the width of
   wchar_t.
  
  This documentation is bogus BTW.  It should read UCS-4 or UCS-2.
 
 It's strictly correct according to the standard.  See
 http://en.wikipedia.org/wiki/UTF-32/UCS-4 for an overview.

« except that the UTF-32 standard has additional Unicode
semantics. »

In UTF-32 mode, gcc inserts a BOM, and in UTF-16 mode it accepts
characters beyond U+FFFF without warnings.
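
The BOM distinction can be illustrated with Python 3's codecs. This is an analogy only, not a reproduction of gcc's behaviour, and the helper name `ucs4_vs_utf32` is hypothetical: the generic `utf-32` codec prepends a byte order mark, while `utf-32-le` emits bare 4-byte code units, which is closer to plain UCS-4 storage.

```python
def ucs4_vs_utf32(text: str) -> tuple[bytes, bytes]:
    """Return (BOM-prefixed UTF-32, bare little-endian 32-bit units)."""
    with_bom = text.encode("utf-32")   # BOM in native byte order, then data
    bare = text.encode("utf-32-le")    # only the 4-byte code units
    return with_bom, bare

if __name__ == "__main__":
    with_bom, bare = ucs4_vs_utf32("A")
    print(len(with_bom), len(bare))  # 8 4: the extra 4 bytes are the BOM
```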

Samuel



Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann

On Wed, 12 Aug 2009 13:03:30 +0100
Roger Leigh rle...@codelibre.net wrote:

 On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote:
  I'm not sure whether a conclusion has already been reached.
  
  1. apt-get install mysql
  2. enter mysql client
  3. create database test; create table test( test char(10) );
  
  Replace mysql with whatever application you like.
  
  What should be the encoding of database and table test in cases
  like the above?
  
  Currently it's iso-something, discriminating against everybody from
  other countries. If it were utf-8 instead, it would have at least two
  advantages:
  
  - The clueless user would get a sane default
  - utf-8 isn't as discriminating as iso-8859-1
 
 UTF-8 is the sane default choice in this situation, so long as MySQL
 is capable of handling it.

Is that a real problem? Usually applications that use a SQL DB come
with some script to set up the schema. If they want UTF-8, they will
create a table with UTF-8 encoding. I wouldn't change MySQL's default
without reason, because old scripts might rely on that behaviour.

Those applications, however, should be configured to use UTF-8 by
default (if they support it) and their DB setup scripts accordingly.

Cheers,
harry


signature.asc
Description: PGP signature

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh

On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote:
 On Wed, 12 Aug 2009 13:03:30 +0100
 Roger Leigh rle...@codelibre.net wrote:
 
  On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote:
   I'm not sure whether a conclusion has already been reached.
   
   1. apt-get install mysql
   2. enter mysql client
   3. create database test; create table test( test char(10) );
   
   Replace mysql with whatever application you like.
   
   What should be the encoding of database and table test in cases
   like the above?
   
   Currently it's iso-something, discriminating against everybody from
   other countries. If it were utf-8 instead, it would have at least two
   advantages:
   
   - The clueless user would get a sane default
   - utf-8 isn't as discriminating as iso-8859-1
  
  UTF-8 is the sane default choice in this situation, so long as MySQL
  is capable of handling it.
 
 Is that a real problem? Usually applications that use a SQL DB come
 with some script to set up the schema. If they want UTF-8, they will
 create a table with UTF-8 encoding. I wouldn't change MySQL's default
 without reason, because old scripts might rely on that behaviour.

Those old scripts which don't specify an encoding *are already buggy*
due to not saying what they want, implying that the default (whatever
that might be) is fine.

There's the possibility that this might cause some problems, but they
are problems in the script, not in MySQL.  Continuing to use an obsolete
encoding like Latin 1 (or whatever the default currently is) prevents
any breakage, but at the expense of moving to a sane default for the
future.

 Those applications, however, should be configured to use UTF-8 by
 default (if they support it) and their DB setup scripts accordingly.

They should indeed, but if they don't then they need to explicitly
spell out what they *do* support.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann

On Thu, 13 Aug 2009 02:03:43 +0100
Roger Leigh rle...@codelibre.net wrote:

 On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote:
  On Wed, 12 Aug 2009 13:03:30 +0100
  Roger Leigh rle...@codelibre.net wrote:
  
   On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote:
I'm not sure whether a conclusion has already been reached.

1. apt-get install mysql
2. enter mysql client
3. create database test; create table test( test char(10) );

Replace mysql with whatever application you like.

What should be the encoding of database and table test in cases
like the above?

Currently it's iso-something, discriminating against everybody from
other countries. If it were utf-8 instead, it would have at
least two advantages:

- The clueless user would get a sane default
- utf-8 isn't as discriminating as iso-8859-1
   
   UTF-8 is the sane default choice in this situation, so long as
   MySQL is capable of handling it.
  
  Is that a real problem? Usually applications that use a SQL DB come
  with some script to set up the schema. If they want UTF-8, they will
  create a table with UTF-8 encoding. I wouldn't change MySQL's
  default without reason, because old scripts might rely on that
  behaviour.
 
 Those old scripts which don't specify an encoding *are already
 buggy* due to not saying what they want, implying that the default
 (whatever that might be) is fine.
Agreed. Still no need to break them on purpose.

 
 There's the possibility that this might cause some problems, but they
 are problems in the script, not in MySQL.  Continuing to use an obsolete
 encoding like Latin 1 (or whatever the default currently is) prevents
 any breakage, but at the expense of moving to a sane default for the
 future.
I really don't care too much about the specific case of MySQL, as I
hardly ever create or manipulate SQL data by hand. All I was saying
and you seem to be saying as well, if I understand you correctly,
is that it is the duty of the application that creates and uses SQL
tables to specify the encoding, if it cares about it. If the
application does that, it will work, no matter what default is
specified for MySQL. So this specific case is a non-issue, IMO, and
MySQL's default doesn't need to be changed. But if it is, just for the
sake of it, then that's fine with me. Some scripts might break, but OK.

 
  Those applications, however, should be configured to use UTF-8 by
  default (if they support it) and their DB setup scripts accordingly.
 
 They should indeed, but if they don't then they need to explicitly
 spell out what they *do* support.
They should.

Cheers,
harry



Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf

Norbert Preining wrote [Mon, Aug 10, 2009 at 08:55:27PM +0200]:
 On Mon, 10 Aug 2009, Roger Leigh wrote:
  Of course there's a penalty for certain operations.  But UTF-8 is about
  as compact as an extended encoding is going to get.
 
 Rubbish. You know why in Japan and other Asian countries UTF8 is not
 so common? Because many of their glyphs need 4 (four!) bytes, while
 for example jis-2022 (AFAIR) is much more compact.
 
 We are not living in an ASCII world anymore.

It's not that much about the size as it is about backwards
compatibility. We users of Latin-based alphabets migrate easily to
UTF8, with occasional problems where we use diacritics. Eastern Asian
encodings are _completely_ incompatible with UTF8, so it is just not
possible to tolerate broken text every now and then. Everything just
breaks completely.
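
A small sketch of the difference described above, assuming Python 3 and the hypothetical helper name `misread_as_utf8`: Latin-1 text misread as UTF-8 is damaged only at the accented character, while EUC-JP text misread as UTF-8 keeps none of its original characters. Decoding with the correct codec, as Bernd's reply points out, round-trips cleanly.

```python
def misread_as_utf8(text: str, encoding: str) -> str:
    """Encode `text` with `encoding`, then (wrongly) decode the bytes as UTF-8."""
    raw = text.encode(encoding)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(ascii(misread_as_utf8("café", "latin-1")))  # only the é is damaged
    print(ascii(misread_as_utf8("日本語", "euc_jp")))  # no original character survives
    # Using the correct codec restores the original text.
    print(ascii("日本語".encode("euc_jp").decode("euc_jp")))
```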

-- 
Gunnar Wolf • gw...@gwolf.org • (+52-55)5623-0154 / 1451-2244



Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf

Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
  There are a lot of users out there that are not willing to pay the
  price for increased generality.
 
 Don't you mean s/users/programmers? As a user I don't see what price I
 pay. I only see advantages in having a consistent encoding. Which,
 btw., doesn't have to be UTF-8. In an ideal world every programme would
 adhere to LC_CTYPE. But if the encoding has to be configured then I
 would also prefer UTF-8 as the default.
 
 Of course, for the programmer there might be a price to pay. And if
 he's not willing to pay it, he can't be forced, anyway.
 
 Or do you mean the user pays the price, because if the encoding is set
 to UTF-8 then performance would suffer? In that case, I'd love to see
 some real life numbers. I doubt the difference would be noticeable. 

Yes, performance will suffer. We enjoyed many decades of blissfully
ignoring the difference between a character and a byte. So, while
length(str) in any language up to the 1990s was a mere subtraction,
now we must go through the string checking each byte to see if it is a
Unicode marker and subtract the appropriate number of bytes. Also,
for a very long time we didn't really care much what a buffer's
content was: everything could be printed, even if it had control
characters which made you beep (with the occasional control sequence
re-injecting output into the terminal as input). Now... well, printing
an unprintable string can cause segfaults in some cases.
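
The per-byte scan described above can be sketched in Python 3, operating on raw bytes to mimic a C char buffer (the function name is illustrative). It counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx), which gives the number of code points in valid UTF-8; the cost is O(n) instead of the constant-time subtraction of a fixed-width encoding.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Continuation bytes have the form 10xxxxxx (0x80-0xBF); every other
    byte starts a new code point. Invalid input is not detected.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)

if __name__ == "__main__":
    raw = "héllo 日本".encode("utf-8")
    print(len(raw), utf8_codepoint_count(raw))  # 13 8: bytes vs. code points
```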

-- 
Gunnar Wolf • gw...@gwolf.org • (+52-55)5623-0154 / 1451-2244



Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault

Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
 while length(str) in any language up to the 1990s was a mere
 subtraction, now we must go through the string checking each byte to
 see if it is a Unicode marker and subtract the appropriate number of
 bytes.

Not necessarily.  Any sane implementation should just use wchar_t and
subtraction is back.  The width of the text is another matter, but
it's a problem for TrueType rendering anyway.  What is still costly is
then the conversion, which in principle only happens while talking with
other programs (files/sockets/etc.)
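
The "convert at the edges, use fixed-width units inside" approach can be sketched in Python 3 by unpacking text into 32-bit code points, roughly what a 4-byte wchar_t holds on glibc. The helper names `to_ucs4`/`from_ucs4` are hypothetical; in C the corresponding conversions would be mbstowcs()/wcstombs() under a UTF-8 locale.

```python
import struct

def to_ucs4(data: bytes) -> tuple[int, ...]:
    """Decode UTF-8 input once, at the boundary, into 32-bit code points."""
    buf = data.decode("utf-8").encode("utf-32-le")  # the -le codec adds no BOM
    return struct.unpack(f"<{len(buf) // 4}I", buf)

def from_ucs4(units: tuple[int, ...]) -> bytes:
    """Encode 32-bit code points back to UTF-8 before talking to other programs."""
    buf = struct.pack(f"<{len(units)}I", *units)
    return buf.decode("utf-32-le").encode("utf-8")

if __name__ == "__main__":
    units = to_ucs4("naïve 日本".encode("utf-8"))
    print(len(units))            # the length is simply the number of units
    print(from_ucs4(units[:5]))  # slicing by index is safe: b'na\xc3\xafve'
```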

Samuel



Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels

In article 20090811182041.gd19...@cajita.gateway.2wire.net you wrote:
 encodings are _completely_ incompatible with UTF8, so it is just not
 possible to tolerate broken text every now and then. Everything just
 breaks completely.

Or everything works out of the box, when you use it correctly...

Regards
Bernd



Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels

In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
 Not necessarily.  Any sane implementation should just use wchar_t

Which could be UTF16 and therefore still has complicated length semantics.
And even with UTF32 there are combining characters.  Sadly.  But the length
could be defined in code units - it's just a question of how useful it is.
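
Both caveats can be made concrete in Python 3, where `str` is a sequence of code points: a character outside the BMP takes two UTF-16 code units (a surrogate pair), and a base letter plus a combining accent is two code points that display as one. The helper `lengths` is an illustrative sketch; its "base_chars" count only skips combining marks and is not full grapheme segmentation as defined in Unicode UAX #29.

```python
import unicodedata

def lengths(text: str) -> dict[str, int]:
    """Return three different 'lengths' of the same string."""
    return {
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
        # Rough approximation: ignore combining marks (not full UAX #29)
        "base_chars": sum(1 for ch in text if not unicodedata.combining(ch)),
    }

if __name__ == "__main__":
    print(lengths("\U0001D11E"))  # MUSICAL SYMBOL G CLEF, outside the BMP
    print(lengths("e\u0301"))     # 'e' followed by COMBINING ACUTE ACCENT
```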

Regards
Bernd



Re: default character encoding for everything in debian

2009-08-11 Thread Bastian Blank

On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote:
 In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
  Not necessarily.  Any sane implementation should just use wchar_t
 Which could be UTF16 and therefore still has complicated length semantics.

No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
Windows).

Bastian

-- 
Phasers locked on target, Captain.



Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault

Bernd Eckenfels wrote on Tue 11 Aug 2009 21:40:35 +0200:
 In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:
  Not necessarily.  Any sane implementation should just use wchar_t
 
 Which could be UTF16 and therefore still has complicated length semantics.

??

wchar_t may be 32 or 16 bits (in which case it can't express Unicode
beyond U+FFFF), but it's still meant to have the simple length semantics.

 And even with UTF32 there are combining characters.

Which account for one character.  Then there is the problem of rendering
width, of course, but as I said, it's there anyway as soon as you have
a font with varying letter widths; string manipulation doesn't pose any
problem anyway.

 But the length could be defined in code units - it's just a question
 of how useful it is.

Of course.  It's rarely useful to take into account character width
yourself, unless you are rendering on a tty, but then speed usually
doesn't matter and you can afford calling wcswidth() on your string
as late as possible.
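
Computing display width only at render time can be illustrated with a rough Python 3 stand-in for wcswidth(). This is an assumption-laden approximation built on `unicodedata`, and `approx_display_width` is a hypothetical name: East Asian Wide/Fullwidth characters count as two columns, combining marks as zero, everything else as one. The real wcswidth() follows the locale's tables and returns -1 for non-printable characters, which this sketch does not model.

```python
import unicodedata

def approx_display_width(text: str) -> int:
    """Approximate terminal column width (NOT an exact wcswidth() clone)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                  # combining marks take no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                # wide / fullwidth, e.g. kanji
        else:
            width += 1
    return width

if __name__ == "__main__":
    for s in ["abc", "e\u0301", "日本語"]:
        print(ascii(s), len(s), approx_display_width(s))
```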

Samuel



Re: default character encoding for everything in debian

2009-08-11 Thread Jakub Wilk


* Bastian Blank wa...@debian.org, 2009-08-11, 22:24:

 Not necessarily.  Any sane implementation should just use wchar_t
Which could be UTF16 and therefore still has complicated length semantics.


No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like
Windows).


And in the most esoteric (while still conforming to the C standard) 
implementations it is not related to Unicode at all.


--
Jakub Wilk



Re: default character encoding for everything in debian

2009-08-11 Thread Adam Borowski

On Mon, Aug 10, 2009 at 09:04:37PM +0100, Roger Leigh wrote:
 If having a C.UTF-8 locale always available for system services is
 required for them to fully support UTF-8, then that needs adding to
 glibc.

It would also bring significant speed increase.  Since about everything
calls setlocale(), having the locale internal speeds up the typical process
startup sequence by 20%!  And that's 20% of the whole thing from fork(),
through linking, up to getopt(), so it's not a speedup to sneeze at.
I'm speaking about having the locale supported natively by glibc, of course;
what the udeb does is merely shipping a generated locale file.

 For a locale available after /usr is mounted, a simple localedef
 invocation is all that's needed; for all times, after starting init,
 it needs the tables compiling into glibc as for the standard C locale.
 I've been looking at how to do the latter, but I'm not expert with the
 3-level locale tables and other glibc internals, so if anyone who
 knows the details of glibc locales could provide me with
 assistance/guidance here, that would be much appreciated.
 
 For reference, this is bug #522776.  This would be great to have as a
 release goal for Squeeze, and (speculatively) a native C UTF-8 locale
 for Squeeze+1 to give us a default pure UTF-8 system from end-to-end.

I'm not an expert on glibc internals either, but a couple of years ago I
researched the issue a bit.  Apparently, there are only two first-class
locales: C and POSIX; all others get loaded from the disk.  In the past,
en_US.ISO-8859-1 and ru_RU.KOI8-R were such first-class ones as well, but
that's no longer the case.  What I'd propose would be making C.UTF-8 built in.

Another possible optimization would be building the table used by 8-bit
isalpha() etc. on the fly for all locales.  Converting 128 characters with
iconv is certainly faster than opening a file on the disk, and (sanely)
glibc doesn't support character classification contrary to Unicode, so
this could result in completely nuking all LC_CTYPE files for other
locales as well.
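
The idea of deriving the 8-bit classification table by converting byte values, instead of loading an LC_CTYPE file, can be sketched as follows. This is Python 3 rather than glibc's C, it uses Python's Unicode character properties (`str.isalpha`) instead of glibc's tables, and `build_isalpha_table` is a hypothetical name. For simplicity it converts all 256 byte values, although the lower 128 are plain ASCII in the usual 8-bit charsets; bytes the charset leaves undefined are treated as non-alphabetic.

```python
def build_isalpha_table(charset: str) -> list[bool]:
    """Classify every byte value of an 8-bit charset as alphabetic or not."""
    table = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(charset)  # one byte -> one character
        except UnicodeDecodeError:
            table.append(False)                 # undefined in this charset
            continue
        table.append(ch.isalpha())
    return table

if __name__ == "__main__":
    latin1 = build_isalpha_table("latin-1")
    print(latin1[0xE9], latin1[0xD7])  # True False: é is a letter, × is not
```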

-- 
1KB // Microsoft corollary to Hanlon's razor:
//  Never attribute to stupidity what can be
//  adequately explained by malice.



Re: default character encoding for everything in debian

2009-08-11 Thread Harald Braumann

On Tue, 11 Aug 2009 13:28:08 -0500
Gunnar Wolf gw...@gwolf.org wrote:

 Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]:
   There are a lot of users out there that are not willing to pay the
   price for increased generality.
  
  Don't you mean s/users/programmers? As a user I don't see what
  price I pay. I only see advantages in having a consistent encoding.
  Which, btw., doesn't have to be UTF-8. In an ideal world every
  programme would adhere to LC_CTYPE. But if the encoding has to be
  configured then I would also prefer UTF-8 as the default.
  
  Of course, for the programmer there might be a price to pay. And if
  he's not willing to pay it, he can't be forced, anyway.
  
  Or do you mean the user pays the price, because if the encoding is
  set to UTF-8 then performance would suffer? In that case, I'd love
  to see some real life numbers. I doubt the difference would be
  noticeable. 
 
 Yes, performance will suffer. We enjoyed many decades of blissfully
 ignoring the difference between a character and a byte. 

Well, a byte with the most significant bit always set to 0.

 So, while
 length(str) in any language up to the 1990s was a mere subtraction,
 now we must go through the string checking each byte to see if it is a
 Unicode marker and subtract the appropriate number of bytes. Also,
 for a very long time we didn't really care much what was a buffer's
 content - 

And in those glorious times, more often than not, unintelligible
rubbish was produced if you happened not to use a language that can be
written in ASCII. But this is beside the point. I do appreciate that
support for different character encodings causes pain for the
programmer. But the original post was about software that already
has support for UTF-8 and whether it wouldn't be a good idea to
configure it this way by default.
 
 Everything could be printed, even if it had control
 characters which made you beep (with the occasional control sequence
 re-injecting output into the terminal as input). Now... Well, printing
 an unprintable string can cause segfaults in some cases.

My terminal supports UTF-8. I thought that this is not an issue any
more.

Cheers,
harry



Re: default character encoding for everything in debian

2009-08-11 Thread Giacomo A. Catenazzi

Samuel Thibault wrote:
 Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500:
 while length(str) in any language up to the 1990s was a mere
 subtraction, now we must go through the string checking each byte to
 see if it is a Unicode marker and subtract the appropriate number of
 bytes.
 
 Not necessarily.  Any sane implementation should just use wchar_t and
 subtraction is back.

An implementation that uses wchar_t is usually not sane, and usually
it is (also) buggy. It is very difficult (AFAIK not impossible,
but I'm not so sure) to write portable (POSIX way, so with changing
locales) programs using wchar_t.

The only way I know to use wchar_t sanely is to stick to the simple
C standard requirements: only one runtime environment and locale.

PS: note that the binary encoding depends on the compiler environment
(but such info is not exported).

ciao
cate



Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup

On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote:
 Hi,
 
 I have an issue: I forgot to set the character encoding of tomcat to utf-8
 after reinstalling a server.
 Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
 invite discussion): shouldn't utf8 be the default character set everywhere?
 Then, when installing a package from Debian, I could assume that wherever a
 character encoding can be set, it's set to utf8.
 MySQL would be another example, which to my knowledge uses isoXYZ as its
 default character encoding.

While utf-8 covers the broadest set of character glyphs possible, it
suffers from size as well as performance penalties. Characters are no
longer guaranteed to fit in a byte; how do you define
strlen(utf8_string), etc.?  All these issues have been solved, but not
for free.

There are a lot of users out there that are not willing to pay the price
for increased generality.

just my 2¢
  Siggy
-- 
Please don't Cc: me when replying, I might not see either copy.
   bsb-at-psycho-dot-informationsanarchistik-dot-de
   or:bsb-at-psycho-dot-i21k-dot-de
O ascii ribbon campaign - stop html mail - www.asciiribbon.org



Re: default character encoding for everything in debian

2009-08-10 Thread Giacomo A. Catenazzi


Thomas Koch wrote:

Hi,

I have an issue: I forgot to set the character encoding of tomcat to utf-8
after reinstalling a server.
Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
invite discussion): shouldn't utf8 be the default character set everywhere?
Then, when installing a package from Debian, I could assume that wherever a
character encoding can be set, it's set to utf8.
MySQL would be another example, which to my knowledge uses isoXYZ as its
default character encoding.


There are different problems.

Future Debian systems will have a UTF-8 charset as default.
Look at the debian-policy archives.
A lot of Debian files will be encoded in UTF-8 (control, changelog
and manpages), and transformed into the needed charset at runtime.

But for databases there are different issues. I think the best solution
is to do it as MediaWiki does: put the UTF-8 data in as a binary blob. It
is difficult to keep database engines and system libraries synchronized,
and it is also difficult to implement support for all Unicode characters.

But let's concentrate on the first task: having good UTF-8 support
in all programs/terminals/etc.

ciao
cate



Re: default character encoding for everything in debian

2009-08-10 Thread Michal Čihař

Hi

On Mon, 10 Aug 2009 13:09:21 +0200
Thomas Koch tho...@koch.ro wrote:

 I have an issue: I forgot to set the character encoding of tomcat to utf-8
 after reinstalling a server.
 Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
 invite discussion): shouldn't utf8 be the default character set everywhere?
 Then, when installing a package from Debian, I could assume that wherever a
 character encoding can be set, it's set to utf8.
 MySQL would be another example, which to my knowledge uses isoXYZ as its
 default character encoding.

I don't know tomcat, but for MySQL it would definitely break some
existing applications (which are broken and do not care about charsets,
but that's a different topic).

-- 
Michal Čihař | http://cihar.com | http://blog.cihar.com



Re: default character encoding for everything in debian

2009-08-10 Thread Josselin Mouette

On Monday 10 August 2009 at 14:06 +0200, Giacomo A. Catenazzi wrote:
 But let's concentrate on the first task: having good UTF-8 support
 in all programs/terminals/etc.

This task should have been completed for etch.

Now we could concentrate on removing programs without proper UTF-8
support from the archive.

Cheers,
-- 
 .''`.  Josselin Mouette
: :' :
`. `'   “I recommend you to learn English in hope that you in
  `- future understand things”  -- Jörg Schilling



Re: default character encoding for everything in debian

2009-08-10 Thread Russ Allbery

Josselin Mouette j...@debian.org writes:

 Now we could concentrate on removing programs without proper UTF-8
 support from the archive.

There are, sadly, some very useful programs with no adequate replacement
that don't have UTF-8 support.  tf5, for instance.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/



Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh

On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote:
 On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote:
  Hi,
  
  I have an issue: I forgot to set the character encoding of tomcat to utf-8
  after reinstalling a server.
  Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
  invite discussion): shouldn't utf8 be the default character set everywhere?
  Then, when installing a package from Debian, I could assume that wherever a
  character encoding can be set, it's set to utf8.
  MySQL would be another example, which to my knowledge uses isoXYZ as its
  default character encoding.
 
 While utf-8 covers the broadest set of character glyphs possible, it
 suffers from size as well as performance penalties. Characters are no
 longer guaranteed to fit in a byte; how do you define
 strlen(utf8_string), etc.?  All these issues have been solved, but not
 for free.

Of course there's a penalty for certain operations.  But UTF-8 is about
as compact as an extended encoding is going to get.

 There are a lot of users out there that are not willing to pay the price
 for increased generality.

These users will need to change their character encoding to something else.
But the Debian default should remain UTF-8.  Those not willing to pay the
flexibility/performance tradeoff are the exception, and will need to
customise their environment accordingly.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining

On Mon, 10 Aug 2009, Roger Leigh wrote:
 Of course there's a penalty for certain operations.  But UTF-8 is about
 as compact as an extended encoding is going to get.

Rubbish. You know why in Japan and other Asian countries UTF8 is not
so common? Because many of their glyphs need 4 (four!) bytes, while
for example jis-2022 (AFAIR) is much more compact.

We are not living in an ASCII world anymore.

Best wishes

Norbert

---
Dr. Norbert Preining prein...@logic.atVienna University of Technology
Debian Developer prein...@debian.org Debian TeX Group
gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
---
CHICAGO (n.)
The foul-smelling wind which precedes an underground railway train.
--- Douglas Adams, The Meaning of Liff



Re: default character encoding for everything in debian

2009-08-10 Thread Philipp Kern

On 2009-08-10, Norbert Preining prein...@logic.at wrote:
 On Mon, 10 Aug 2009, Roger Leigh wrote:
 Of course there's a penalty for certain operations.  But UTF-8 is about
 as compact as an extended encoding is going to get.
 Rubbish. You know why in Japan and other Asian countries UTF8 is not
 so common? Because many of their glyphs need 4 (four!) bytes, while
 for example jis-2022 (AFAIR) is much more compact.
 We are not living in an ASCII world anymore.

Really because of the size?  We are not living in a byte beancounting
world anymore.  At worst you double the *text* size (we're not talking
about images or anything, which are far larger), going from 2 bytes
that you need anyway to four.  ISO 2022 also wastes one bit per byte
to be 7bit safe.  If I read the Wikipedia article correctly at least
the JP escaping only needs to be put into the document once.  (Well,
or maybe several times switching back and forth if you're embedding
latin-encoded words into the text.)

Maybe I'm an ignorant European but I'm not sure that equation still
holds.  Of course there are certain tradeoffs about latin characters
being the privileged few to get a short encoding, but that doesn't
make UTF-8 bad per se to call it rubbish.
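
For the size argument it helps to look at how many bytes UTF-8 actually spends per character. A minimal sketch (utf8_len is a hypothetical helper, not a library function): kana and the common kanji lie in the Basic Multilingual Plane and take three bytes in UTF-8, against two in EUC-JP or Shift_JIS for JIS X 0208 characters; only characters outside the BMP, such as rarer CJK ideographs, take four.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes UTF-8 needs for one code point.
   Returns 0 for values that are not Unicode scalar values
   (the surrogates U+D800..U+DFFF, or anything above U+10FFFF). */
int utf8_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;          /* surrogates are never encoded in UTF-8 */
    if (cp < 0x80)
        return 1;          /* ASCII */
    if (cp < 0x800)
        return 2;          /* Latin supplements, Greek, Cyrillic, ... */
    if (cp < 0x10000)
        return 3;          /* rest of the BMP, including kana and common kanji */
    if (cp <= 0x10FFFF)
        return 4;          /* supplementary planes */
    return 0;
}
```

For example, utf8_len(0x3042) (あ) and utf8_len(0x6F22) (漢) both give 3, while utf8_len(0x20B9F) (𠮟) gives 4.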

Kind regards,
Philipp Kern



Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup

On Mon, Aug 10, 2009 at 19:53 +0100, Roger Leigh wrote:
 On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote:

  While utf-8 covers the broadest set of character glyphs possible, it
  suffers from size as well as performance penalties. Characters are no
  longer guaranteed to fit in a byte; how do you define
  strlen(utf8_string), etc.?  All these issues have been solved, but not
  for free.

 Of course there's a penalty for certain operations.  But UTF-8 is about
 as compact as an extended encoding is going to get.

It's not Huffman (just kidding :), stating the obvious you're trading
time efficiency for space efficiency.

  There are a lot of users out there that are not willing to pay the price
  for increased generality.

 These users will need to change their character encoding to something else.
 But the Debian default should remain UTF-8.  Those not willing to pay the
 flexibility/performance tradeoff are the exception, and will need to
 customise their environment accordingly.

Either my memory is wrong or I seem to have missed some fundamental
change in Debian Policy during my 5 years of absence.  From those days
I seem to remember that Debian supported the use of low-end machines,
while they seem to be deprecated now, as I was told in another thread
on d-u IIRC.  Call me a dinosaur; I haven't yet decided what to think
about this.

Regards
  Siggy
-- 
Please don't Cc: me when replying, I might not see either copy.
   bsb-at-psycho-dot-informationsanarchistik-dot-de
   or:bsb-at-psycho-dot-i21k-dot-de
O ascii ribbon campaign - stop html mail - www.asciiribbon.org



Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining

On Mon, 10 Aug 2009, Philipp Kern wrote:
  Of course there's a penalty for certain operations.  But UTF-8 is about
  as compact as an extended encoding is going to get.
[...]
 make UTF-8 bad per se to call it rubbish.

I didn't call utf-8 itself rubbish; I am myself a strong proponent of
utf-8. I only meant your claim that it is about as compact as an extended
encoding is going to get.

OTOH, I agree that UTF-8 is the way to go in general computing, I have had
too much pain with all those local encodings around the world.

Best wishes

Norbert

---
Dr. Norbert Preining prein...@logic.atVienna University of Technology
Debian Developer prein...@debian.org Debian TeX Group
gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
---
HUTTOFT (n.)
The fibrous algae which grows in the dark, moist environment of
trouser turn-ups.
--- Douglas Adams, The Meaning of Liff



Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh

On Mon, Aug 10, 2009 at 02:06:44PM +0200, Giacomo A. Catenazzi wrote:
 Thomas Koch wrote:
 I have an issue: I forgot to set the character encoding of tomcat to utf-8
 after reinstalling a server.
 Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
 invite discussion): shouldn't utf8 be the default character set everywhere?
 Then, when installing a package from Debian, I could assume that wherever a
 character encoding can be set, it's set to utf8.
 MySQL would be another example, which to my knowledge uses isoXYZ as its
 default character encoding.
 
 Future Debian systems will have a UTF-8 charset as default.
 Look at the debian-policy archives.

For system users, yes, assuming you are talking about the C.UTF-8
proposal.  For normal users, UTF-8 has been the default since
Lenny.

If having a C.UTF-8 locale always available for system services is
required for them to fully support UTF-8, then that needs adding to
glibc. For a locale available after /usr is mounted, a simple localedef
invocation is all that's needed; for one available at all times after
init starts, the tables need to be compiled into glibc, as for the
standard C locale.
I've been looking at how to do the latter, but I'm not expert with the
3-level locale tables and other glibc internals, so if anyone who
knows the details of glibc locales could provide me with
assistance/guidance here, that would be much appreciated.

For reference, this is bug #522776.  This would be great to have as a
release goal for Squeeze, and (speculatively) a native C UTF-8 locale
for Squeeze+1 to give us a default pure UTF-8 system from end-to-end.
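
Until a C.UTF-8 locale is guaranteed to exist, a service that wants UTF-8 can at least check at startup whether the locale it needs is installed. A sketch using only the POSIX setlocale() and nl_langinfo(CODESET) calls; the function name is hypothetical, and whether C.UTF-8 is present depends on how locales are installed on a given system:

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: report whether the locale `name` can be loaded
   for LC_CTYPE and uses the UTF-8 codeset.  Pass "C.UTF-8" to probe for
   that locale, or "" to test what LANG/LC_ALL/LC_CTYPE select.
   The caller's LC_CTYPE setting is restored before returning. */
bool ctype_locale_is_utf8(const char *name)
{
    char saved[256] = "";
    const char *cur = setlocale(LC_CTYPE, NULL);
    bool is_utf8 = false;

    /* setlocale() may overwrite the string it returned, so copy it. */
    if (cur != NULL && strlen(cur) < sizeof saved)
        strcpy(saved, cur);

    /* A NULL return means the locale is not available; LC_CTYPE is unchanged. */
    if (setlocale(LC_CTYPE, name) != NULL)
        is_utf8 = strcmp(nl_langinfo(CODESET), "UTF-8") == 0;

    if (saved[0] != '\0')
        setlocale(LC_CTYPE, saved);
    return is_utf8;
}
```

A daemon could call ctype_locale_is_utf8("C.UTF-8") once at startup and log a warning, or fall back to byte-oriented behaviour, when it returns false.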

 A lot of Debian files will be encoded in UTF-8 (control, changelog
 and manpages), and transformed into the needed charset at runtime.

I think "will" here implies it's something to be done in the future,
but it's a requirement right now, and all but a few exceptions have
already been converted.

 But for databases there are different issues. I think the best solution
 is to do it as MediaWiki does: put the UTF-8 data in as a binary blob. It
 is difficult to keep database engines and system libraries synchronized,
 and it is also difficult to implement support for all Unicode characters.

PostgreSQL seems to manage it without problems.  Putting text in as a
binary blob obviates most uses for having it in a database in the first
place.  Sorting, indexing and querying require being able to read it!
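
On the sorting point: if text is stored as an opaque blob, about the best that can be done is to compare bytes. For valid UTF-8 that byte order happens to equal code point order, which is enough for exact-match lookups but is not a linguistic collation. A small illustration with the C library's strcmp() and qsort() (the wrapper names are hypothetical; nothing here claims how any particular database treats blobs):

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte comparison.  The C standard specifies that
   strcmp compares bytes as unsigned char, and for valid UTF-8 unsigned
   byte order is the same as code point order. */
static int cmp_bytes(const void *x, const void *y)
{
    const char *const *a = x;
    const char *const *b = y;
    return strcmp(*a, *b);
}

/* Hypothetical helper: sort UTF-8 strings by code point, with no
   locale-aware collation at all. */
void sort_utf8_by_codepoint(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_bytes);
}
```

Sorting {"été", "zoo", "Zebra", "apple"} this way gives Zebra, apple, zoo, été: uppercase before lowercase and accented letters after z, which is not what a French speaker expects from ORDER BY.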

Note that there are separate client and server (database) encodings for
text as well.  You may well get recoding between what the user sees and
what's actually stored in the database, potentially at several points.
Having UTF-8 on the server does not require it on the client (and vice
versa).
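
As an illustration of the kind of recoding described above, here is the simplest case: a Latin-1 client talking to a UTF-8 server. Every ISO-8859-1 byte maps to the code point with the same value, so the conversion needs no tables. This is a sketch with a hypothetical function name; real clients would normally rely on iconv(3) or the database driver's own conversion:

```c
#include <stddef.h>

/* Hypothetical helper: recode ISO-8859-1 (Latin-1) bytes to UTF-8.
   Bytes below 0x80 are copied unchanged; bytes 0x80..0xFF become the
   two-byte UTF-8 sequence for code point U+0080..U+00FF.
   `out` must have room for 2 * len + 1 bytes; the result is
   NUL-terminated.  Returns the number of bytes written, excluding NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = (char)c;
        } else {
            out[o++] = (char)(0xC0 | (c >> 6));    /* 110000xx */
            out[o++] = (char)(0x80 | (c & 0x3F));  /* 10xxxxxx */
        }
    }
    out[o] = '\0';
    return o;
}
```

The Latin-1 byte 0xE9 ("é") comes out as 0xC3 0xA9, so a 4-byte Latin-1 "café" becomes 5 bytes of UTF-8; the reverse direction has to reject or replace anything above U+00FF.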

 But let's concentrate on the first task: having good UTF-8 support
 in all programs/terminals/etc.

I think that part was already done quite some time ago.  Any program
that doesn't support UTF-8 is an exception, and should be fixed or
removed.

For the specific case of databases, what's being proposed here is
making the default UTF-8.  Existing databases should not be affected,
since they would retain their current encoding.  New databases should,
however, use UTF-8.  If a specific application needs a specific
encoding in order to function correctly, then it's that application's
responsibility to specify that when creating it, i.e. by overriding the
default.  If it doesn't do that already, it's already broken since it's
currently unspecified.


Regards,
Roger
-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh

On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote:
 On Mon, 10 Aug 2009, Philipp Kern wrote:
   Of course there's a penalty for certain operations.  But UTF-8 is about
   as compact as an extended encoding is going to get.
 [...]
  make UTF-8 bad per se to call it rubbish.
 
 I didn't call utf-8 itself rubbish; I am myself a strong proponent of
 utf-8. I only meant your claim that it is about as compact as an extended
 encoding is going to get.

I should have qualified it with "that is both 8-bit and backward-compatible
with ASCII".  Other encodings will be more compact, but
AFAIK there isn't a more compact UCS encoding, though UTF-16 /might/
be more compact for certain languages, albeit without any 8-bit
backward compatibility.
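
The remark about UTF-16 can be checked with a little arithmetic. The sketch below (encoded_sizes is a hypothetical name, and it assumes the input contains only valid Unicode scalar values) totals the bytes a code point sequence needs in each encoding:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total encoded size, in bytes, of a sequence of
   Unicode scalar values in UTF-8 and in UTF-16.  Assumes every value is
   in 0..0x10FFFF and is not a surrogate (U+D800..U+DFFF). */
void encoded_sizes(const uint32_t *cps, size_t n,
                   size_t *utf8_bytes, size_t *utf16_bytes)
{
    size_t u8 = 0, u16 = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t cp = cps[i];
        u8  += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        u16 += cp < 0x10000 ? 2 : 4;   /* one code unit, or a surrogate pair */
    }
    *utf8_bytes = u8;
    *utf16_bytes = u16;
}
```

For "Hello" it reports 5 bytes of UTF-8 against 10 of UTF-16; for 日本語 (U+65E5 U+672C U+8A9E) it reports 9 against 6, so for mostly-CJK text UTF-16 is the more compact of the two.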


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.



Re: default character encoding for everything in debian

2009-08-10 Thread brian m. carlson

On Mon, Aug 10, 2009 at 09:42:18PM +0100, Roger Leigh wrote:
 On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote:
  I didn't call utf-8 itself rubbish; I am myself a strong proponent of
  utf-8. I only meant your claim that it is about as compact as an extended
  encoding is going to get.
 
 I should have qualified it with "that is both 8-bit and backward-compatible
 with ASCII".  Other encodings will be more compact, but
 AFAIK there isn't a more compact UCS encoding, though UTF-16 /might/
 be more compact for certain languages, albeit without any 8-bit
 backward compatibility.

Actually, SCSU and BOCU-1 are potentially more compact, assuming the
text can be compressed.  However, they are not backward-compatible with
ASCII; SCSU comes closer than BOCU-1.  As a practical matter, nobody of
any importance actually uses SCSU or BOCU-1, except for Reuters (with
SCSU).

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 713 440 7475 | http://crustytoothpaste.ath.cx/~bmc | My opinion only
OpenPGP: RSA v4 4096b 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



Re: default character encoding for everything in debian

2009-08-10 Thread Harald Braumann

On Mon, 10 Aug 2009 13:45:40 +0200
Siggy Brentrup deb...@psycho.i21k.de wrote:

 On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote:
  Hi,
  
  I have an issue: I forgot to set the character encoding of tomcat to utf-8
  after reinstalling a server.
  Now, before I report a wishlist(?) bug against tomcat, I want to ask (and
  invite discussion): shouldn't utf8 be the default character set everywhere?
  Then, when installing a package from Debian, I could assume that wherever a
  character encoding can be set, it's set to utf8.
  MySQL would be another example, which to my knowledge uses isoXYZ as its
  default character encoding.
 
 While utf-8 covers the broadest set of character glyphs possible, it
 suffers from size as well as performance penalties. Characters are no
 longer guaranteed to fit in a byte; how do you define
 strlen(utf8_string), etc.?  All these issues have been solved, but not
 for free.
 
 There are a lot of users out there that are not willing to pay the
 price for increased generality.

Don't you mean s/users/programmers? As a user I don't see what price I
pay. I only see advantages in having a consistent encoding. Which,
btw., doesn't have to be UTF-8. In an ideal world every programme would
adhere to LC_CTYPE. But if the encoding has to be configured then I
would also prefer UTF-8 as the default.

Of course, for the programmer there might be a price to pay. And if
he's not willing to pay it, he can't be forced, anyway.

Or do you mean the user pays the price, because if the encoding is set
to UTF-8 then performance would suffer? In that case, I'd love to see
some real life numbers. I doubt the difference would be noticeable. 

Cheers,
harry



Re: default character encoding for everything in debian

2009-08-10 Thread Samuel Thibault

Harald Braumann wrote on Tue 11 Aug 2009 01:33:58 +0200:
 Or do you mean the user pays the price, because if the encoding is set
 to UTF-8 then performance would suffer? In that case, I'd love to see
 some real life numbers. I doubt the difference would be noticeable. 

Google "utf-8 grep performance loss".
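
One property that lets byte-oriented tools keep their speed on UTF-8 text: ASCII bytes (0x00..0x7F), lead bytes (0xC2..0xF4) and continuation bytes (0x80..0xBF) never share values, so a plain byte search for a valid UTF-8 needle can only match at character boundaries. The sketch below relies on that with strstr() (the wrapper name is hypothetical); it says nothing about how grep itself was implemented at the time, and it covers literal matching only, not case folding or character classes.

```c
#include <string.h>

/* Hypothetical helper: find a valid UTF-8 needle in a valid UTF-8
   haystack with a plain byte search.  The needle starts with a non-
   continuation byte, so it cannot match in the middle of a multi-byte
   character, and no per-character decoding is needed.
   Returns the byte offset of the first match, or -1 if there is none. */
long utf8_find(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p != NULL ? (long)(p - haystack) : -1L;
}
```

In "naïve café" (UTF-8), searching for "é" returns byte offset 10, which is the start of that character, not a position inside "ï".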

Samuel


