Re: default character encoding for everything in debian

2009-08-14 Thread Osamu Aoki
Hi, (I want to see as much UTF-8 support as possible. These days, it is not bad. Try using sed with UTF-8. It works! Of course with some understandable glitches.) On Mon, Aug 10, 2009 at 08:55:27PM +0200, Norbert Preining wrote: On Mon, 10 Aug 2009, Roger Leigh wrote: Of course there's a penalty for

Re: default character encoding for everything in debian

2009-08-12 Thread Giacomo A. Catenazzi
Bastian Blank wrote: On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: In article 20090811183800.ge5...@const.famille.thibault.fr you wrote: Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 07:54:33 +0200: Samuel Thibault wrote: Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: while length(str) in any language up to the 1990s was a mere subtraction, now we must go through the string checking each byte to see if it

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200: Bastian Blank wrote: On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: In article 20090811183800.ge5...@const.famille.thibault.fr you wrote: Not necessarily. Any sane implementation should just use wchar_t

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 09:56:49AM +0200, Samuel Thibault wrote: Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200: Bastian Blank wrote: On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: In article 20090811183800.ge5...@const.famille.thibault.fr you wrote:

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 07:54:33AM +0200, Giacomo A. Catenazzi wrote: Samuel Thibault wrote: Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: while length(str) in any language up to the 1990s was a mere subtraction, now we must go through the string checking each byte to see if

Re: default character encoding for everything in debian

2009-08-12 Thread Thomas Koch
It's impressive how quickly threads on this list grow big. :-) I'm not sure whether a conclusion has already been reached. 1. apt-get install mysql 2. enter mysql client 3. create database test; create table test( test char(10) ); Replace mysql with whatever application you like. What should be the

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: I'm not sure whether a conclusion has already been reached. 1. apt-get install mysql 2. enter mysql client 3. create database test; create table test( test char(10) ); Replace mysql with whatever application you like. What should

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Roger Leigh wrote on Wed 12 Aug 2009 11:30:50 +0100: The default is UTF-32 or UTF-16, whichever corresponds to the width of wchar_t. This documentation is bogus BTW. It should read UCS-4 or UCS-2. It's strictly correct according to the standard.

Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann
On Wed, 12 Aug 2009 13:03:30 +0100 Roger Leigh rle...@codelibre.net wrote: On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: I'm not sure whether a conclusion has already been reached. 1. apt-get install mysql 2. enter mysql client 3. create database test; create table test(

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote: On Wed, 12 Aug 2009 13:03:30 +0100 Roger Leigh rle...@codelibre.net wrote: On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: I'm not sure whether a conclusion has already been reached. 1. apt-get install mysql

Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann
On Thu, 13 Aug 2009 02:03:43 +0100 Roger Leigh rle...@codelibre.net wrote: On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote: On Wed, 12 Aug 2009 13:03:30 +0100 Roger Leigh rle...@codelibre.net wrote: On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: I'm

Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf
Norbert Preining wrote [Mon, Aug 10, 2009 at 08:55:27PM +0200]: On Mon, 10 Aug 2009, Roger Leigh wrote: Of course there's a penalty for certain operations. But UTF-8 is about as compact as an extended encoding is going to get. Rubbish. You know why in Japan and other Asian countries UTF8 is

Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf
Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]: There are a lot of users out there that are not willing to pay the price for increased generality. Don't you mean s/users/programmers? As a user I don't see what price I pay. I only see advantages in having a consistent

Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault
Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: while length(str) in any language up to the 1990s was a mere subtraction, now we must go through the string checking each byte to see if it is a Unicode marker and subtract the appropriate number of bytes. Not necessarily. Any sane
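
The byte-by-byte length computation described in the message above can be sketched as follows. This is an illustration only, not code from the thread: the function name `utf8_char_count` and the sample string are made up for the example, and the sketch assumes the input is already valid UTF-8. It counts code points, not user-perceived characters, so combining characters are still counted separately.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in UTF-8 encoded bytes without decoding them.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    # Continuation bytes have the bit pattern 10xxxxxx. Every other byte
    # starts a new code point, so counting the non-continuation bytes
    # gives the number of code points.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


# Sample text with two non-ASCII letters, written with escapes so the
# source file encoding does not matter.
sample = "na\u00efve caf\u00e9".encode("utf-8")

print(len(sample))              # length in bytes
print(utf8_char_count(sample))  # length in code points
```

The scan is linear in the byte length, which is the cost the message refers to; a fixed-width representation would get the same count from a single subtraction.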

Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels
In article 20090811182041.gd19...@cajita.gateway.2wire.net you wrote: encodings are _completely_ incompatible with UTF8, so it is just not possible to tolerate broken text every now and then. Everything just breaks completely. Or everything works out of the box, when you use it correctly...

Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels
In article 20090811183800.ge5...@const.famille.thibault.fr you wrote: Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. And even with UTF32 there are combining characters. Sadly. But the length could be

Re: default character encoding for everything in debian

2009-08-11 Thread Bastian Blank
On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: In article 20090811183800.ge5...@const.famille.thibault.fr you wrote: Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. No, wchar_t is

Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault
Bernd Eckenfels wrote on Tue 11 Aug 2009 21:40:35 +0200: In article 20090811183800.ge5...@const.famille.thibault.fr you wrote: Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. ?? wchar_t may be

Re: default character encoding for everything in debian

2009-08-11 Thread Jakub Wilk
* Bastian Blank wa...@debian.org, 2009-08-11, 22:24: Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like Windows). And in the most esoteric

Re: default character encoding for everything in debian

2009-08-11 Thread Adam Borowski
On Mon, Aug 10, 2009 at 09:04:37PM +0100, Roger Leigh wrote: If having a C.UTF-8 locale always available for system services is required for them to fully support UTF-8, then that needs adding to glibc. It would also bring a significant speed increase. Since about everything calls setlocale(),

Re: default character encoding for everything in debian

2009-08-11 Thread Harald Braumann
On Tue, 11 Aug 2009 13:28:08 -0500 Gunnar Wolf gw...@gwolf.org wrote: Harald Braumann wrote [Tue, Aug 11, 2009 at 01:33:58AM +0200]: There are a lot of users out there that are not willing to pay the price for increased generality. Don't you mean s/users/programmers? As a user I don't

Re: default character encoding for everything in debian

2009-08-11 Thread Giacomo A. Catenazzi
Samuel Thibault wrote: Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: while length(str) in any language up to the 1990s was a mere subtraction, now we must go through the string checking each byte to see if it is a Unicode marker and subtract the appropriate number of bytes.

Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup
On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: Hi, I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite to discuss) shouldn't utf8 be the default

Re: default character encoding for everything in debian

2009-08-10 Thread Giacomo A. Catenazzi
Thomas Koch wrote: Hi, I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite to discuss) shouldn't utf8 be the default character set everywhere? So when installing

Re: default character encoding for everything in debian

2009-08-10 Thread Michal Čihař
Hi. On Mon, 10 Aug 2009 13:09:21 +0200, Thomas Koch tho...@koch.ro wrote: I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite to discuss) shouldn't utf8

Re: default character encoding for everything in debian

2009-08-10 Thread Josselin Mouette
On Monday, 10 August 2009 at 14:06 +0200, Giacomo A. Catenazzi wrote: But let us concentrate on the first task: having good UTF-8 support in all programs/terminals/etc. This task should have been completed for etch. Now we could concentrate on removing from the archive programs without proper

Re: default character encoding for everything in debian

2009-08-10 Thread Russ Allbery
Josselin Mouette j...@debian.org writes: Now we could concentrate on removing from the archive programs without proper UTF8 support. There are, sadly, some very useful programs with no adequate replacement that don't have UTF-8 support. tf5, for instance. -- Russ Allbery (r...@debian.org)

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote: On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: Hi, I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I

Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining
On Mon, 10 Aug 2009, Roger Leigh wrote: Of course there's a penalty for certain operations. But UTF-8 is about as compact as an extended encoding is going to get. Rubbish. You know why in Japan and other Asian countries UTF8 is not so common? Because many of their glyphs need 4 (four!) bytes,
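
The size argument in the message above can be checked with a small measurement. The sketch below is not from the thread: the helper name `encoded_sizes` and the three-character sample are assumptions made for illustration. It only compares byte counts for one short string whose characters all lie in the Basic Multilingual Plane, where UTF-8 uses three bytes per character; four-byte sequences are needed only for characters in the supplementary planes.

```python
def encoded_sizes(text: str, codecs: tuple[str, ...]) -> dict[str, int]:
    """Return the encoded size in bytes of `text` for each codec name.

    Hypothetical helper for illustration only.
    """
    return {codec: len(text.encode(codec)) for codec in codecs}


# Three common kanji, written with escapes so the source file encoding
# does not matter.
sample = "\u65e5\u672c\u8a9e"

sizes = encoded_sizes(sample, ("utf-8", "utf-16-le", "euc_jp", "shift_jis"))
for codec, size in sizes.items():
    print(f"{codec:10} {size} bytes")
```

For this sample the legacy Japanese encodings and UTF-16 need two bytes per character and UTF-8 needs three, so the overhead is real but smaller than the four bytes quoted. Mixed text with ASCII markup shifts the balance back towards UTF-8.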

Re: default character encoding for everything in debian

2009-08-10 Thread Philipp Kern
On 2009-08-10, Norbert Preining prein...@logic.at wrote: On Mon, 10 Aug 2009, Roger Leigh wrote: Of course there's a penalty for certain operations. But UTF-8 is about as compact as an extended encoding is going to get. Rubbish. You know why in Japan and other Asian countries UTF8 is not so

Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup
On Mon, Aug 10, 2009 at 19:53 +0100, Roger Leigh wrote: On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote: While utf-8 covers the broadest set of character glyphs possible, it suffers from size as well as performance penalties. Characters no longer are guaranteed to fit in a

Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining
On Mon, 10 Aug 2009, Philipp Kern wrote: Of course there's a penalty for certain operations. But UTF-8 is about as compact as an extended encoding is going to get. [...] make UTF-8 bad per se to call it rubbish. I didn't call utf-8 itself rubbish, I am myself a strong proponent of utf-8,

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 02:06:44PM +0200, Giacomo A. Catenazzi wrote: Thomas Koch wrote: I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite to discuss) shouldn't

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote: On Mon, 10 Aug 2009, Philipp Kern wrote: Of course there's a penalty for certain operations. But UTF-8 is about as compact as an extended encoding is going to get. [...] make UTF-8 bad per se to call it rubbish. I

Re: default character encoding for everything in debian

2009-08-10 Thread brian m. carlson
On Mon, Aug 10, 2009 at 09:42:18PM +0100, Roger Leigh wrote: On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote: I didn't call utf-8 itself rubbish, I am myself a strong proponent of utf-8, only your quote that it is about as compact as an extended encoding is going to get.

Re: default character encoding for everything in debian

2009-08-10 Thread Harald Braumann
On Mon, 10 Aug 2009 13:45:40 +0200 Siggy Brentrup deb...@psycho.i21k.de wrote: On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: Hi, I have an issue: I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug

Re: default character encoding for everything in debian

2009-08-10 Thread Samuel Thibault
Harald Braumann wrote on Tue 11 Aug 2009 01:33:58 +0200: Or do you mean the user pays the price, because if the encoding is set to UTF-8 then performance would suffer? In that case, I'd love to see some real life numbers. I doubt the difference would be noticeable. Google utf-8 grep