On Sat, 2010-09-04 at 16:08 +0200, Bill Allombert wrote:
On Sat, Sep 04, 2010 at 12:37:07AM +0200, Samuel Thibault wrote:
Aurelien Jarno, le Fri 03 Sep 2010 19:16:40 +0200, a écrit :
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
Roger Leigh, le Fri 03 Sep 2010 14:52:39
On Sat, Sep 04, 2010 at 12:37:07AM +0200, Samuel Thibault wrote:
Aurelien Jarno, le Fri 03 Sep 2010 19:16:40 +0200, a écrit :
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
Roger Leigh, le Fri 03 Sep 2010 14:52:39 +0100, a écrit :
There were no objections to having a
Russ Allbery dixit:
I agree with others in this thread that having a UTF-8 locale without the
collation changes implied by en_US is very useful for various software
packages such as automated test suites that want reproducible results and
were originally written for the C locale.
Same for
On 03.09.2010 01:46, Russ Allbery wrote:
Samuel Thibault sthiba...@debian.org writes:
Well, it's mostly
- some people saying it's useless,
- while other people saying I need it,
and also
- en_US.UTF-8 is just fine vs.
- en_US.UTF-8 sucks, we really need C.UTF-8 instead
without any
Thorsten Glaser, le Fri 03 Sep 2010 13:02:31 +, a écrit :
Russ Allbery dixit:
I agree with others in this thread that having a UTF-8 locale without the
collation changes implied by en_US is very useful for various software
packages such as automated test suites that want reproducible
Giacomo A. Catenazzi, le Fri 03 Sep 2010 15:26:47 +0200, a écrit :
BTW I think we should wait some more time. Last week I saw a bug on
the debian-glibc list: printf fails if it finds an invalid UTF-8
character (when the locale uses UTF-8). Note that this is allowed by
POSIX, which distinguishes raw strings
On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
Russ Allbery, le Thu 02 Sep 2010 16:24:56 -0700, a écrit :
Generally what that means is that someone needs to digest the discussion
in the thread
Well, it's mostly
- some people saying it's useless,
- while other people
Roger Leigh, le Fri 03 Sep 2010 14:52:39 +0100, a écrit :
On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
without any convergence.
I think reading back through the entire log,
Thanks for having done it!
people who were initially
rather opposed to the proposal did come
Roger Leigh rle...@codelibre.net writes:
There were no objections to having a UTF-8 locale installed and
available by default, just to it *being* the default.
[…]
Would a less confusing way to make this distinction be to say something
like: “The minimal Debian installation must have a locale
Samuel Thibault dixit:
LC_CTYPE has differences between locales, transliterations notably. For
Oh, okay – good to know…
I'd say go on :)
OK.
(of course we'll need to wait for libc to provide the locale
(post-squeeze I guess) before changing the policy).
Sure. Maybe think of something to
Samuel Thibault dixit:
believe that's something that shouldn't break Squeeze at all.
I also believe it cannot possibly do that.
bye,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
Roger Leigh, le Fri 03 Sep 2010 14:52:39 +0100, a écrit :
On Fri, Sep 03, 2010 at 01:37:24AM +0200, Samuel Thibault wrote:
without any convergence.
I think reading back through the entire log,
Thanks for having done it!
Ben Finney ben+deb...@benfinney.id.au writes:
Would a less confusing way to make this distinction be to say something
like: “The minimal Debian installation must have a locale available that
uses the UTF-8 character encoding.”?
The other angle here is that it can't just be any UTF-8 locale,
Aurelien Jarno, le Fri 03 Sep 2010 19:16:40 +0200, a écrit :
On Fri, Sep 03, 2010 at 04:20:27PM +0200, Samuel Thibault wrote:
Roger Leigh, le Fri 03 Sep 2010 14:52:39 +0100, a écrit :
There were no objections to having a UTF-8 locale installed and
available by default, just to it *being*
Hello,
No news on this?
Hurd's console needs a UTF-8 locale to be able to use wcwidth() for
proper double-width support.
Note: debian-installer is already providing a C.UTF-8 locale to d-i
components, so it works there.
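
To illustrate the double-width issue mentioned above: a console has to know how many columns each character occupies, which in C is what wcwidth() reports once LC_CTYPE names a UTF-8 locale. The following is only a rough Python stand-in, not wcwidth() itself; it uses the East Asian Width property from the standard unicodedata module, and the helper name approx_columns is hypothetical.

```python
import unicodedata


def approx_columns(text: str) -> int:
    """Rough column count: wide/fullwidth characters take two cells.

    Unlike wcwidth(), this ignores combining marks and control characters.
    """
    cols = 0
    for ch in text:
        # 'W' (wide) and 'F' (fullwidth) occupy two terminal cells.
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols


print(approx_columns("abc"))
print(approx_columns("日本語"))
```

Without a UTF-8 LC_CTYPE, a C program cannot even decode the bytes into characters to ask this question, which is why the console needs such a locale to be available.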
Samuel
Samuel Thibault sthiba...@debian.org writes:
No news on this?
Hurd's console needs a UTF-8 locale to be able to use wcwidth() for
proper double-width support.
Note: debian-installer is already providing a C.UTF-8 locale to d-i
components, so it works there.
Does libc in Debian provide a
Russ Allbery, le Thu 02 Sep 2010 15:53:50 -0700, a écrit :
Samuel Thibault sthiba...@debian.org writes:
No news on this?
Hurd's console needs a UTF-8 locale to be able to use wcwidth() for
proper double-width support.
Note: debian-installer is already providing a C.UTF-8 locale to d-i
Samuel Thibault sthiba...@debian.org writes:
Russ Allbery, le Thu 02 Sep 2010 15:53:50 -0700, a écrit :
Does libc in Debian provide a C.UTF-8 locale?
It doesn't yet but it's easy to do, that's not the question. See the
questions in the bug thread.
I think that's a prerequisite for doing
Russ Allbery, le Thu 02 Sep 2010 16:07:25 -0700, a écrit :
Samuel Thibault sthiba...@debian.org writes:
Russ Allbery, le Thu 02 Sep 2010 15:53:50 -0700, a écrit :
Does libc in Debian provide a C.UTF-8 locale?
It doesn't yet but it's easy to do, that's not the question. See the
Samuel Thibault sthiba...@debian.org writes:
Russ Allbery, le Thu 02 Sep 2010 16:07:25 -0700, a écrit :
Ah, then no, in that case there has been no progress. I don't believe
anyone is currently working on this.
Well, no work is needed, what is needed is to agree on what work to do.
That's
Russ Allbery, le Thu 02 Sep 2010 16:24:56 -0700, a écrit :
Generally what that means is that someone needs to digest the discussion
in the thread
Well, it's mostly
- some people saying it's useless,
- while other people saying I need it,
and also
- en_US.UTF-8 is just fine vs.
- en_US.UTF-8
Samuel Thibault sthiba...@debian.org writes:
Well, it's mostly
- some people saying it's useless,
- while other people saying I need it,
and also
- en_US.UTF-8 is just fine vs.
- en_US.UTF-8 sucks, we really need C.UTF-8 instead
without any convergence.
I think the way to get past
Thorsten Glaser wrote:
Albert Cahalan dixit:
Unless plain C goes UTF-8
Not going to happen, it’s not binary-safe. (I fought that in
MirBSD with the OPTU-8/16 encoding scheme.)
Why not? Note that usual functions work on bytes, not on characters, and
on POSIX utilities the old/classical
Giacomo A. Catenazzi dixit:
Not going to happen, it’s not binary-safe. (I fought that in
MirBSD with the OPTU-8/16 encoding scheme.)
Why not? Note that usual functions work on bytes
Not really.
The difference between 'tr u x' on binary files can, depending on
the implementation of tr (if it
Albert Cahalan dixit:
Any imperfection in a locale results in C, as ASCII as can be.
Yes, and C shall not imply latin1 but 7-bit ASCII but 8-bit
transparent.
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax
highlighting, d.A.] mechanically produce pretty
Albert Cahalan dixit:
Unless plain C goes UTF-8
Not going to happen, it’s not binary-safe. (I fought that in
MirBSD with the OPTU-8/16 encoding scheme.)
The stupid broken en_US.UTF-8 fucks up the sort order.
So true… (and paper size!)
We really need a do-nothing locale that follows the
Albert Cahalan dixit:
Giacomo A. Catenazzi writes:
I think nobody should use C or C.UTF-8 as user encoding.
I’d use it.
Debian doesn't ship a proper locale. I want sorting according
to the raw Unicode values.
Also called ASCIIbetically ☺ But C exists, C.UTF-8 doesn’t.
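
As a small illustration of "sorting according to the raw Unicode values": the sketch below is an assumption-laden example in Python, not anything glibc does. It relies on Python comparing strings by code point regardless of locale, and on UTF-8 preserving code point order under bytewise comparison. The sample word list is made up.

```python
words = ["b", "A", "é", "a", "B", "Z"]

# Python compares str values code point by code point, independent of locale.
by_code_point = sorted(words)

# Sorting the UTF-8 encoded bytes gives the same order, because UTF-8
# preserves code point order under bytewise comparison.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)
print(by_code_point == by_utf8_bytes)
```

Uppercase sorts before lowercase and non-ASCII letters sort last, which is the reproducible order a locale with C-style collation would be expected to give, as opposed to the dictionary-style order of en_US.UTF-8.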
* All ISO8859 locales
Steve Langasek writes:
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based system if it's
installed beforehand, which root
Roger Leigh writes:
On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and
Andrew McMillan writes:
On Wed, 2009-04-08 at 10:15 +0200, Giacomo A. Catenazzi wrote:
So I've a question: what does UTF-8 mean in this context (C.UTF-8) ?
...
So given a character which is outside of the 0x00 to 0x7f range, in an
environment which does not specify an encoding, I would like to
Giacomo A. Catenazzi writes:
[Andrew McMillan probably]
I think nobody should use C or C.UTF-8 as user encoding.
And I really hope that Debian will try to convince users to
use a proper locale.
Debian doesn't ship a proper locale. I want sorting according
to the raw Unicode values. I want
FWIW, the installation-locale udeb provides a C.UTF-8 locale,
which d-i runs under. Takes about 168k.
--
see shy jo
Thorsten Glaser wrote:
Giacomo A. Catenazzi dixit:
I think you misunderstand the mksh part of the problem.
mksh has two modes: a legacy mode, in which it does not make any
assumptions about charsets or encodings and is 8-bit clean and
mostly 8-bit transparent, save a few mostly past bugs and
Giacomo A. Catenazzi dixit:
This is good way to do things!
Thanks.
Or a debhelper (or similar) utility
that constructs it for build needs.
That’s already done, as I said – vorlon gave me an idea, I implemented
it, it works, I uploaded a new mksh package… and then I saw someone’s
added it to the
Thorsten Glaser wrote:
Giacomo A. Catenazzi dixit:
a real locale), but in this case I would also test some UTF-16
or Asian locale (mksh should not assume UTF-8 in these cases).
It doesn’t. This test is already run for the C locale.
Besides, there are no UTF-16 or somesuch locales on UNIX®
Roger Leigh wrote:
I wasn't aware that this level of checking was performed, though
it does make sense. But, does it not reject non 7-bit input in the C
locale for completeness?
Should tools doing raw I/O not be using lower level interfaces
such as fread() and fwrite() rather than the
Andrew McMillan wrote:
On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:
It is my impression that more packages than mksh could use a UTF-8
locale at build time (I’m afraid I don’t have pointers, but I’m sure
I’ve come across at least a couple).
Wouldn’t it be just better to change
On Tue, Apr 07, 2009 at 10:47:00PM +, Thorsten Glaser wrote:
Roger Leigh dixit:
Are you sure?
Not entirely, but I recall fgetc (or was it fgetwc?)
being affected.
Ah, fgetc/fputc are specified in the standard as byte oriented
rather than character-oriented, so are probably
Roger Leigh wrote:
On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and
On Wed, Apr 08, 2009 at 09:41:18AM +0200, Giacomo A. Catenazzi wrote:
Roger Leigh wrote:
I wasn't aware that this level of checking was performed, though
it does make sense. But, does it not reject non 7-bit input in the C
locale for completeness?
Should tools doing raw I/O not be using
Roger Leigh wrote:
On Tue, Apr 07, 2009 at 10:36:20AM +0200, Giacomo A. Catenazzi wrote:
Roger Leigh wrote:
I can't help but feel that your reply completely missed the
purpose of what I want to do, and why. I hope the following
response clears things up.
I know that I missed the original
On Wed, 2009-04-08 at 10:15 +0200, Giacomo A. Catenazzi wrote:
So I've a question: what does UTF-8 mean in this context (C.UTF-8) ?
It is not a stupid question, and the answer is not the UTF-8 algorithm
to code/decode unicode.
I'm still thinking that you are confusing the various meanings.
On Wed, Apr 08, 2009 at 10:22:15AM +0200, Giacomo A. Catenazzi wrote:
Roger Leigh wrote:
On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically
Andrew McMillan wrote:
On Wed, 2009-04-08 at 10:15 +0200, Giacomo A. Catenazzi wrote:
So I've a question: what does UTF-8 mean in this context (C.UTF-8) ?
It is not a stupid question, and the answer is not the UTF-8 algorithm
to code/decode unicode.
I'm still thinking that you are confusing
Giacomo A. Catenazzi dixit:
The locale C is already a UTF-8 compatible locale.
It is UTF-8 transparent but that's its pro and con.
It does not tell the system that UTF-8 encoding is to be used.
It basically says the encoding is none/unknown.
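
The distinction between "UTF-8 transparent" and "knows the encoding is UTF-8" can be sketched as follows. This is a minimal Python illustration under the assumption that the input really is UTF-8; the sample string is made up.

```python
raw = "naïve".encode("utf-8")  # the bytes a UTF-8 terminal would send

# Byte-transparent handling: the data passes through unchanged, but the
# program only sees bytes and cannot count or classify characters.
byte_count = len(raw)

# Once the encoding is known, character-level operations become possible.
char_count = len(raw.decode("utf-8"))

print(byte_count, char_count)
```

A program running under plain C is in the first situation: it can copy the bytes safely, but anything that needs characters (widths, case, cursor movement) needs to be told the encoding.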
Why does the build need to depend on a locale?
[...]
For
On Wed, 2009-04-08 at 15:31 +0200, Giacomo A. Catenazzi wrote:
We have the same objective, but two different ways.
Indeed, but it seems to me that you are pushing for a much bigger change
than I am.
So the smallest step which is in the same direction both of us want to
go, is for *a* UTF-8
Roger Leigh wrote:
On Mon, Apr 06, 2009 at 11:09:17AM -0700, Steve Langasek wrote:
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and LANGUAGE,
and only set LC_CTYPE to C.UTF-8 to get old behaviour but
with UTF-8 (and
On Mon, Apr 06, 2009 at 10:56:25PM +0100, Roger Leigh wrote:
On Mon, Apr 06, 2009 at 04:18:59PM +0200, Bill Allombert wrote:
On Mon, Apr 06, 2009 at 02:06:55PM +0200, Thorsten Glaser wrote:
Package: debian-policy
Version: 3.8.1.0
Severity: wishlist
For the mksh regression tests,
Adeodato Simó dixit:
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and LANGUAGE,
and only set LC_CTYPE to C.UTF-8 to get old behaviour
On Tue, Apr 07, 2009 at 06:54:59PM +, Thorsten Glaser wrote:
Bill Allombert dixit:
Fortunately, since Sarge, debian-installer sets LANG in
/etc/environment so programs almost never run under C locale anymore.
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable,
Bill Allombert dixit:
Fortunately, since Sarge, debian-installer sets LANG in
/etc/environment so programs almost never run under C locale anymore.
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and
On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:
+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):
Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and LANGUAGE,
and
+ Steve Langasek (Mon, 06 Apr 2009 11:09:17 -0700):
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based system if it’s
Roger Leigh dixit:
However, I would ideally like the C/POSIX locales to be UTF-8
by default as on other systems (with a C.ASCII variant if required).
No, this has the potential to break, for example, tr(1).
I lived through that on MirBSD.
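
Why a UTF-8 default could break a tool like tr(1) on binary input can be sketched like this. The code is a hedged Python illustration, not how any real tr is implemented; the helper names tr_bytes and tr_utf8 and the sample data are hypothetical.

```python
def tr_bytes(data: bytes, src: bytes, dst: bytes) -> bytes:
    """Bytewise translation: works on any input, including binary."""
    return data.translate(bytes.maketrans(src, dst))


def tr_utf8(data: bytes, src: str, dst: str) -> bytes:
    """Character-wise translation: must decode first, so invalid UTF-8 fails."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return text.translate(str.maketrans(src, dst)).encode("utf-8")


binary = b"\xffu\xfe"  # not valid UTF-8

print(tr_bytes(binary, b"u", b"x"))
try:
    tr_utf8(binary, "u", "x")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The bytewise version is what scripts relying on the classic C locale expect; a character-aware version has to decide what to do with bytes that are not valid UTF-8.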
//mirabilos
--
“It is inappropriate to require that a
Adeodato Simó dixit:
I would go as far as suggesting that some package like libc6 itself
FWIW:
-rw-r--r-- 1 tg tg 238336 Apr 7 22:59 en_US.UTF-8/LC_CTYPE
It's not *that* much...
Finally, this stuff that Roger proposes about making “C” be UTF-8, and
create some C.ASCII for people needing
On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:
It is my impression that more packages than mksh could use a UTF-8
locale at build time (I’m afraid I don’t have pointers, but I’m sure
I’ve come across at least a couple).
Wouldn’t it be just better to change Debian’s default to
On Tue, Apr 07, 2009 at 10:36:20AM +0200, Giacomo A. Catenazzi wrote:
I can't help but feel that your reply completely missed the
purpose of what I want to do, and why. I hope the following
response clears things up.
Roger Leigh wrote:
On Mon, Apr 06, 2009 at 11:09:17AM -0700, Steve Langasek
On Tue, Apr 07, 2009 at 09:00:50PM +, Thorsten Glaser wrote:
Adeodato Simó dixit:
I would go as far as suggesting that some package like libc6 itself
FWIW:
-rw-r--r-- 1 tg tg 238336 Apr 7 22:59 en_US.UTF-8/LC_CTYPE
It's not *that* much...
Finally, this stuff that Roger
Roger Leigh dixit:
But, does it not reject non 7-bit input in the C
locale for completeness?
No, it doesn't - we (before my time though, I think) fought
hard for eight-bit transparence and eight-bit cleanliness.
Should tools doing raw I/O not be using lower level interfaces
such as fread() and
On Tue, Apr 07, 2009 at 10:01:16PM +, Thorsten Glaser wrote:
Roger Leigh dixit:
But, does it not reject non 7-bit input in the C
locale for completeness?
No, it doesn't - we (before my time though, I think) fought
hard for eight-bit transparence and eight-bit cleanliness.
Should
Roger Leigh dixit:
Are you sure?
Not entirely, but I recall fgetc (or was it fgetwc?)
being affected.
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
Package: debian-policy
Version: 3.8.1.0
Severity: wishlist
For the mksh regression tests, I need a UTF-8 locale working; most
systems either provide “en_US.UTF-8” or “en_US.utf8” with the former
being recommended.
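
A test suite in this position typically has to probe for whichever UTF-8 locale happens to be installed. The sketch below shows one way to do that in Python with the standard locale module; the function name and the candidate order are assumptions, and the result depends entirely on the system it runs on.

```python
import locale

# Candidate names taken from this thread; the order is an arbitrary choice.
CANDIDATES = ("C.UTF-8", "en_US.UTF-8", "en_US.utf8")


def first_available_utf8_locale(candidates=CANDIDATES):
    """Return the first candidate LC_CTYPE locale that can be set, or None."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed on this system
            return name
        return None
    finally:
        # Always restore the setting that was active before probing.
        locale.setlocale(locale.LC_CTYPE, saved)


print(first_available_utf8_locale())
```

On a minimal build chroot with no generated locales this may well print None, which is the situation the bug report is about.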
Build-depending on locales-all has worked for me so far, except it
won’t do in
Thorsten Glaser wrote:
For the mksh regression tests, I need a UTF-8 locale working; most
systems either provide “en_US.UTF-8” or “en_US.utf8” with the former
being recommended.
Build-depending on locales-all has worked for me so far, except it
won’t do in Kubuntu where said package does not
On Mon, Apr 06, 2009 at 02:06:55PM +0200, Thorsten Glaser wrote:
Package: debian-policy
Version: 3.8.1.0
Severity: wishlist
For the mksh regression tests, I need a UTF-8 locale working; most
systems either provide “en_US.UTF-8” or “en_US.utf8” with the former
being recommended.
Hello
Giacomo A. Catenazzi dixit:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based system if it’s
installed beforehand, which root needs to do.
Why does mksh need UTF-8?
The regression tests
Bill Allombert dixit:
What about LC_COLLATE (which is a major problem with sort(1)) ?
1:1, just like the C locale does.
What about packages that run before /usr is mounted ?
They do not have /usr/*/locale/ anyway. This is a glibc problem.
What about embedded systems with tight space
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based system if it’s
installed beforehand, which root needs to do.
You can
On Mon, Apr 06, 2009 at 11:09:17AM -0700, Steve Langasek wrote:
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based
On Mon, Apr 06, 2009 at 11:09:17AM -0700, Steve Langasek wrote:
On Mon, Apr 06, 2009 at 05:33:35PM +, Thorsten Glaser wrote:
If you need a specific locale (as seems from mksh, not
sure if it is a bug in that program), you need to set it.
You can only set a locale on a glibc-based
On Mon, Apr 06, 2009 at 04:18:59PM +0200, Bill Allombert wrote:
On Mon, Apr 06, 2009 at 02:06:55PM +0200, Thorsten Glaser wrote:
Package: debian-policy
Version: 3.8.1.0
Severity: wishlist
For the mksh regression tests, I need a UTF-8 locale working; most
systems either provide