Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Giacomo A. Catenazzi

Roger Leigh wrote:
  I wasn't aware that this level of checking was performed, though
it does make sense.  But, does it not reject non 7-bit input in the C
locale for completeness?

Should tools doing raw I/O not be using lower level interfaces
such as fread() and fwrite() rather than the formatted print
functions which are specified to behave in a locale-dependent
manner? 


printf is not locale dependent, except for numeric display
(and possibly some extensions).
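A minimal sketch of that distinction, assuming a POSIX C library such as glibc (the helper names are hypothetical): the radix character used by %f follows LC_NUMERIC, while %s copies a narrow byte string through unchanged. In a locale such as de_DE.UTF-8, if installed, format_number() would typically produce "3,14" rather than "3.14".

```c
#include <locale.h>
#include <stdio.h>

/* Format a double with "%.2f" under whatever LC_NUMERIC is in effect;
   the radix character ('.' or ',') is the locale-dependent part. */
int format_number(char *buf, size_t len, double value)
{
    return snprintf(buf, len, "%.2f", value);
}

/* Pass a narrow byte string through "%s": the bytes are copied as-is,
   with no charset conversion, whatever the locale. */
int format_bytes(char *buf, size_t len, const char *s)
{
    return snprintf(buf, len, "%s", s);
}

/* The radix character that localeconv() reports for LC_NUMERIC. */
const char *radix_char(void)
{
    return localeconv()->decimal_point;
}
```

For example, after setlocale(LC_NUMERIC, "C"), format_number(buf, sizeof buf, 3.14159) writes "3.14".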

ciao
cate


--
To UNSUBSCRIBE, email to debian-policy-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Giacomo A. Catenazzi

Andrew McMillan wrote:

On Tue, 2009-04-07 at 22:32 +0200, Adeodato Simó wrote:

It is my impression that more packages than mksh could use a UTF-8
locale at build time (I’m afraid I don’t have pointers, but I’m sure
I’ve come across at least a couple).

Wouldn’t it just be better to change Debian’s default to make a UTF-8
locale available by default, rather than to force all those packages to
play tricks with LOCPATH?


I too would really like to see a UTF-8 locale available by default, and
would prefer to see this be the C.UTF-8 locale, which doesn't screw with
the collation / character type settings like any other UTF-8 locale
would.

It seems to me that the consensus here is that having a UTF-8 locale
available is a good idea and I don't hear any very strong argument
against such a change.

Consequently I think we should move on from the discussion and start
working out a patch to resolve this in policy.


So I have a question: what does UTF-8 mean in this context (C.UTF-8)?

It is not a stupid question, and the answer is not "the UTF-8 algorithm
to encode/decode Unicode".
I still think that you are confusing the various meanings.
And until I understand the problem, I cannot propose a solution.

- terminals should be sensitive to charsets when choosing how to display
  things
- programs should be sensitive to locales (the topic of this discussion):
  locales provide some charset-dependent strings and an interpretation
  of the various characters, but (usually) they MUST NOT translate characters.

Anyway:

The C locale is already a UTF-8-compatible locale.
No? Then what is it missing?
- other alphabetic, numeric, currency, whitespace characters?  But not every
  UTF-8 locale provides all characters: each defines only the range needed
  for its language [see Wikipedia, which should code UTF-8 as binary for
  this reason].
  The language of the C locale requires only ASCII-7 (or maybe only a
  subrange of it).
  So why do we need further characters?
  Note: in the C locale, POSIX restricts the blank characters to only two
  values (space and tab).

  We could use the UTF-8 charset for the C locale, declaring all
  c > 127 unused/illegal.  Would this solve the problems with mksh? I don't
  think so, so what do you need in this C.UTF-8?

I still think that en_US.UTF-8 is the right default (note:
I'm not a US citizen, nor is English my native language).

The installer will set up the correct locale, so the en_US period is very
short (we'll dominate them ;-) ).

With debootstrap/pbuild/... things are different.  But if this is the
problem, let's look for a solution for the build environment (and I still
think that in this environment en_US.UTF-8 could be nice).

But I would prefer a simple, basic ASCII-7 C locale for basic/plain builds;
only after the packager has decided whether a specific UTF-8 build is a bug
or a feature should it be set manually.
Why does a build need to depend on a locale?
The UNIX way is to allow compiling things for a remote system (maybe
another OS, another arch).
For testing? Then why not test various locales (UTF-8, but also other
non-ASCII-based encodings)?

ciao
cate







Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Roger Leigh
On Tue, Apr 07, 2009 at 10:47:00PM +, Thorsten Glaser wrote:
 Roger Leigh dixit:
 
 Are you sure?
 
 Not entirely, but I recall fgetc (or was it fgetwc?)
 being affected.

Ah, fgetc/fputc are specified in the standard as byte-oriented
rather than character-oriented, so they are probably locale-independent
for binary I/O.  OTOH, the wide variants are for wide-character I/O
and may require conversion between the narrow and wide forms, which
might well need to involve the locale.  I thought I spotted this
reading the standard last night, but I can't find the text this
morning.
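A minimal sketch of that difference, assuming a POSIX system with glibc-style wide streams (stream_over, read_bytes and read_first_wide are hypothetical helpers, and C.UTF-8 may not be installed everywhere): fgetc() hands back the raw bytes of a UTF-8 a-umlaut (0xC3 0xA4) regardless of locale, while fgetwc() decodes them according to LC_CTYPE. A fresh stream is opened for each read because a stream already used through byte functions must not then be used through wide ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>

/* Open a fresh, not-yet-oriented read stream over a temporary file holding `bytes`. */
static FILE *stream_over(const char *bytes)
{
    char path[] = "/tmp/fgetc-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                        /* file vanishes once the descriptor closes */
    size_t len = strlen(bytes);
    if (write(fd, bytes, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return NULL;
    }
    FILE *f = fdopen(fd, "r");
    if (f == NULL)
        close(fd);
    return f;
}

/* Byte-oriented read: fgetc() returns each byte as-is, whatever the locale. */
size_t read_bytes(const char *bytes, unsigned char *out, size_t cap)
{
    FILE *f = stream_over(bytes);
    size_t n = 0;
    int c;
    if (f == NULL)
        return 0;
    while (n < cap && (c = fgetc(f)) != EOF)
        out[n++] = (unsigned char)c;
    fclose(f);
    return n;
}

/* Wide-oriented read: fgetwc() decodes one multibyte character using LC_CTYPE.
   Returns WEOF if `ctype_name` is not installed or decoding fails. */
wint_t read_first_wide(const char *bytes, const char *ctype_name)
{
    if (setlocale(LC_CTYPE, ctype_name) == NULL)
        return WEOF;
    FILE *f = stream_over(bytes);
    if (f == NULL)
        return WEOF;
    wint_t wc = fgetwc(f);
    fclose(f);
    return wc;
}
```

With C.UTF-8 available, read_first_wide("\xc3\xa4", "C.UTF-8") would be expected to return 0xE4, whereas read_bytes() always yields the two bytes 0xC3 0xA4.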


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.





Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Giacomo A. Catenazzi

Roger Leigh wrote:

On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:

+ Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):


Except the ton which sets LC_ALL=C to get sane (parsable,
dependable, historically compatible) output.
These would then unset all other LC_* and LANG and LANGUAGE,
and only set LC_CTYPE to C.UTF-8 to get old behaviour but
with UTF-8 (and mbrtowc and iswctype and and and) available.

Isn’t setting LC_ALL=C.UTF-8 going to be about the same and less work?
I’m genuinely interested if that would behave any different to what you
said (unsetting all, setting LC_CTYPE).


% sudo localedef -c -i POSIX -f UTF-8 C.UTF-8

% LANG=C.UTF8 locale charmap
UTF-8

% LANG=C locale charmap
ANSI_X3.4-1968

This appears to work correctly at first glance.

However, I would ideally like the C/POSIX locales to be UTF-8
by default as on other systems (with a C.ASCII variant if required).
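For comparison with the `locale charmap` transcript above, a minimal C sketch of the approach Thorsten describes: reset every category to C, then switch only LC_CTYPE. It assumes glibc, where nl_langinfo(CODESET) reports the same name `locale charmap` prints; c_with_ctype and show_categories are hypothetical helpers. Setting LC_ALL=C.UTF-8 instead would take every category from C.UTF-8, so whether the two approaches differ depends on how that locale defines its other categories.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Reset every category to "C", then switch only LC_CTYPE to `ctype_name`.
   Returns the resulting LC_CTYPE codeset (what `locale charmap` prints),
   or NULL if `ctype_name` is not installed, in which case LC_CTYPE stays "C". */
const char *c_with_ctype(const char *ctype_name)
{
    setlocale(LC_ALL, "C");
    if (setlocale(LC_CTYPE, ctype_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Print the locale name of a few categories to show the split. */
void show_categories(FILE *out)
{
    fprintf(out, "LC_CTYPE=%s\n", setlocale(LC_CTYPE, NULL));
    fprintf(out, "LC_COLLATE=%s\n", setlocale(LC_COLLATE, NULL));
    fprintf(out, "LC_NUMERIC=%s\n", setlocale(LC_NUMERIC, NULL));
}
```

With C.UTF-8 installed, c_with_ctype("C.UTF-8") would be expected to return "UTF-8", and show_categories(stdout) would then report C.UTF-8 for LC_CTYPE but C for the other two categories.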


POSIX doesn't mandate C to be ASCII7.

BTW ASCII7 is a subset of UTF-8, so what would be different from the
normal C locale?  I don't expect any difference in any (POSIX-compatible)
program. The output characters will still all be in the c < 128 range.

ciao
cate






Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Julien Cristau
On Wed, 2009-04-08 at 02:30 +0200, Holger Levsen wrote:
 Or is RC too much? Or fine now? 

Anything > normal would be too much IMO.

Cheers,
Julien





Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Giacomo A. Catenazzi

Roger Leigh wrote:

On Tue, Apr 07, 2009 at 10:36:20AM +0200, Giacomo A. Catenazzi wrote:


 Roger Leigh wrote:


I can't help but feel that your reply completely missed the
purpose of what I want to do, and why.  I hope the following
response clears things up.


I know that I missed the original point, but IMHO you were, and still
are, misunderstanding locales, charsets and C language behaviour.

So I'm trying to explain to you how these things work, and after
that we can get to the real problem.
[Note: maybe I am on the wrong side. Often standards are not
so consistent on these behaviours, and thus maybe I interpreted them
wrongly]








- input charset is the source charset (used to parse C code)
- exec charset is the charset of the target machine (which runs the program).


That's pretty much what I said.


- C99 must support Unicode identifiers (written with \u or in some other
  non-portable, implementation-defined way)


OK.  But that's really nothing to do with the fact that you can use
UTF-8 sources directly.  It's akin to having to support trigraphs,
but we don't use trigraphs because they are bloody annoying and nowadays
completely unnecessary.  But mainly, it doesn't affect the exec charset
whether you use UTF-8 encoded sources or \u.


ok.
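A small illustration of the point that the exec charset, not the spelling in the source, decides the bytes in the binary. It assumes GCC or Clang, whose default execution charset is UTF-8 (GCC's -fexec-charset= option can change it). The universal character name \u00e4 is used so the result does not depend on the source file's own encoding; exec_bytes and dump_exec_bytes are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* A universal character name (C99).  The compiler converts it to the
   execution character set: 0xC3 0xA4 with GCC's default
   -fexec-charset=UTF-8, a single 0xE4 with -fexec-charset=ISO-8859-1. */
static const char a_umlaut[] = "\u00e4";

/* Number of bytes the compiler emitted for the character. */
size_t exec_bytes(void)
{
    return strlen(a_umlaut);
}

/* Dump the emitted bytes in hex. */
void dump_exec_bytes(FILE *out)
{
    for (size_t i = 0; a_umlaut[i] != '\0'; i++)
        fprintf(out, "%02X ", (unsigned)(unsigned char)a_umlaut[i]);
    fputc('\n', out);
}
```

Compiled with the defaults, dump_exec_bytes(stdout) should print C3 A4; compiled with -fexec-charset=ISO-8859-1 it should print E4.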


- the standard library can use locales (but only if you have initialized the
  locale), though not in all functions and not for all uses.
- wide characters are yet another thing (as you note in your example,
  the wide string is not in UTF-8 but, I think, in UTF-32)

Same input and exec charset really means: don't translate strings
(e.g. in
   if (c == 'a') printf("bcde\n");
 'a' and "bcde\n" will have the same values as in the input file; otherwise
 the binary will contain their representation in the exec charset)


Of course.  However, the test program I posted showed that if the
locale has been appropriately initialised, there is an additional
translation between the exec charset and the output charset specified
by the locale (see the Latin characters correctly preserved and output
as ISO-8859-1 in an ISO-8859-1 locale).


No ;-)  OK, it took me some modifications of your program and a look
at POSIX to discover the reason.

You forgot to check error codes. In this case we get
"Invalid or incomplete multibyte or wide character" in the
non-UTF-8 locale.

So, looking at POSIX:
"Wide-character codes for other characters are locale and
implementation-defined."
You (and I) compiled the code with UTF-8, so the binary contains a
different wchar representation, which is invalid in a non-UTF-8 locale.

Note that this is locale-dependent, so the same charset with a different
language could give different results (I don't know whether there are such
cases in glibc).
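A minimal sketch of checking that error code, assuming glibc (where strerror(EILSEQ) is the "Invalid or incomplete multibyte or wide character" message quoted above); wide_to_locale is a hypothetical helper. It converts a wide string to the multibyte codeset of a chosen LC_CTYPE through %ls and reports failure instead of silently printing nothing. Which characters fail depends on the target codeset.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Convert `ws` to the multibyte codeset of LC_CTYPE=`ctype_name` via "%ls".
   Returns the byte count snprintf() reports; -1 if some character has no
   representation in that codeset (errno is then typically EILSEQ); or -2
   if the locale is not installed. */
int wide_to_locale(char *buf, size_t len, const wchar_t *ws, const char *ctype_name)
{
    int n;
    if (setlocale(LC_CTYPE, ctype_name) == NULL)
        return -2;
    errno = 0;
    n = snprintf(buf, len, "%ls", ws);
    if (n < 0) {
        perror("wide_to_locale");   /* e.g. "Invalid or incomplete multibyte or wide character" */
        return -1;
    }
    return n;
}
```

In C.UTF-8, wide_to_locale(buf, sizeof buf, L"\u00e4", "C.UTF-8") would be expected to return 2 and store the bytes 0xC3 0xA4.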



Usually the interpretation of bytes is done by the terminal, not by the compiler.


It's done at several points:
compiler: source -> exec
runtime: locale-dependent exec -> output (and optional use of gettext)
terminal: output -> display


To get to the point: what is the problem in mksh?
At which level does it fail?



Yes, in a perfect world we would need only one charset (and maybe only
one language and one locale). Of all the proposals to reach this
target, Unicode and UTF-8 seem the best solution.
But... for now, take care with locales and don't assume UTF-8,
or you will cause trouble for a lot of non-UTF-8 users.
Converting a locale (from non-UTF-8 to UTF-8) is simple for
English and a few European languages, but it is tedious work
for many users: it needs a flag day, on which I would have to convert
all my files to UTF-8 or annotate every file with the right
encoding (most editors and tools understand such annotations).


I have never *ever* suggested that we only use one charset.  I'm only
suggesting that the *C locale* must be UTF-8 in order to allow for
full UTF-8 support.  Normal user locales can use whatever charset
they like.


(see the other mail: what does "full UTF-8" mean?)



Non-UTF-8 users won't be disadvantaged because the UTF-8 exec charset
will be recoded to their locale-specific output codeset, either by
libc or gettext.


I'm not sure I understand. Debian is moving all files to UTF-8
(manual pages, documentation, Debian control files, ...),
so I totally agree.
But was that not the point of the original problem?



The C locale is special in that normal users won't use it, but
system programs and code needing locale independence do use it.
Any program wanting to work correctly in a C locale must only use
ASCII or it *breaks*.  This means we are /de facto/ restricted to
ASCII unless we take special effort to work around the fact (and
this was the point of my l10n/i18n comments above).

Most programs do need to work correctly in a C locale, and so can't
use UTF-8 either as a source or exec charset.  This is a severe
limitation.


No. A locale is not really a charset. A program can use
any charset for input and output (note: most editors handle
different file charsets independently).
The problem is the terminals. If you print a non-ASCII character, the
terminal will get confused. It is 

Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Bill Allombert
On Wed, Apr 08, 2009 at 02:30:23AM +0200, Holger Levsen wrote:
 Hi,
 
 On Tuesday, 7 April 2009, Paul Wise wrote:
  A single rmdir in every game using /var/games isn't that hard,
  especially since they have to remove the files from there.
 
 I agree and plan to file RC bugs on this. 
 
 (24781 binary packages have been successfully tested in sid and
 squeeze at the moment; 369 have failures, of which eleven packages keep
 /var/games around, and 4 of those also keep other files in /var/games/*.
 Seven more RC bugs sound reasonable to me, plus potentially a few more in
 packages not tested.)
 
 Or is RC too much? Or fine now? 

Unless policy is changed to make clear that /var/games can be removed
at any time, and thus that packages cannot just ship /var/games in the
deb and expect it to be available when running the postinst, or at any
later time, I have to object to these bug reports because this
introduces a race condition.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 





Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Andrew McMillan
On Wed, 2009-04-08 at 10:15 +0200, Giacomo A. Catenazzi wrote:
 
 So I have a question: what does UTF-8 mean in this context (C.UTF-8)?
 
 It is not a stupid question, and the answer is not "the UTF-8 algorithm
 to encode/decode Unicode".
 I still think that you are confusing the various meanings.
 And until I understand the problem, I cannot propose a solution.

While it is true that the C locale is (already) a UTF-8 compatible
locale, it provides no clues to the system for the encoding of
characters outside that locale.

We can all be pure about the C locale and believe that all characters
have 7 bits, but we all know that reality is not like that.  It's not
like that even in the northern part of the continent pair that 'ASCII'
gets its name from.

I believe that Debian should endorse Unicode as the preferred method for
mapping numbers to characters.  I do not expect there is any
real argument against this, although I do understand that current
versions of Unicode may not yet comprehensively/satisfactorily represent
all glyphs in some languages.  I think there is hope that these problems
will eventually be ironed out.

There are, of course, a number of systems for encoding unicode
characters, but I do not seriously expect that anyone is recommending
that Debian should use UTF-16, UTF-32 (or, $DEITY forbid, Punycode :-)
as something which should be available everywhere.

So given a character which is outside the 0x00-0x7f range, in an
environment which does not specify an encoding, I would like one day to
be able to categorically state that Debian will by default assume that
character is Unicode, encoded according to UTF-8.

In such an environment, with a C.UTF-8 locale selected, when I start a
word processing program and insert an a-umlaut, I would expect
that my file will be written with a UTF-8 encoded Unicode character in
it.  I would not expect that, if I sort the lines in that file, the
lines beginning with a-umlaut would sort before 'z'.  I would not expect
that, if I grep such a file for '^[[:alpha:]]$', my a-umlaut line
would appear.
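A minimal sketch of the sort-order expectation, assuming glibc, whose built-in C.UTF-8 locale (since glibc 2.35) is, as far as I know, meant to collate by code point; collate_sign is a hypothetical helper, and "\xc3\xa4" is a-umlaut (U+00E4) in UTF-8. In the plain C locale strcoll() compares bytes, and 0xC3 is greater than 'z' (0x7A), so a-umlaut sorts after 'z' there as well.

```c
#include <locale.h>
#include <string.h>

/* Sign of strcoll(a, b) under LC_COLLATE=`name`: -1, 0 or 1,
   or 2 if that locale is not installed. */
int collate_sign(const char *a, const char *b, const char *name)
{
    if (setlocale(LC_COLLATE, name) == NULL)
        return 2;
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}
```

For instance, collate_sign("\xc3\xa4pfel", "zebra", "C") returns 1, i.e. the a-umlaut line sorts after the 'z' line.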

At present I don't believe that this does happen.  At present we
continue to perpetuate encodings such as ISO 8859-1 in these situations,
making pain for our children and grandchildren to resolve.


So as a first step in this process of 'cleaning up our world', this bug
is proposing a smaller change than that, and a smaller change than I
believe you think it is.


The proposal, at this stage is only that the C.UTF-8 locale is
*installed* and *available* by default.  Not that it *be* the default,
but that it *be there* as a default. People will naturally continue to
be free to uninstall it, or to leave their locale to 'C'.


Once this minimum step is made, and we've all calmed down, we can think
further on radical and dramatic changes over coming years where more
significant shifts are made, like:

* The default locale at installation is C.UTF-8 rather than C.
* The default locale at installation is assigned based on the
installation language.
* If a locale is set which doesn't specify an encoding, the system
defaults to assuming UTF-8.
* All ISO8859 locales are moved to a new locales-legacy-encodings
package.
* ... and so on.


Yes, I think that the C.UTF-8 locale offers something different that the
C locale doesn't.  Primarily it offers us a way out of the current
default encodings which are legacy encodings, without jumping boots and
all into a world where suddenly our sort ordering is changed, and our
users are screaming at us that en_US.UTF-8 is wrong for *them*, or that
'sort' is suddenly putting 'A' next to 'a' and all of their legacy shell
scripts are broken because they expect a different behaviour.


I believe that the list above might be the set of smallest useful
incremental changes in this process.  I would really like to see that
second step taken too, where the default locale is set to the most basic
UTF-8 locale possible, but I'm happy to see a second bug and further
discussion, if that's what we need to do to get agreement.


 - terminals should be sensitive to charsets when choosing how to display
things
 - programs should be sensitive to locales (the topic of this discussion):
locales provide some charset-dependent strings and an interpretation
of the various characters, but (usually) they MUST NOT translate
 characters.
Not so.  They have to consider how to handle input also, unless by
'terminal' you mean any program which might handle character input and
output...

An example I have had in the last week was that some software processing
information from the internet was converting &nbsp; into the character
0xa0.  While I have now stopped using that particular software
(HTML::Strip, if anyone's interested), it illustrates exactly how
software currently doesn't know, and through not knowing it can
perpetuate encoding systems which need to die.
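To make the &nbsp; example concrete: U+00A0 NO-BREAK SPACE is the single byte 0xA0 in ISO 8859-1 but the two bytes 0xC2 0xA0 in UTF-8, and a lone 0xA0 is a continuation byte, so it is not valid UTF-8 on its own. A minimal encoder sketch following the standard UTF-8 bit layout (RFC 3629); utf8_encode and is_utf8_continuation are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.  Writes 1-4 bytes to out and
   returns the count, or returns 0 for surrogates (U+D800..U+DFFF) and
   values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Bytes 0x80..0xBF only ever continue a sequence, so a lone 0xA0
   (Latin-1 no-break space) is not valid UTF-8 by itself. */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

For example, utf8_encode(0xA0, buf) returns 2 and stores 0xC2 0xA0.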


 Anyway:
 
 The locale C is already a UTF-8 compatible 

Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Holger Levsen
Hi,

On Wednesday, 8 April 2009, Paul Wise wrote:
 How about this:

 Game a gets installed and ships /var/games
 Game b gets installed and ships /var/games
 Game a gets purged and removes /var/games
 User starts game b and gets a high score
 Game b tries to save the high score but fails because /var/games doesn't
 exist

Uhm, I thought it was obvious that /var/games may only be deleted if it's 
empty...


regards,
Holger




Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Paul Wise
On Wed, Apr 8, 2009 at 7:51 PM, Holger Levsen hol...@layer-acht.org wrote:

 On Wednesday, 8 April 2009, Bill Allombert wrote:
 Unless policy is changed to make clear that /var/games can be removed
 at any time, and thus that packages cannot just ship /var/games in the
 deb and expect it to be available when running the postinst, or at any
 later time, I have to object to these bug reports because this
 introduces a race condition.

 I don't understand, can you please explain what race condition you mean?

How about this:

Game a gets installed and ships /var/games
Game b gets installed and ships /var/games
Game a gets purged and removes /var/games
User starts game b and gets a high score
Game b tries to save the high score but fails because /var/games doesn't exist

-- 
bye,
pabs

http://wiki.debian.org/PaulWise
http://synfig.org/User:PaulWise
http://bonedaddy.net/pabs3/





Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Julien Cristau
On Wed, 2009-04-08 at 14:17 +0200, Holger Levsen wrote:
 Hi,
 
 On Wednesday, 8 April 2009, Paul Wise wrote:
  How about this:
 
  Game a gets installed and ships /var/games
  Game b gets installed and ships /var/games
  Game a gets purged and removes /var/games
  User starts game b and gets a high score
  Game b tries to save the high score but fails because /var/games doesn't
  exist
 
 Uhm, I thought it was obvious that /var/games may only be deleted if it's 
 empty...

And it is empty until you try and play game b.  Which might be after
purging game a, which removed /var/games.  Hilarity ensues.  So either
game a must not remove /var/games, or game b must ship
with /var/games/.b-placeholder to make sure that /var/games isn't empty
while b is installed.  The former seems saner to me.
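A minimal C sketch of that sequence, assuming a POSIX system: a scratch directory created with mkdtemp() stands in for /var/games, and remove_if_empty behaves like `rmdir --ignore-fail-on-non-empty` in a purge postrm (both helper names are hypothetical). The empty-only check succeeds, so the directory that game b still relies on is removed, and b's later attempt to create its score file fails with ENOENT.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Remove `dir` only if it is empty.  Returns 1 if it was removed,
   0 if it was kept because it still has entries (or is already gone),
   -1 on any other error. */
int remove_if_empty(const char *dir)
{
    if (rmdir(dir) == 0)
        return 1;
    if (errno == ENOTEMPTY || errno == EEXIST || errno == ENOENT)
        return 0;
    return -1;
}

/* Replay the scenario in a scratch directory.  Returns the errno seen
   when game b tries to create its score file, 0 if that unexpectedly
   succeeds, or -1 if the setup fails. */
int replay_purge_race(void)
{
    char games[] = "/tmp/var-games-XXXXXX";    /* stand-in for /var/games */
    char score[sizeof games + 16];
    int fd;

    if (mkdtemp(games) == NULL)                /* a and b both ship the empty dir */
        return -1;
    if (remove_if_empty(games) != 1)           /* a is purged: dir is empty, so it goes */
        return -1;

    snprintf(score, sizeof score, "%s/b.scores", games);
    fd = open(score, O_WRONLY | O_CREAT, 0664); /* b saves a high score */
    if (fd < 0)
        return errno;                          /* ENOENT: the parent directory is gone */
    close(fd);
    unlink(score);
    rmdir(games);
    return 0;
}
```

With Julien's first option (game a never removes /var/games) the race cannot occur; with the placeholder option, the directory is non-empty while b is installed, so remove_if_empty() would return 0 and leave it alone.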

Cheers,
Julien





Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Holger Levsen
Hi,

On Wednesday, 8 April 2009, Adeodato Simó wrote:
 Additionally, what happens if package A and B both ship an empty
 /var/games (they both write their score files directly there, rather
 than a subdirectory), get both installed, then B gets purged and its
 postinst removes /var/games, and then A runs and tries to write a
 score file to /var/games, but the directory no longer exists, nor does
 the game have write permission to create it? Is there, or is there going to be,
 a policy mandating that packages should not ship /var/games without
 shipping /var/games/name?

Isn't the right approach for these packages to register the files they rely on 
in /var/games/ with dpkg?


regards,
Holger




Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Adeodato Simó
+ Russ Allbery (Mon, 06 Apr 2009 11:33:41 -0700):

 I don't see much real benefit in going out of our way to remove /var/games
 and it looks like it would be a bit annoying (at the least, require adding
 purge code to all games that put files in /var/games that would usually
 never be triggered).  My inclination would be to say that this behavior is
 fine and perhaps we should officially bless it somewhere.

I agree with this. We’re trying to move away (eg. with triggers) from
stuff that has to be propagated to every maintainer script, and I
really don’t see how removing an empty /var/games is such a big benefit
that would make it worth our time to enforce rmdir’s everywhere.

Additionally, what happens if package A and B both ship an empty
/var/games (they both write their score files directly there, rather
than a subdirectory), get both installed, then B gets purged and its
postinst removes /var/games, and then A runs and tries to write a
score file to /var/games, but the directory no longer exists, nor does
the game have write permission to create it? Is there, or is there going to be,
a policy mandating that packages should not ship /var/games without
shipping /var/games/name?

Thanks,

-- 
- Are you sure we're good?
- Always.
-- Rory and Lorelai





Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Holger Levsen
Hi Bill,

On Wednesday, 8 April 2009, Bill Allombert wrote:
 Unless policy is changed to make clear that /var/games can be removed
 at any time, and thus that packages cannot just ship /var/games in the
 deb and expect it to be available when running the postinst, or at any
 later time, I have to object to these bug reports because this
 introduces a race condition.

I don't understand, can you please explain what race condition you mean?


regards,
Holger

P.S.: Thanks for all your CC:s but I'm subscribed to -qa@ :-)




Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Bill Allombert
On Wed, Apr 08, 2009 at 01:51:25PM +0200, Holger Levsen wrote:
 Hi Bill,
 
 On Wednesday, 8 April 2009, Bill Allombert wrote:
  Unless policy is changed to make clear that /var/games can be removed
  at any time, and thus that packages cannot just ship /var/games in the
  deb and expect it to be available when running the postinst, or at any
  later time, I have to object to these bug reports because this
  introduces a race condition.
 
 I don't understand, can you please explain what race condition you mean?

One scenario among others:

Package A ships with /var/games in the deb; package B removes /var/games
in the purge postrm.
Package A is unpacked: /var/games is created.
Package B is purged: /var/games is removed.
Package A is configured: its postinst does 'touch /var/games/foo.hiscore',
which fails.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 





Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Roger Leigh
On Wed, Apr 08, 2009 at 10:22:15AM +0200, Giacomo A. Catenazzi wrote:
 Roger Leigh wrote:
 On Tue, Apr 07, 2009 at 09:24:38PM +0200, Adeodato Simó wrote:
 + Thorsten Glaser (Tue, 07 Apr 2009 18:54:59 +):

 Except the ton which sets LC_ALL=C to get sane (parsable,
 dependable, historically compatible) output.
 These would then unset all other LC_* and LANG and LANGUAGE,
 and only set LC_CTYPE to C.UTF-8 to get old behaviour but
 with UTF-8 (and mbrtowc and iswctype and and and) available.
 Isn’t setting LC_ALL=C.UTF-8 going to be about the same and less work?
 I’m genuinely interested if that would behave any different to what you
 said (unsetting all, setting LC_CTYPE).

 % sudo localedef -c -i POSIX -f UTF-8 C.UTF-8

 % LANG=C.UTF8 locale charmap
 UTF-8

 % LANG=C locale charmap
 ANSI_X3.4-1968

 This appears to work correctly at first glance.

 However, I would ideally like the C/POSIX locales to be UTF-8
 by default as on other systems (with a C.ASCII variant if required).

 POSIX doesn't mandate C to be ASCII7.

 BTW ASCII7 is a subset of UTF-8, so what would be different from the
 normal C locale?  I don't expect any difference in any (POSIX-compatible)
 program. The output characters will still all be in the c < 128 range.

Exactly.  For a conforming program only using c < 128, there will
be zero differences between running in a UTF-8 C locale and running in
an ASCII C locale, just like there are no differences today when
running in any UTF-8 locale (except maybe collation, but for the
UTF-8 C locale we would need to keep it fully backward compatible
with the existing behaviour).

However, what is different is that programs may /optionally/ choose
to use the UTF-8 superset of ASCII7 and have output and string
formatting and wide/narrow character conversion work correctly.
This is what is currently lacking.
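A minimal sketch of that point, assuming glibc (where wchar_t holds Unicode code points); decode_first is a hypothetical helper built on mbrtowc(). ASCII bytes decode identically in C and C.UTF-8, while C.UTF-8 additionally decodes the two-byte a-umlaut to U+00E4. What the plain C locale does with such bytes varies between C libraries and versions, so that case is not shown.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Decode the first multibyte character of `s` under LC_CTYPE=`ctype_name`.
   Returns the number of bytes mbrtowc() consumed and stores the wide
   character in *out; -1 for an invalid or incomplete sequence; -2 if the
   locale is not installed. */
int decode_first(const char *s, const char *ctype_name, wchar_t *out)
{
    mbstate_t st;
    size_t n;
    if (setlocale(LC_CTYPE, ctype_name) == NULL)
        return -2;
    memset(&st, 0, sizeof st);              /* start in the initial shift state */
    n = mbrtowc(out, s, strlen(s) + 1, &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;
    return (int)n;
}
```

With C.UTF-8 installed, decode_first("\xc3\xa4", "C.UTF-8", &wc) would be expected to return 2 with wc == 0xE4.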


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.






Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Bill Allombert
On Wed, Apr 08, 2009 at 02:04:13PM +0200, Adeodato Simó wrote:
 + Russ Allbery (Mon, 06 Apr 2009 11:33:41 -0700):
 
  I don't see much real benefit in going out of our way to remove /var/games
  and it looks like it would be a bit annoying (at the least, require adding
  purge code to all games that put files in /var/games that would usually
  never be triggered).  My inclination would be to say that this behavior is
  fine and perhaps we should officially bless it somewhere.
 
 I agree with this. We’re trying to move away (eg. with triggers) from
 stuff that has to be propagated to every maintainer script, and I
 really don’t see how removing an empty /var/games is such a big benefit
 that would make it worth our time to enforce rmdir’s everywhere.
 
 Additionally, what happens if package A and B both ship an empty
 /var/games (they both write their score files directly there, rather
 than a subdirectory), get both installed, then B gets purged and its
 postinst removes /var/games, and then A runs and tries to write a
 score file to /var/games, but the directory no longer exists, nor does
 the game have write permission to create it? Is there, or is there going to be,
 a policy mandating that packages should not ship /var/games without
 shipping /var/games/name?

The restriction I see is that A would need root privileges to create a file
in /var/games because policy says:

 The permissions on `/var/games' are mode 755, owner `root' and group
 `root'.

But that is not absolutely impossible.

Cheers,
-- 
Bill. ballo...@debian.org

Imagine a large red swirl here. 





Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Roger Leigh
On Wed, Apr 08, 2009 at 09:41:18AM +0200, Giacomo A. Catenazzi wrote:
 Roger Leigh wrote:
   I wasn't aware that this level of checking was performed, though
 it does make sense.  But, does it not reject non 7-bit input in the C
 locale for completeness?

 Should tools doing raw I/O not be using lower level interfaces
 such as fread() and fwrite() rather than the formatted print
 functions which are specified to behave in a locale-dependent
 manner? 

 printf is not locale dependent, except for numeric display
 (and possibly some extensions).

Each C FILE* stream has an associated locale.
Look at struct _IO_FILE_complete in libio.h.
The example program I posted demonstrates that this does actually
happen; the output streams use the current locale, and there is
a UTF-8 [narrow]/UCS-4 [wide] conversion to the locale codeset on
output.

When you output a string to a stream, there is a conversion step
from the exec charset (either narrow or wide) to the stream's
associated locale.  I haven't yet found documented exactly where
this happens (it's all in the libc internals), but I would
hazard a guess that all the string functions use this step,
where the lower-level byte-based I/O functions skip this step.

This machinery is also used by the C++ iostream locale imbue()
mechanism.

So while printf itself might not do the conversion, it's done
at some point, probably when printf copies the formatted string
to the stream buffer.
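At the level of the C standard, the property that decides whether conversion happens is the stream's orientation: a stream becomes byte-oriented or wide-oriented on its first I/O operation, wide-oriented streams (and %ls/%lc conversions) convert between wide characters and the LC_CTYPE multibyte encoding, and byte functions such as fputs() pass bytes through. A minimal sketch that observes orientation with fwide(f, 0), which only queries; orientation_after and enum first_write are hypothetical names.

```c
#include <stdio.h>
#include <wchar.h>

enum first_write { NO_WRITE, NARROW_WRITE, WIDE_WRITE };

/* Open a scratch stream, optionally perform one write, and store what
   fwide(f, 0) reports: < 0 byte-oriented, > 0 wide-oriented, 0 undecided.
   Returns 0 on success, -1 if no temporary file could be created. */
int orientation_after(enum first_write w, int *orientation)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (w == NARROW_WRITE)
        fputs("abc", f);         /* first I/O through a byte function */
    else if (w == WIDE_WRITE)
        fputwc(L'a', f);         /* first I/O through a wide function */
    *orientation = fwide(f, 0);  /* mode 0: query without changing it */
    fclose(f);
    return 0;
}
```

Once a stream has an orientation, the C standard does not allow the other family of functions on it, so mixing printf() and wprintf() on the same stdout is not portable.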


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.





Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Andrew McMillan
On Wed, 2009-04-08 at 14:17 +0200, Holger Levsen wrote:
 Hi,
 
 On Wednesday, 8 April 2009, Paul Wise wrote:
  How about this:
 
  Game a gets installed and ships /var/games
  Game b gets installed and ships /var/games
  Game a gets purged and removes /var/games
  User starts game b and gets a high score
  Game b tries to save the high score but fails because /var/games doesn't
  exist
 
 Uhm, I thought it was obvious that /var/games may only be deleted if it's 
 empty...

But Paul is describing a situation where it is empty (game b installed
it, but has not yet written a high score into it), so the simple rmdir
logic will delete it, which is very bad.
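To make the failure concrete, here is a hypothetical simulation in C. It uses a scratch directory under /tmp instead of the real /var/games, and the function name and paths are made up for illustration. It follows Paul's sequence: both games ship the directory, game a's purge runs a plain rmdir() on the still-empty directory, and game b's first attempt to create a score file then fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical simulation of the purge race in a scratch directory.
   Returns the errno seen when "game b" tries to create its score file
   after "game a"'s purge removed the empty directory, 0 if the write
   unexpectedly succeeded, or -1 if the scratch setup failed. */
int simulate_purge_race(void)
{
    char root[] = "/tmp/vargamesXXXXXX";
    char games[64], score[96];
    int err = 0;

    if (mkdtemp(root) == NULL)
        return -1;
    snprintf(games, sizeof games, "%s/games", root);
    snprintf(score, sizeof score, "%s/b.scores", games);

    /* Install: games a and b both ship the (empty) directory. */
    if (mkdir(games, 0755) != 0) {
        rmdir(root);
        return -1;
    }

    /* Purge of game a: a plain rmdir() succeeds, because game b has
       not written a high score yet and the directory is still empty. */
    rmdir(games);

    /* Game b now tries to save its first high score. */
    FILE *f = fopen(score, "w");
    if (f == NULL) {
        err = errno;
    } else {
        fclose(f);
        remove(score);
    }

    rmdir(games);                   /* clean up whatever is left */
    rmdir(root);
    return err;
}
```

On a POSIX system the fopen() call fails with ENOENT, since the parent directory no longer exists.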

Cheers,
Andrew.


andrew (AT) morphoss (DOT) com    +64(272)DEBIAN
   You have no real enemies.







Re: does /var/games have to be deleted on purge? (if it's empty..)

2009-04-08 Thread Andrew McMillan
On Wed, 2009-04-08 at 14:04 +0200, Adeodato Simó wrote:
 + Russ Allbery (Mon, 06 Apr 2009 11:33:41 -0700):
 
  I don't see much real benefit in going out of our way to remove /var/games
  and it looks like it would be a bit annoying (at the least, require adding
  purge code to all games that put files in /var/games that would usually
  never be triggered).  My inclination would be to say that this behavior is
  fine and perhaps we should officially bless it somewhere.
 
 I agree with this. We’re trying to move away (eg. with triggers) from
 stuff that has to be propagated to every maintainer scripts, and I
 really don’t see how removing an empty /var/games is such a big benefit
 that would make it worth our time to enforce rmdir’s everywhere.

/me too, for what it's worth.


 Additionally, what happens if packages A and B both ship an empty
 /var/games (they both write their score files directly there, rather
 than in a subdirectory), both get installed, then B gets purged and its
 postrm removes /var/games, and then A runs and tries to write a score
 file to /var/games, but the directory no longer exists and the game
 has no write permission to create it? Is there, or is there going to be,
 a policy mandating that packages should not ship /var/games without
 shipping /var/games/name?


I think the suggestion was shorthand for purge behaviour something along
the lines of:

rm /var/games/myscorefiles.*
rmdir --ignore-fail-on-non-empty /var/games

So that if the rmdir failed it was just kind of 'well, we tried'
behaviour.
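The same "well, we tried" logic can be sketched in C as a rough equivalent of `rmdir --ignore-fail-on-non-empty`. rmdir() succeeds only on an empty directory, and POSIX permits either ENOTEMPTY or EEXIST as the error when it is not empty. purge_scores() and the score-file naming are hypothetical, not taken from any package.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove dir if it is empty and treat "not empty" as success, like
   `rmdir --ignore-fail-on-non-empty`.  Returns 0 on success, -1 (errno
   set) on any other failure. */
int rmdir_if_empty(const char *dir)
{
    if (rmdir(dir) == 0)
        return 0;
    /* POSIX allows either ENOTEMPTY or EEXIST for a non-empty dir. */
    return (errno == ENOTEMPTY || errno == EEXIST) ? 0 : -1;
}

/* Hypothetical purge step for one game: delete its own score file,
   then try to remove the shared directory.  A directory that is
   already gone counts as done. */
int purge_scores(const char *dir, const char *score_file)
{
    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", dir, score_file);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                  /* path too long */
    remove(path);                   /* ignore "no such file" */

    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;                   /* nothing left to clean up */
    return rmdir_if_empty(dir);
}
```

This is exactly the logic that Paul's scenario defeats: if no other game has written its score file yet, the directory is empty and gets removed anyway.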


Really, though, I don't think that sort of attitude is what we should
ideally be enshrining in policy and I would rather bless the existence
of /var/games than impose a more rigorous procedure for deleting it in a
tasteful and elegant way.

Cheers,
Andrew.


andrew (AT) morphoss (DOT) com    +64(272)DEBIAN
   Time to be aggressive.  Go after a tattooed Virgo.







Bug#522776: locale dependend compilation

2009-04-08 Thread Giacomo A. Catenazzi

Ok, maybe I found the problem.

Giacomo A. Catenazzi wrote:
  No ;-)  Ok, it takes me some modifications of your program and
  a look at POSIX to discover the reason.

You forgot to check error codes. In this case we get
"Invalid or incomplete multibyte or wide character" (EILSEQ) in the
non-UTF-8 locale.
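A minimal sketch of the missing error check follows. It assumes a C library (such as glibc) whose printf family fails when a %ls argument contains a character that the current locale cannot encode. format_wide_checked() is a hypothetical helper. It formats into a buffer with snprintf(), so the result can be inspected instead of being silently lost.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format ws as a multibyte string in the current
   LC_CTYPE codeset and check the result.  Returns 0 on success,
   ERANGE if buf was too small, or the errno value reported by
   snprintf() (typically EILSEQ for an unencodable character). */
int format_wide_checked(char *buf, size_t len, const wchar_t *ws)
{
    errno = 0;
    int n = snprintf(buf, len, "%ls", ws);

    if (n < 0)
        return errno != 0 ? errno : -1;  /* conversion failed */
    if ((size_t)n >= len)
        return ERANGE;                   /* output was truncated */
    return 0;
}
```

In the default C locale, a plain ASCII wide string formats fine, whereas L"\u20ac" makes snprintf() return a negative value. On glibc, strerror(EILSEQ) is the "Invalid or incomplete multibyte or wide character" message quoted above.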

So, looking at POSIX:
Wide-character codes for other characters are locale and 
implementation-defined.

So you (and I) compiled the code with UTF-8, so the binary contains a
different wchar representation, which is invalid in a non-UTF-8 locale.

Note that it is locale-dependent, so the same charset with a different
language could give different results (I don't know if there are such
cases in glibc).


So it means that NO portable program can use constant (i.e. fixed
value in sources) wchars and wstrings, because a compiled program has
no way to distinguish a wstring built at compile time from a wstring
from outside, which may come from two different locales/charsets.
[GCC's default wide execution charset is UTF-32 or UTF-16, depending on
the width of wchar_t on the target.]

BTW we have a similar problem with normal strings.

This is very unfortunate, and it is *worse* than the initial problem.
Changing the locale will not solve this, but it will probably reduce the
visibility of the error. [With no locale specified, GCC assumes UTF-8.]

So maybe we need to specify the locale to be passed to debian/rules,
or the parameter to gcc, in order to have a fixed (default) source
encoding.
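One possible way to pin the encoding is sketched here under assumptions. GCC has -finput-charset, -fexec-charset and -fwide-exec-charset options, which debian/rules could pass via CFLAGS (for example -finput-charset=UTF-8 -fexec-charset=UTF-8) instead of relying on the build locale. A program can then sanity-check at runtime that its literals really were encoded as expected. The function names below are hypothetical. __STDC_ISO_10646__ is the standard C macro promising that wchar_t values are ISO 10646 code points.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* True if narrow string literals were compiled to UTF-8, i.e. the
   execution charset (-fexec-charset) is UTF-8.  U+00E4 LATIN SMALL
   LETTER A WITH DIAERESIS is the two bytes C3 A4 in UTF-8. */
bool narrow_literals_are_utf8(void)
{
    static const char s[] = "\u00e4";
    return sizeof s == 3 && memcmp(s, "\xc3\xa4", 2) == 0;
}

/* True if wide literals hold ISO 10646 code points, as promised when
   the implementation defines __STDC_ISO_10646__. */
bool wide_literals_are_ucs(void)
{
#ifdef __STDC_ISO_10646__
    return L"\u00e4"[0] == 0xE4;
#else
    return false;
#endif
}
```

With GCC's default execution charset (UTF-8), narrow_literals_are_utf8() returns true. Building with, say, -fexec-charset=ISO-8859-1 would make it return false, which is the kind of silent difference between build environments described above.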

But this does not solve the problem. A string encoded in UTF-8 or
UTF-32 (for wchar) is not decoded correctly on non-UTF-8 terminals.

But in this case we have the iconv() function (because NOW we know the
initial encoding) to convert constant strings to the right locale.
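The iconv() step can be sketched as follows. It assumes the string constants are known to be UTF-8 and that the target codeset is taken from nl_langinfo(CODESET) after the program has called setlocale(). utf8_to_codeset() and utf8_to_locale() are hypothetical helpers. Behaviour for unrepresentable characters differs between implementations: glibc's iconv() fails with EILSEQ, while other C libraries may substitute a replacement character instead.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/* Convert the NUL-terminated UTF-8 string `in` to codeset `to`.
   Returns 0 on success, or an errno value: EINVAL if the conversion
   is unsupported, EILSEQ for an unrepresentable character, E2BIG if
   out is too small.  out is always NUL-terminated when outlen > 0. */
int utf8_to_codeset(const char *to, const char *in, char *out, size_t outlen)
{
    if (outlen == 0)
        return E2BIG;

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;               /* e.g. EINVAL: not supported */

    char *src = (char *)in;         /* iconv() takes char **, not const */
    size_t srclen = strlen(in);
    char *dst = out;
    size_t dstlen = outlen - 1;     /* reserve one byte for the NUL */
    int rc = 0;

    if (iconv(cd, &src, &srclen, &dst, &dstlen) == (size_t)-1)
        rc = errno;
    else if (iconv(cd, NULL, NULL, &dst, &dstlen) == (size_t)-1)
        rc = errno;                 /* flush a final shift sequence */

    *dst = '\0';
    iconv_close(cd);
    return rc;
}

/* Convert to the codeset of the current LC_CTYPE locale. */
int utf8_to_locale(const char *in, char *out, size_t outlen)
{
    return utf8_to_codeset(nl_langinfo(CODESET), in, out, outlen);
}
```

For example, converting the UTF-8 bytes C3 A4 ("ä") to ISO-8859-1 should yield the single byte E4, provided that the C library ships that conversion.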


So: programs that use constant wchars or strings with characters outside
ASCII must be compiled with the right encoding (possibly with the right
locale), specified in debian/rules (or every developer will see a different
output). Such programs should convert the string to the right locale
before printing it to the terminal.


Alternatively, the string must be put outside the source code and read
from a file. iconv() applies in this case too.


PS: requiring en_US.UTF-8 as the default in debian/rules also seems
nice, so logs can be read by all developers.

Possibly a C locale in UTF-8 could also be good. Such a C locale should
have only the UTF-8 charset, and no other additional meaning for
characters outside 7-bit ASCII. But this should be carefully tested:
I really think that there are existing wrong assumptions and
cases we forgot.


So ok: I think I've understood the problem (but part of the bug
is in the program / Makefile).

ciao
cate






Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Giacomo A. Catenazzi

Andrew McMillan wrote:

On Wed, 2009-04-08 at 10:15 +0200, Giacomo A. Catenazzi wrote:

So I've a question: what does UTF-8 mean in this context (C.UTF-8) ?

It is not a stupid question, and the answer is not the UTF-8 algorithm
to code/decode unicode.
I'm still thinking that you are confusing the various meanings.
And until I understand the problem, I cannot propose a solution.


While it is true that the C locale is (already) a UTF-8 compatible
locale, it provides no clues to the system for the encoding of
characters outside that locale.

We can all be pure about the C locale and believe that all characters
have 7 bits, but we all know that reality is not like that.  It's not
like that even in the northern part of the continent pair that 'ASCII'
gets its name from.

I believe that Debian should endorse Unicode as the preferred method for
mapping numbers to characters.  I do not expect there is any
real argument against this, although I do understand that current
versions of Unicode may not yet comprehensively/satisfactorily represent
all glyphs in some languages.  I think there is hope that these problems
will eventually be ironed out.

There are, of course, a number of systems for encoding unicode
characters, but I do not seriously expect that anyone is recommending
that Debian should use UTF-16, UTF-32 (or, $DEITY forbid, Punycode :-)
as something which should be available everywhere.



So given a character which is outside of the 0x00-0x7f range, in an
environment which does not specify an encoding, I would like to one day
be able to categorically state that Debian will by default assume that
character is unicode, encoded according to UTF-8.


I agree, except for the last sentence.
Debian will use Unicode, encoded according to UTF-8, as the default, but
not *assume* it.  It is again about portability.  Let (old) programs work
on future Debian too.

Note that the problem with 7-bit ASCII arises with other encodings too.
We are Europeans or Americans, so UTF-8 seems an easy transition,
but for people who use other, non-ASCII-based encodings, this could be
very hard.  If we start assuming UTF-8 we cause a lot of trouble on
other continents.  Files which were readable in Lenny would in future be
readable only with a command-line utility; what a nightmare for our
users!


So while your first paragraph is a nice objective, we should not
add assumptions that cause more trouble.
I think the opposite direction would be best: let us assume
less about the locale, and let the user and the system find and choose
the right encodings.
I.e. let me read files with less in many encodings
(heuristics, magic strings, or a command-line argument), instead of
building less to assume UTF-8.


We have the same objective, but two different approaches. And because
I have used, and still use, many different systems, I think my way is best.



In such an environment, with a C.UTF-8 encoding selected, when I start a
word processing program and insert an a-umlaut in there, I would expect
that my file will be written with a UTF-8 encoded unicode character in
it.  I would not expect that if I sort the lines in that file, that the
lines beginning with a-umlaut would sort before 'z'.  I would not expect
that if I grep such a file for '^[[:alpha:]]$' that my a-umlaut line
would appear.
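Andrew's expectation matches how the plain C locale already behaves, and the following sketch illustrates that. In the C locale, glibc's strcoll() orders strings by byte value, just like strcmp(), and only ASCII letters belong to the alpha class. UTF-8 'ä' starts with the byte 0xC3, which is greater than 'z' (0x7A). Whether a particular C.UTF-8 implementation would keep exactly these semantics is an assumption that is not tested here. c_collate() and all_alpha_bytes() are hypothetical helpers.

```c
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Compare a and b under "C" collation, restoring the caller's
   LC_COLLATE afterwards.  In the C locale this is byte order. */
int c_collate(const char *a, const char *b)
{
    char saved[256];
    const char *cur = setlocale(LC_COLLATE, NULL);

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return strcmp(a, b);        /* fall back to plain byte order */
    strcpy(saved, cur);

    setlocale(LC_COLLATE, "C");
    int r = strcoll(a, b);
    setlocale(LC_COLLATE, saved);
    return r;
}

/* Byte-oriented analogue of grep '^[[:alpha:]]*$' under the current
   LC_CTYPE: true only if every byte is classified as alphabetic. */
bool all_alpha_bytes(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isalpha((unsigned char)*s))
            return false;
    return true;
}
```

So in the C locale a line starting with UTF-8 "ä" sorts after one starting with "z", and neither byte of "ä" counts as alphabetic.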


I think nobody should use C or C.UTF-8 as their user locale.
And I really hope that Debian will try to convince users to use a
proper locale.



At present I don't believe that this does happen.  At present we
continue to perpetuate encodings such as ISO 8859-1 in these situations,
making pain for our children and grandchildren to resolve.


No, I think Debian is really pushing UTF-8, and fortunately we can
automatically distinguish ISO 8859-1 from UTF-8 (except in a few
degenerate cases). This could help.  But the world is not only
ASCII-based, so mandating UTF-8 will cause more trouble.
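The heuristic Giacomo mentions mostly comes down to checking whether the bytes form well-formed UTF-8, because Latin-1 text containing accented letters almost never does. A minimal sketch follows. looks_like_utf8() is a hypothetical name, and the rules are those of RFC 3629: no overlong forms, no UTF-16 surrogates, and nothing above U+10FFFF. The degenerate cases are Latin-1 byte pairs that happen to be valid UTF-8, such as "Ã¤" (C3 A4), which this check cannot tell apart from "ä".

```c
#include <stdbool.h>
#include <stddef.h>

/* True if buf[0..len) is well-formed UTF-8 (RFC 3629). */
bool looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                /* continuation bytes expected */
        unsigned long cp;           /* decoded code point */

        if (c < 0x80) {             /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            return false;           /* stray continuation, C0/C1, F5-FF */
        }

        if (len - i <= need)
            return false;           /* sequence truncated at the end */
        for (size_t k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;       /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;           /* overlong, surrogate, out of range */
        i += need + 1;
    }
    return true;
}
```

A Latin-1 "café" (ending in the single byte E9) is rejected, while the UTF-8 form (C3 A9) is accepted.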

I think we can do more with heuristics to find the right encoding,
and encourage programmers to annotate files with the right
encoding (you see more and more files that explicitly tell
the editor their encoding).


So as a first step in this process of 'cleaning up our world', this bug
is proposing a smaller change than that, and a smaller change than I
believe you think it is.


It helps you, it helps Europeans and Americans, but it doesn't help with
writing programs that the whole world can use (also to read older documents).

Setting a real locale (not POSIX or C) solves this, and BTW is
what Debian is doing.
C.UTF-8 will create a new locale rather than remove one, so it is not
going in the right direction.



The proposal, at this stage is only that the C.UTF-8 locale is
*installed* and *available* by default.  Not that it *be* the default,
but that it *be there* as a default. People will naturally continue to
be free to uninstall it, or to leave their locale to 'C'.


Once this minimum step is made, and we've all calmed down, we can think
further on radical and dramatic changes over coming years where more
significant shifts are made, 

Re: Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Thorsten Glaser
Giacomo A. Catenazzi dixit:

 The locale C is already a UTF-8 compatible locale.

It is UTF-8 transparent but that's its pro and con.
It does not tell the system that UTF-8 encoding is to be used.
It basically says the encoding is none/unknown.


 Why does the build need to depend on a locale?
[...]
 For testing? So why not test various locales (UTF-8, but also other
 non-ASCII-based encodings)?

 To get to the point: what is the problem in mksh?
 At which level does it fail?
[...]
 But if mksh doesn't work in the C locale, I'm very worried.
 Are the problems on input or on output?

I think you misunderstand the mksh part of the problem.

mksh has two modes: a legacy mode, in which it does not make any
assumptions about charsets or encodings and is 8-bit clean and
mostly 8-bit transparent, save for a few mostly past bugs and
implementation shortcomings; and a Unicode mode, in which it assumes
its input is UTF-8 (although, with ^V, you can still enter non-UTF-8
sequences, and tab-complete filenames in legacy encodings as well).
The Unicode mode is enabled with mksh -U or set -U.
In addition, mksh has a feature which automatically enables the
Unicode mode if
- the current CODESET is UTF-8 (or the locale ends in .utf8 or
  .UTF-8 or something similar, in some cases), or
- the input begins with a UTF-8 BOM.

The regression test suite merely checks for this feature. To do
so, it needs a way to set the checked mksh process' CODESET to
UTF-8, which is only possible by setting a non-C/POSIX locale.
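The two triggers and the test-suite requirement can be sketched in C. This is a hypothetical illustration, not mksh's actual code, and the candidate locale names are assumptions about what might be installed. It shows why the regression test needs some UTF-8 locale to exist. If neither candidate is available, try_utf8_ctype() returns NULL and the CODESET-based trigger cannot be exercised at all. Only LC_CTYPE is touched, since that category alone determines the encoding.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the current LC_CTYPE codeset is UTF-8. */
bool codeset_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0;
}

/* True if the input starts with the UTF-8 byte order mark EF BB BF. */
bool starts_with_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Try some candidate UTF-8 locales for LC_CTYPE only.  Returns the
   name that worked, or NULL (after resetting LC_CTYPE to "C") if no
   UTF-8 locale is installed. */
const char *try_utf8_ctype(void)
{
    static const char *const candidates[] = { "C.UTF-8", "en_US.UTF-8" };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (setlocale(LC_CTYPE, candidates[i]) != NULL && codeset_is_utf8())
            return candidates[i];
    setlocale(LC_CTYPE, "C");
    return NULL;
}
```

In the C locale, codeset_is_utf8() is false, so only the BOM trigger is available there.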


Andrew McMillan dixit:

The proposal, at this stage is only that the C.UTF-8 locale is
*installed* and *available* by default.  Not that it *be* the default,
but that it *be there* as a default.

This is about what I was going to propose, indeed.


Andrew McMillan dixit:

Once this minimum step is made, and we've all calmed down, we can think
further on radical and dramatic changes over coming years where more
significant shifts are made, like:

* The default locale at installation is C.UTF-8 rather than C.

That would be nice.

* If a locale is set which doesn't specify an encoding, the system
defaults to assuming UTF-8.


Andrew McMillan dixit:

[...] and indeed Steve
Langasek has already suggested a seemingly reasonable workaround for the
immediate problem which was, funnily enough, that mksh wants to have a
UTF-8 locale *available* in order for it to *test the build*...

Yes, his suggestion and searching for someone to actually use it
(Daniel Jacobowitz does) helped with that part of the problem. However,
the mksh regression test suite is only one of the manifestations.
Even as a mere user, I'd like to have, as described above, a UTF-8 locale
available and, if possible, as the default. Well, maybe not a UTF-8 locale,
just UTF-8 encoding (especially when I ssh from a MirBSD system to
a Debian system, since on MirBSD there is *only* UTF-8¹), but glibc
defines encodings exclusively via locales, which is why I'm in favour
of C.UTF-8 for myself; setting only LC_CTYPE has the same effect,
though (and I often set LC_MESSAGES to en_GB.UTF-8 for gcc's benefit).


Giacomo A. Catenazzi dixit:

 Debian will use as default unicode, encoded according to UTF-8, but
 not *assume*.  It is again portability.

I agree too. You cannot simply assume things.

 Let (old) programs to works
 also on the future Debian.

These need to export LC_ALL=C already, since you've been able to
choose a locale in d-i for a while, so no change there.


bye,
//mirabilos
-- 
23:22⎜«mikap:#grml» mirabilos: and your bootloader is awesome :)
23:29⎜«mikap:#grml» and I think it's totally awesome that I have a BSD to boot with
 ⎜  grml, I'll have to install that on a USB stick right away
-- Michael Prokop of grml.org about MirGRML and MirOS bsd4grml





Bug#522776: debian-policy: mandate existence of a standardised UTF-8 locale

2009-04-08 Thread Andrew McMillan
On Wed, 2009-04-08 at 15:31 +0200, Giacomo A. Catenazzi wrote:
 
 We have the same objective, but two different ways.

Indeed, but it seems to me that you are pushing for a much bigger change
than I am.

So the smallest step which is in the same direction both of us want to
go, is for *a* UTF-8 locale to be *available* on all Debian systems,
which is what is being proposed by this bug.


Cheers,
Andrew.


andrew (AT) morphoss (DOT) com    +64(272)DEBIAN
   Just to have it is enough.









Bug#523226: debian-policy: Typo in Python packaging policy

2009-04-08 Thread Nicolas Alvarez
Package: debian-policy
Severity: minor


In chapter 1.1 of the Debian Python Policy
http://www.debian.org/doc/packaging-manuals/python-policy/ch-python.html,
second paragraph, it says "The default Debian Python version should
alway be..."; it should say "always".

-- System Information:
Debian Release: lenny/sid
  APT prefers hardy-updates
  APT policy: (900, 'hardy-updates'), (500, 'hardy-security'), (500, 'hardy'), 
(400, 'hardy-proposed'), (300, 'hardy-backports')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24-23-generic (SMP w/2 CPU cores)
Locale: LANG=es_AR.UTF-8, LC_CTYPE=es_AR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash


