Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-08-14 Thread Charles Plessy
tag 701081 pending
thanks

On Tue, Aug 13, 2013 at 08:36:15AM +0900, Charles Plessy wrote:
> 
> >   
> > File names
> > 
> > 
> >   The name of the files installed by binary packages in the system PATH
> >   (namely /bin, /sbin, /usr/bin,
> >   /usr/sbin and /usr/games/) must be encoded in
> >   ASCII.
> > 
> > 
> > 
> >   The name of the files and directories installed by binary packages
> >   outside the system PATH must be encoded in UTF-8 and should be
> >   restricted to ASCII when they can be represented in that character
> >   set.
> > 
> >   
 
> Unless there are further objections, I will go ahead with the wording above
> (or with the parenthetical turned into a footnote).

Hello everybody,

I pushed it as it is.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-08-12 Thread Charles Plessy
Hello everybody,

in light of the discussion about UTF-8 on the debian-devel mailing list,
I would like to close the issue 701081 about filename encodings.

I reproduce here the addition that has been worded by me, seconded by Jonathan
Nieder and Julian Gilbey, and supported by others.

>   
> File names
> 
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 
> 
> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   

The last objections were that it does not mandate ASCII for configuration files,
and that the system PATH should not be defined here.

For the system PATH, I think that we can move the definition anytime to a new
dedicated section; it only requires somebody to work on it and propose a
wording.  Alternatively, what is in parentheses above can be turned into a
footnote.

For the configuration files, further restrictions would make some packages
non-compliant, and do not have consensus.  On the other hand, the proposed patch
respects the current practice, through its general recommendation of ASCII with
a "should".

Unless there are further objections, I will go ahead with the wording above
(or with the parenthetical turned into a footnote).

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-14 Thread Helmut Grohne
On Sun, Apr 14, 2013 at 11:58:03AM +0200, Bill Allombert wrote:
> I think configuration files should also be included in the first list, because the
> user is supposed to be able to interact directly with them.

I object to this extension of the proposal, because use of UTF-8
characters in conffile names is a current use case of ca-certificates.
If anything it could be treated as a "should" and turned into "must"
after working with the ca-certificates maintainers on a solution.

Helmut





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-14 Thread Bill Allombert
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> On Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong wrote:
> > On Fri, 29 Mar 2013, Russ Allbery wrote:
> > > I think we should require UTF-8 as the character encoding for file
> > > names and fix the non-UTF-8 file names in the archive currently.
> > > None of the other courses of action really make any sense to me.
> > 
> > I think we should also forbid the use of non ASCII file names in PATH
> > and recommend that ASCII file names be used where possible, but I also
> > agree that where ASCII cannot serve, only UTF-8 should be used.
> 
> Hello everybody,
> 
> Here is a somewhat clumsy proposition.
> 
>   
> File names
> 
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 

I am not sure I like the idea of indirectly defining the system PATH in the 
'File names' section. If we want policy to define the system PATH, we should do
it in 10.1, I think.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-14 Thread Bill Allombert
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> On Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong wrote:
> > On Fri, 29 Mar 2013, Russ Allbery wrote:
> > > I think we should require UTF-8 as the character encoding for file
> > > names and fix the non-UTF-8 file names in the archive currently.
> > > None of the other courses of action really make any sense to me.
> > 
> > I think we should also forbid the use of non ASCII file names in PATH
> > and recommend that ASCII file names be used where possible, but I also
> > agree that where ASCII cannot serve, only UTF-8 should be used.
> 
> Hello everybody,
> 
> Here is a somewhat clumsy proposition.
> 
>   
> File names
> 
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 
> 
> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   
> 
> 
> What do you think ?

I think configuration files should also be included in the first list, because
the user is supposed to be able to interact directly with them.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-14 Thread Julian Gilbey
On Sun, Apr 14, 2013 at 06:01:10PM +0900, Charles Plessy wrote:
> On Mon, Apr 08, 2013 at 12:18:37AM +0100, Julian Gilbey wrote:
> > 
> > For consistency, I guess this should be /usr/games rather than
> > /usr/games/.
>  
> > The final paragraph seems a little bit vague; would "should be
> > restricted to ASCII when it is possible to do so" be clearer?  For if
> > Unicode characters can be represented in ASCII, they almost always
> > would be.  This alternative wording would suggest that using
> > characters such as em-dashes or non-breaking spaces or the like is not
> > good (though I doubt people would use them as filenames of packaged
> > files!).
> 
> Thanks everybody for the feedback.  I am ready to commit the patch,
> updated following Julian's suggestions.  But strictly speaking, I
> need one more formal seconding statement for this :)

I'm happy to second the proposal.

   Julian





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-14 Thread Charles Plessy
On Mon, Apr 08, 2013 at 12:18:37AM +0100, Julian Gilbey wrote:
> 
> For consistency, I guess this should be /usr/games rather than
> /usr/games/.
 
> The final paragraph seems a little bit vague; would "should be
> restricted to ASCII when it is possible to do so" be clearer?  For if
> Unicode characters can be represented in ASCII, they almost always
> would be.  This alternative wording would suggest that using
> characters such as em-dashes or non-breaking spaces or the like is not
> good (though I doubt people would use them as filenames of packaged
> files!).

Thanks everybody for the feedback.  I am ready to commit the patch,
updated following Julian's suggestions.  But strictly speaking, I
need one more formal seconding statement for this :)

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-08 Thread Helmut Grohne
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
>   
> File names
> 
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 
> 
> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   
> 
> 
> What do you think ?

Thanks to all involved parties for your work on this issue. I am very
much satisfied with the result and happy that it is met with consensus.
The suggestions of Julian Gilbey appear sensible, but do not touch the
general direction.

Helmut





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-07 Thread Julian Gilbey
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> Here is a somewhat clumsy proposition.
> 
>   
> File names
> 
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 

For consistency, I guess this should be /usr/games rather than
/usr/games/.

> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   
> 
> 
> What do you think ?

That sounds a very reasonable proposal.

The final paragraph seems a little bit vague; would "should be
restricted to ASCII when it is possible to do so" be clearer?  For if
Unicode characters can be represented in ASCII, they almost always
would be.  This alternative wording would suggest that using
characters such as em-dashes or non-breaking spaces or the like is not
good (though I doubt people would use them as filenames of packaged
files!).

   Julian





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-07 Thread Michael Shuler
On 04/07/2013 05:28 PM, Russ Allbery wrote:
> Charles Plessy  writes:
>> Here is a somewhat clumsy proposition.

It sounds clear and concise to me.

>>   
>> File names
> 
>> 
>>   The name of the files installed by binary packages in the system PATH
>>   (namely /bin, /sbin, /usr/bin,
>>   /usr/sbin and /usr/games/) must be encoded in
>>   ASCII.
>> 
> 
>> 
>>   The name of the files and directories installed by binary packages
>>   outside the system PATH must be encoded in UTF-8 and should be
>>   restricted to ASCII when they can be represented in that character
>>   set.
>> 
>>   
> 
> This looks good to me.  I think that strikes the right balance without
> going into too many details about what justification should or shouldn't
> be required for using UTF-8.

Agreed. As one of the concerned package maintainers, I think this sounds
fine.

-- 
Kind regards,
Michael Shuler





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-07 Thread Jonathan Nieder
Charles Plessy wrote:

>   
> File names
>
> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 
>
> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   

Seconded.

Thanks,
Jonathan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-07 Thread Russ Allbery
Charles Plessy  writes:

> Hello everybody,

> Here is a somewhat clumsy proposition.

>   
> File names

> 
>   The name of the files installed by binary packages in the system PATH
>   (namely /bin, /sbin, /usr/bin,
>   /usr/sbin and /usr/games/) must be encoded in
>   ASCII.
> 

> 
>   The name of the files and directories installed by binary packages
>   outside the system PATH must be encoded in UTF-8 and should be
>   restricted to ASCII when they can be represented in that character
>   set.
> 
>   

This looks good to me.  I think that strikes the right balance without
going into too many details about what justification should or shouldn't
be required for using UTF-8.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-06 Thread Bill Allombert
On Sat, Mar 16, 2013 at 03:40:19PM -0700, Jonathan Nieder wrote:
> Russ Allbery wrote:
> 
> > For me, allowing the correct spellings of
> > words and the correct names of things to be represented in file names is
> > important enough to rise to an ethical goal that I would advocate
> > adopting.
> 
> This.  Among the examples listed the only one I found convincing was
> 
>   Certinomis_-_Autorité_Racine.crt

It might be advantageous for the certification authority to use UTF-8 to encode
its name, but the benefit for the user of the system is something entirely
different.

As long as the user is using a UTF-8 locale and the terminal is able to handle
the script properly, there might be little harm done. However, this is not
necessarily the case.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-06 Thread Charles Plessy
On Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong wrote:
> On Fri, 29 Mar 2013, Russ Allbery wrote:
> > I think we should require UTF-8 as the character encoding for file
> > names and fix the non-UTF-8 file names in the archive currently.
> > None of the other courses of action really make any sense to me.
> 
> I think we should also forbid the use of non ASCII file names in PATH
> and recommend that ASCII file names be used where possible, but I also
> agree that where ASCII cannot serve, only UTF-8 should be used.

Hello everybody,

Here is a somewhat clumsy proposition.

  
File names


  The name of the files installed by binary packages in the system PATH 
  (namely /bin, /sbin, /usr/bin,
  /usr/sbin and /usr/games/) must be encoded in
  ASCII.



  The name of the files and directories installed by binary packages
  outside the system PATH must be encoded in UTF-8 and should be
  restricted to ASCII when they can be represented in that character
  set.

  


What do you think ?
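
A minimal sketch of how the two rules above could be checked mechanically, assuming file names are available as raw bytes. The names `check_name`, `is_ascii` and `is_utf8` and the returned strings are hypothetical, chosen only for illustration; the directory list is the one from the proposed wording. The "should" part of the second rule cannot be decided by a program, so a non-ASCII but valid UTF-8 name is only reported as a note.

```python
import posixpath

# Directories named in the proposed wording as the system PATH.
SYSTEM_PATH = (b"/bin", b"/sbin", b"/usr/bin", b"/usr/sbin", b"/usr/games")


def is_ascii(raw: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in raw)


def is_utf8(raw: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_name(path: bytes) -> str:
    """Classify an absolute installed path against the proposed rules."""
    directory = posixpath.dirname(path)
    if directory in SYSTEM_PATH:
        # First rule: names installed in the system PATH must be ASCII.
        if is_ascii(posixpath.basename(path)):
            return "ok"
        return "must: ASCII in system PATH"
    # Second rule: everything else must be UTF-8, preferably ASCII.
    if not is_utf8(path):
        return "must: UTF-8"
    if not is_ascii(path):
        return "note: non-ASCII UTF-8"
    return "ok"
```

Working on bytes rather than decoded strings is deliberate: a name that is not valid UTF-8 cannot be decoded reliably in the first place.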

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-01 Thread Helmut Grohne
On Sun, Mar 24, 2013 at 08:01:03PM +0900, Charles Plessy wrote:
> after more than one month of discussion, we have not reached a conclusion.

Thanks for the ping.

> In the current situation there is no policy, which means that everything is
> allowed.  Indeed, there is at least one package with filenames using more than
> one set of non-ASCII characters, so no user can see correctly the names of
> every file in this package at the same time.

Some more data here. I checked sid main amd64 binary packages. The only
ones containing invalid UTF-8 sequences (and thus violating the current
proposal) would be aspell-is and jpilot. This suggests that UTF-8 is a
de facto standard already. Fixing two packages shouldn't be that hard. I
have filed a wishlist bug #704446 against lintian to check for this
regardless of the outcome of this bug.
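
A check of this kind can be sketched as follows. This is not the script used for the survey above, which is not shown in the thread; the function names `invalid_utf8_names` and `walk_names` are hypothetical, and the sketch assumes the package contents have already been unpacked into a directory.

```python
import os
from typing import Iterable, Iterator, List


def invalid_utf8_names(names: Iterable[bytes]) -> List[bytes]:
    """Return the names that are not valid UTF-8 byte sequences."""
    bad = []
    for raw in names:
        try:
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def walk_names(root: bytes) -> Iterator[bytes]:
    """Yield every path below root as raw bytes.

    Passing a bytes root makes os.walk return bytes names, so no
    locale-dependent decoding happens before the check.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)
```

For an unpacked tree, `invalid_utf8_names(walk_names(b"some-unpacked-package"))` would list the offending paths, where the directory name is a placeholder.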

> On my side, I made a proposal with actionable items: fix the few packages that
> are not using UTF-8, and modify the Policy to reflect the current practice
> of using ASCII most of the time and other UTF-8 characters parsimoniously.

I am in favour of this solution.

 * Requiring any subset of UTF-8 has the direct benefit of being able to
   interpret all filenames used without guesswork.
 * This is in line with Fedora's policy.
 * I saw very little disagreement about whether to permit non-UTF-8
   sequences. Discussion seemed mostly to be around which subset to
   require.

> I understand very well the arguments against having any UTF-8 character at all,
> but we currently have such packages in our archive, so if there is no plan to
> modify these packages, then we can not plan to solve this bug.

I see little benefit with restricting to ASCII compared to the benefit
with restricting to UTF-8. Remember that the goal of this bug was to
make filenames machine readable. I think that further restrictions
should happen in the context of #99933. I asked for not merging these
issues, because I would like to keep the scope of this issue limited and
thus implementable.

> Can others comment how they would like to see this bug solved ?

Any proposal that limits to a subset of UTF-8 and a superset of
printable ASCII is fine with me. My preferred choice would be just
UTF-8. I have no objections to recommending the use of a subset of
printable ASCII either.

To me it appears to be a matter of wording right now. Consensus is
basically there. Implementing it would cause two policy violations
(aspell-is and jpilot), which imo is little impact.

Helmut





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-01 Thread Russ Allbery
Don Armstrong  writes:
> On Fri, 29 Mar 2013, Russ Allbery wrote:

>> I think we should require UTF-8 as the character encoding for file
>> names and fix the non-UTF-8 file names in the archive currently.
>> None of the other courses of action really make any sense to me.

> I think we should also forbid the use of non ASCII file names in PATH
> and recommend that ASCII file names be used where possible, but I also
> agree that where ASCII cannot serve, only UTF-8 should be used.

Yes, those sound like good ideas to me too.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-04-01 Thread Don Armstrong
On Fri, 29 Mar 2013, Russ Allbery wrote:
> I think we should require UTF-8 as the character encoding for file
> names and fix the non-UTF-8 file names in the archive currently.
> None of the other courses of action really make any sense to me.

I think we should also forbid the use of non ASCII file names in PATH
and recommend that ASCII file names be used where possible, but I also
agree that where ASCII cannot serve, only UTF-8 should be used.


Don Armstrong

-- 
Unix, MS-DOS, and Windows NT (also known as the Good, the Bad, and
the Ugly).
 -- Matt Welsh

http://www.donarmstrong.com  http://rzlab.ucr.edu





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-31 Thread Jonathan Nieder
Charles Plessy wrote:

> after more than one month of discussion, we have not reached a conclusion.
[...]
> Can others comment how they would like to see this bug solved ?

I think wording (requiring UTF-8 filenames) is probably the
appropriate next step.  Yes, maybe not everyone will agree on the
initial wording, but having a base to build on makes constructive
feedback a lot easier.

Some issues were mentioned before regarding different characters with
similar looking glyphs, normalization forms, and unusual characters
that are not widely supported.  But if the initial wording doesn't
manage to nudge the packager in the right direction on those issues, I
don't mind.
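
The normalization and look-alike issues can be illustrated with a short sketch. The sample strings are chosen only for illustration and are not taken from any package.

```python
import unicodedata

name = "Autorit\u00e9"  # "Autorité" written with a precomposed e-acute

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # "e" followed by a combining accent

# Both forms are valid UTF-8 and look the same when displayed,
# but they are different byte sequences, hence different file names.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")
same_name = nfc_bytes == nfd_bytes

# Similar-looking glyphs: Latin "a" and Cyrillic "a" are distinct characters.
latin_a = unicodedata.name("a")
cyrillic_a = unicodedata.name("\u0430")
```

Requiring UTF-8 alone settles neither question, which is why wording that only mandates the encoding leaves them to the packager's judgment.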

Thanks,
Jonathan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-29 Thread Russ Allbery
Charles Plessy  writes:

> On my side, I made a proposal with actionable items: fix the few
> packages that are not using UTF-8, and modify the Policy to reflect the
> current practice of using ASCII most of the time and other UTF-8
> characters parsimoniously.

> I understand very well the arguments against having any UTF-8 character
> at all, but we currently have such packages in our archive, so if there
> is no plan to modify these packages, then we can not plan to solve this
> bug.

> Can others comment how they would like to see this bug solved ?

I think we should require UTF-8 as the character encoding for file names
and fix the non-UTF-8 file names in the archive currently.  None of the
other courses of action really make any sense to me.

To me, that's obviously the right thing to do, so I have a hard time
stepping back far enough to even understand why it's an argument, I guess.
I certainly do agree that using non-ASCII characters in file names that
are unlikely to be in people's fonts or otherwise be difficult to display
is a problem, but I guess that seems like common sense.  But I don't mind
saying something to that effect in Policy.

We have files in the archive already using non-ASCII encodings, and asking
them to convert to ASCII feels like a real step back.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-24 Thread Charles Plessy
On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> 
> Apparently the debian-policy currently says nothing about the characters
> used in filenames contained in binary packages. Most packages use common
> sense and only use a small subset of US-ASCII. In Debian sid main most
> filenames can be represented using the following subset of US-ASCII
> characters (written as a regular expression):
> 
>   [][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]
> 
> The number of exceptions is about 200 contained in about 50 binary
> packages. In those packages some filenames are not representable as
> UTF-8 (for example aspell-is) and others don't make any sense in
> ISO-8859-15 (for example ca-certificates).
> 
> It would be nice if some common ground concerning filename encoding
> could be reached. The options range from a rather restrictive definition
> of acceptable characters via requiring filenames to be representable in
> US-ASCII to mandating a particular encoding (such as UTF-8). This could
> be first introduced as a SHOULD and later turned into a MUST.
> 
> Personally I do not really care about what the precise restriction is as
> long as it permits a mechanical transformation to unicode.
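
The character class quoted above can be tried out with a small sketch. The regular expression is a Python transcription of the quoted one, with the two square brackets escaped explicitly instead of relying on their position in the class; the function name `in_common_subset` is hypothetical.

```python
import re

# Python spelling of the character class quoted above.
COMMON = re.compile(r"[\]\[a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]*")


def in_common_subset(name: str) -> bool:
    """True if the whole name uses only characters from the quoted class."""
    return COMMON.fullmatch(name) is not None
```

Any name accepted by this test is pure ASCII, so decoding it to Unicode is unambiguous, which is the "mechanical transformation" asked for in the quoted report.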

Dear all,

after more than one month of discussion, we have not reached a conclusion.

In the current situation there is no policy, which means that everything is
allowed.  Indeed, there is at least one package with filenames using more than
one set of non-ASCII characters, so no user can see correctly the names of
every file in this package at the same time.

However, I think it is clear from the discussion that it would not
satisfy anybody if we modified the Policy to reflect the current
practice, namely that everything is permitted.

Given that this bug report asks for a policy about the encoding of filenames,
doing nothing is equivalent to rejecting it.  I therefore propose one more round
of consultation, and if it is not conclusive, I will tag this bug wontfix and
close it (we have 185 other bugs in the queue).

Of course, every developer is free to tackle the issue by working with all the
other package maintainers in order to change the current practice until it
matches something that we do not feel uncomfortable documenting in the Policy. 

On my side, I made a proposal with actionable items: fix the few packages that
are not using UTF-8, and modify the Policy to reflect the current practice
of using ASCII most of the time and other UTF-8 characters parsimoniously.

I understand very well the arguments against having any UTF-8 character at all,
but we currently have such packages in our archive, so if there is no plan to
modify these packages, then we can not plan to solve this bug.

Can others comment how they would like to see this bug solved ?

Have a nice day,

-- 
Charles





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-17 Thread Bill Allombert
On Sat, Mar 16, 2013 at 03:28:30PM -0700, Russ Allbery wrote:
> Bill Allombert  writes:
> > On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:
> 
> >> Many were posted to this thread.  I guess I just disagree with you on
> >> whether those uses are "good."  For me, allowing the correct spellings
> >> of words and the correct names of things to be represented in file
> >> names is important enough to rise to an ethical goal that I would
> >> advocate adopting.  A pure ASCII stance feels like a very
> >> English-centric stance.
> 
> > Filenames are not translatable, so a better mechanism is needed anyway.
> 
> This discussion isn't about translations, and I don't agree that they're
> relevant to this decision.

Precisely, the situation is very different. Instead of displaying text in the
user's preferred language and script, UTF-8 filenames will be in an arbitrary
script which might not be well supported on the user's terminal, both for output
and input (it might lack the correct fonts, left-to-right support, ligatures,
input methods, etc.), and which the user might not know how to spell.

And that is assuming the user uses a UTF-8 locale (so the C locale does not qualify).
By contrast, 7-bit ASCII is well supported by all Debian systems and is generally
sufficient to carry the small quantity of information needed by filenames,
and in any case 7-bit ASCII is the current standard practice for filenames, so
users are used to it.

I am concerned that UTF-8 filenames in binary packages might hamper the ability
of the user/sysadmin to query and troubleshoot their system, because the names
are not readable, cannot be typed in and cannot be googled easily.

Dealing with a system where ls -R /usr/share/foo reports

%ls -R /usr/share/foo/
/usr/share/foo/:
??
??
?

?
???
??
??
?
??
???
???

???
???
???
??

/usr/share/foo/???/:
?
??


is likely to be painful.
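
The effect can be modelled roughly as follows, assuming that every byte outside printable ASCII is displayed as a question mark. This is only an approximation: the exact output depends on the `ls` implementation, its options and the terminal. The function name `render_in_c_locale` is hypothetical.

```python
def render_in_c_locale(raw: bytes) -> str:
    """Approximate an ASCII-only display: one '?' per non-printable byte."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


# A name with a single accented letter stays mostly readable...
mostly_ascii = render_in_c_locale("Autorit\u00e9".encode("utf-8"))

# ...while a name written entirely in a non-Latin script does not.
non_latin = render_in_c_locale("\u65e5\u672c\u8a9e".encode("utf-8"))
```

Under this model, a multi-byte UTF-8 character produces several question marks, so even the length of the displayed name no longer matches the number of characters.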

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Charles Plessy
On Sat, Mar 16, 2013 at 10:58:11PM +0100, Bill Allombert wrote:
> On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> > 
> > I think that it emerges from the discussion that there are good uses of
> > Unicode, and that somebody would need to step up and ensure that a dozen of
> > packages are corrected if we were to restrict further the encoding of file
> > names.  Moreover, there seems to be a good self-discipline, and Unicode is
> > not used in paths that are central on non-Unicode systems.
> 
> I have yet to see any good use of 8bit filename in Debian binary packages.

Hi Bill,

I understand that you are critical of the idea of allowing 8bit file names.
I am sorry if my brief summary could have given the impression that there
is no "good" reason to refrain from using 8bit file names as well.

At this point in the discussion, I do not see new arguments being added.  We
therefore need to move towards the resolution of this bug.  I see three
possible outcomes.

 a) Status quo: currently there is no policy, and we can decide to not write
any policy instead of taking one that does not reach consensus. (not my
favorite).

 b) Disallow non-UTF-8 encodings.  This requires little work (which I started),
answers the original issue raised in this bug (there is no policy), and
does not preclude further restrictions if there is consensus for doing so.

 c) Disallow non-ASCII encodings.  This requires more work, and I am fairly
confident to write that if nobody takes action and leads the correction of
the affected packages in the archive, nothing will happen and we will not
be able to make the corresponding change in the Policy.

If we would tackle the issue with a Condorcet vote, I think that b) would
be chosen, unless there are worries that once we reach b) it will not be
possible to propose c) anymore.  Personally, I trust that Debian's do-ocracy
will work well, and that if there are developers determined to propose c)
and make it happen if our community sees a benefit, being in the b) state
will be a bonus, not a drawback.  I nevertheless need to add that I personally
think that b) is better than c), as shown by the summary that I wrote with
too much bias (sorry again).

Shall we go for b) ?

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Jonathan Nieder
Russ Allbery wrote:

> For me, allowing the correct spellings of
> words and the correct names of things to be represented in file names is
> important enough to rise to an ethical goal that I would advocate
> adopting.

This.  Among the examples listed the only one I found convincing was

Certinomis_-_Autorité_Racine.crt

For test cases, it seems more sensible to just use a tarball, since
restricting oneself to UTF-8 filenames hurts test coverage in the same
way as sticking to ASCII.  But naming files after real entities (like
Certinomis) is both harmless and a good application of a universal
character encoding.





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Russ Allbery
Bill Allombert  writes:
> On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:

>> Many were posted to this thread.  I guess I just disagree with you on
>> whether those uses are "good."  For me, allowing the correct spellings
>> of words and the correct names of things to be represented in file
>> names is important enough to rise to an ethical goal that I would
>> advocate adopting.  A pure ASCII stance feels like a very
>> English-centric stance.

> File names are not translatable, so a better mechanism is needed anyway.

This discussion isn't about translations, and I don't agree that they're
relevant to this decision.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Bill Allombert
On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:
> Bill Allombert  writes:
> > On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> 
> >> I think that it emerges from the discussion that there are good uses of
> >> Unicode, and that somebody would need to step up and ensure that a
> >> dozen packages are corrected if we were to further restrict the
> >> encoding of file names.  Moreover, there seems to be good
> >> self-discipline, and Unicode is not used in paths that are central on
> >> non-Unicode systems.
> 
> > I have yet to see any good use of 8-bit file names in Debian binary packages.
> 
> Many were posted to this thread.  I guess I just disagree with you on
> whether those uses are "good."  For me, allowing the correct spellings of
> words and the correct names of things to be represented in file names is
> important enough to rise to an ethical goal that I would advocate
> adopting.  A pure ASCII stance feels like a very English-centric stance.

File names are not translatable, so a better mechanism is needed anyway.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Russ Allbery
Bill Allombert  writes:
> On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:

>> I think that it emerges from the discussion that there are good uses of
>> Unicode, and that somebody would need to step up and ensure that a
>> dozen packages are corrected if we were to further restrict the
>> encoding of file names.  Moreover, there seems to be good
>> self-discipline, and Unicode is not used in paths that are central on
>> non-Unicode systems.

> I have yet to see any good use of 8-bit file names in Debian binary packages.

Many were posted to this thread.  I guess I just disagree with you on
whether those uses are "good."  For me, allowing the correct spellings of
words and the correct names of things to be represented in file names is
important enough to rise to an ethical goal that I would advocate
adopting.  A pure ASCII stance feels like a very English-centric stance.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Bill Allombert
On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> tag 701081 patch
> thanks
> 
> Dear all,
> 
> I think that it emerges from the discussion that there are good uses of
> Unicode, and that somebody would need to step up and ensure that a dozen
> packages are corrected if we were to further restrict the encoding of file
> names.  Moreover, there seems to be good self-discipline, and Unicode is
> not used in paths that are central on non-Unicode systems.

I have yet to see any good use of 8-bit file names in Debian binary packages.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Charles Plessy
Le Sat, Mar 16, 2013 at 01:11:37PM +0100, Julien Cristau a écrit :
> On Sat, Mar  9, 2013 at 10:51:45 +0900, Charles Plessy wrote:
> > 
> > I think that it emerges from the discussion that there are good uses of
> > Unicode, and that somebody would need to step up and ensure that a dozen
> > packages are corrected if we were to further restrict the encoding of file
> > names.  Moreover, there seems to be good self-discipline, and Unicode is
> > not used in paths that are central on non-Unicode systems.
> > 
> > Given that the Policy currently does not mention anything about file names, I
> > think that it would be fair to fill the gap by documenting the use of Unicode
> > as current practice and recommending ASCII for most cases.  This does not preclude
> > further restrictions if needed.  I volunteer to contact the maintainers of
> > lletters-media and ooohg, the only packages with non-Unicode file names.
> > 
> > I attached a slightly updated patch.  I have not added that the policy is for
> > 'the files that have been created after the binary package is "Installed"',
> > because I think that it is clear throughout chapter 10 that "installed files"
> > means this.  Nevertheless, it would be nice to have such a definition in black
> > and white somewhere else, to be discussed in another thread.
> > 
> You say unicode everywhere but you seem to actually mean utf-8...

Indeed I meant UTF-8, sorry for being confusing.

The patch to the Policy already mentions UTF-8:

+  
+   File names
+
+   
+ The name of the files and directories installed by binary packages
+ must be encoded in UTF-8 and should be restricted to ASCII when they
+ can be represented in that character set.
+   

I have just opened #703177 on ooohg, and figured out that there is already
#659345 for lletters-media.
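
As a rough illustration of what the wording in the patch asks for: the "must" part (valid UTF-8) can be checked mechanically, while the "should" part (ASCII when the name can be represented in it) still needs human judgment.  The following is a minimal sketch only; the function name and the three verdict strings are hypothetical and not part of any Debian tool.

```python
def check_file_name(raw: bytes) -> str:
    """Classify one installed file name, given as raw bytes.

    Returns a hypothetical verdict string:
      "ascii"     - pure ASCII, satisfies both the "must" and the "should"
      "utf-8"     - valid UTF-8 with non-ASCII characters; satisfies the
                    "must", and a human has to judge the "should"
      "not-utf-8" - violates the "must"
    """
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


print(check_file_name(b"README"))
print(check_file_name("Certinomis_-_Autorité_Racine.crt".encode("utf-8")))
# The same kind of name stored in a legacy 8-bit charset is rejected.
print(check_file_name("bokmål".encode("iso8859_1")))
```

A name in the "utf-8" class is not a violation by itself; it only marks the cases where a maintainer would have to decide whether an ASCII spelling exists.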

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-16 Thread Julien Cristau
On Sat, Mar  9, 2013 at 10:51:45 +0900, Charles Plessy wrote:

> tag 701081 patch
> thanks
> 
> Dear all,
> 
> I think that it emerges from the discussion that there are good uses of
> Unicode, and that somebody would need to step up and ensure that a dozen
> packages are corrected if we were to further restrict the encoding of file
> names.  Moreover, there seems to be good self-discipline, and Unicode is
> not used in paths that are central on non-Unicode systems.
> 
> Given that the Policy currently does not mention anything about file names, I
> think that it would be fair to fill the gap by documenting the use of Unicode
> as current practice and recommending ASCII for most cases.  This does not preclude
> further restrictions if needed.  I volunteer to contact the maintainers of
> lletters-media and ooohg, the only packages with non-Unicode file names.
> 
> I attached a slightly updated patch.  I have not added that the policy is for
> 'the files that have been created after the binary package is "Installed"',
> because I think that it is clear throughout chapter 10 that "installed files"
> means this.  Nevertheless, it would be nice to have such a definition in black
> and white somewhere else, to be discussed in another thread.
> 
You say unicode everywhere but you seem to actually mean utf-8...

Cheers,
Julien




Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-08 Thread Charles Plessy
tag 701081 patch
thanks

Dear all,

I think that it emerges from the discussion that there are good uses of
Unicode, and that somebody would need to step up and ensure that a dozen
packages are corrected if we were to further restrict the encoding of file
names.  Moreover, there seems to be good self-discipline, and Unicode is
not used in paths that are central on non-Unicode systems.

Given that the Policy currently does not mention anything about file names, I
think that it would be fair to fill the gap by documenting the use of Unicode
as current practice and recommending ASCII for most cases.  This does not preclude
further restrictions if needed.  I volunteer to contact the maintainers of
lletters-media and ooohg, the only packages with non-Unicode file names.

I attached a slightly updated patch.  I have not added that the policy is for
'the files that have been created after the binary package is "Installed"',
because I think that it is clear throughout chapter 10 that "installed files"
means this.  Nevertheless, it would be nice to have such a definition in black
and white somewhere else, to be discussed in another thread.

Have a nice week-end,

-- 
Charles
From e0d128e11506ca142999bc79104cd1442d0105df Mon Sep 17 00:00:00 2001
From: Charles Plessy 
Date: Sun, 24 Feb 2013 12:11:00 +0900
Subject: [PATCH] Installed file names must be in UTF-8 and should use only
 ASCII characters.

Closes: #701081
---
 policy.sgml | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/policy.sgml b/policy.sgml
index d81891c..c619d44 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -9463,6 +9463,16 @@ done
 	  
 	
   
+
+  
+	File names
+
+	
+	  The name of the files and directories installed by binary packages
+	  must be encoded in UTF-8 and should be restricted to ASCII when they
+	  can be represented in that character set.
+	
+  
 
 
 
-- 
1.8.2.rc0



Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-06 Thread Bill Allombert
On Wed, Mar 06, 2013 at 01:45:14PM +0900, Charles Plessy wrote:
> Le Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover a écrit :
> > 
> > I'd second something like this, but I'd first like us to consider if
> > we really want any non-ASCII characters in filenames. Currently on sid
> > there does not appear to be many such filenames (64 from my check, if
> > that's not bogus):
> > 
> >   $ LC_ALL=C zgrep '[^[:print:]]' \
> > ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
> 
> Hi Guillem and everybody,
> 
> I had a closer look at these files.
> 
>  * There are dictionaries where the filename is the native name of the
>    language, like català, español, bokmål, etc.  In all cases the
>    characters are valid Unicode.  I think that it would be fair to allow
>    such cases.

This is not the current practice.
In /usr/share/dict/ and /usr/lib/ispell/, only bokmål is 8-bit.
Most dictionary names are in English, sometimes with an alias in the
language itself (catala, dansk, foeroyskt, bokmål, svenska).

In /usr/lib/aspell/, most dictionaries are named using the ISO 639 two-letter code
or the English name.  There are some non-English aliases like francais.alias,
which is missing the cedilla.  Only català, español and íslenska are 8-bit.

So currently, there is no standard practice of naming dictionaries after the
UTF-8 encoding of the native spelling of the language, and it would be more
practical to standardize on ISO 639 language codes instead.

>  * There are names that look rather arbitrary and replaceable
>with ASCII alternatives if needed.  For instance in python-pyramid,
>usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

Probably some test files that could be removed from the binary packages.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>Since I do not know how these certificates work, I do not know if they
>can be renamed.

The main reason they have such names is to avoid name clashes with other .crt files.

>  * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
>    usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
>    also distributes a file called README.strange-name in the same directory.

The value of such a file is pretty low.

>  * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
>    or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
>    with a Unicode locale.

Miroir_Sphe♦rique is a bug in itself: it should be
Miroir_Sphérique.
'6Sze¶æ_Jab³ek.png' is probably misencoded (it is intended to be 6 in Polish,
i.e. sześć).
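
The mis-encoding guess can be checked with a few lines of Python.  This is a minimal sketch under an assumption that the thread does not state: that the name was written in ISO-8859-2 and is being displayed as if it were ISO-8859-1.  The function name is hypothetical.

```python
def reinterpret(displayed: str, shown_as: str = "iso8859_1",
                written_as: str = "iso8859_2") -> str:
    """Undo a wrong-charset display.

    Recover the raw bytes from what is shown on screen, then decode them
    with the charset the name was (presumably) written in.
    """
    raw = displayed.encode(shown_as)
    return raw.decode(written_as)


print(reinterpret("6Sze¶æ_Jab³ek.png"))  # prints 6Sześć_Jabłek.png
```

Under that assumption the name reads "6 sześć jabłek", which supports the reading that the file is simply stored in a legacy 8-bit encoding.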

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Russ Allbery
Charles Plessy  writes:

>  * There are names that look rather arbitrary and replaceable
>with ASCII alternatives if needed.  For instance in python-pyramid,
>usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

At least some of these (for things located in a directory named tests) are
probably explicit tests of non-ASCII file names.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>Since I do not know how these certificates work, I do not know if they
>can be renamed.

This to me feels like a good use of Unicode.  One of the reasons why I'm in
favor of a general policy saying we should use UTF-8, rather than a policy
saying to use only ASCII names, is that names of things in the real world
(people and organizations) are often put into file names.  And it really
bothers me when we tell people they can't use their *actual* name or are
required to misspell it in some arbitrary way in order to shoehorn
themselves into ASCII.

In this case, I assume the name of the relevant certificate authority is
Certinomis - Autorité Racine.  I think it's quite reasonable to use the
actual name for the certificate authority in the file name.

>  * The pitivi package gives entries with no obvious Unicode characters,
>  like usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.  I
>  think that we should at least strongly recommend that if a name looks
>  ASCII then it should be ASCII.

It's mildly difficult to be clear about this, since this can depend very
heavily on the font.  In general, the way this sort of requirement is
stated in the Unicode world is to require a normalized form, but I think
that's rather heavy-weight for what we're trying to accomplish.

But yes, we can just make a general (but not formally precise)
recommendation.
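
For reference, checking a normalized form in code is short, even if mandating one in Policy may be too heavy.  A minimal sketch, assuming NFC as the target form (the thread does not pick one) and using an illustrative name built with a combining accent; the helper name is hypothetical:

```python
import unicodedata


def is_nfc(name: str) -> bool:
    """True if the name is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", name) == name


# "e" followed by U+0301 COMBINING ACUTE ACCENT renders like "é"
# but is a different sequence of code points (and of UTF-8 bytes).
decomposed = "Miroir_Sphe\u0301rique"
composed = unicodedata.normalize("NFC", decomposed)

print(is_nfc(decomposed), is_nfc(composed))
print(decomposed.encode("utf-8") == composed.encode("utf-8"))
```

Normalization does not catch look-alike characters borrowed from other scripts, so it would cover only part of the "looks ASCII" concern.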

> Requiring that all file and directory names are encoded in Unicode and
> preferably in ASCII would therefore make only one package RC-buggy.
> Requiring all-ASCII would be also possible with a bit more work, but I
> am not sure that it would be worth the effort, as most of the current
> examples above do not require specialised fonts.  Altogether, there
> seems to be a good self-discipline.  However, if there are ways to test
> the following automatically, maybe we should consider requesting that
> what is displayed ASCII should be ASCII.

I think it's reasonable to say that file names that can be represented in
ASCII should be in ASCII.  But I do think that it's entirely reasonable to
use Unicode for names that truly aren't ASCII names, and it would bother
me to tell people to misspell those names to squeeze them into ASCII.

For the other half of what's been discussed, I don't think that Debian
should have a position about what's *inside* files other than files where
we're already standardizing the contents (such as the copyright file).
There may be reasons why files should be encoded in legacy encodings for
specific uses, and I don't feel like it's the proper role of Policy to
dictate to all package maintainers that they can't work with those use
cases.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Charles Plessy
Le Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover a écrit :
> 
> I'd second something like this, but I'd first like us to consider if
> we really want any non-ASCII characters in filenames. Currently on sid
> there does not appear to be many such filenames (64 from my check, if
> that's not bogus):
> 
>   $ LC_ALL=C zgrep '[^[:print:]]' \
> ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l

Hi Guillem and everybody,

I had a closer look at these files.

 * There are dictionaries where the filename is the native name of the
   language, like català, español, bokmål, etc.  In all cases the
   characters are valid Unicode.  I think that it would be fair to allow
   such cases.

 * There are names that look rather arbitrary and replaceable
   with ASCII alternatives if needed.  For instance in python-pyramid,
   usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

 * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
   Since I do not know how these certificates work, I do not know if they
   can be renamed.

 * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
   usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
   also distributes a file called README.strange-name in the same directory.

 * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
   or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
   with a Unicode locale.

 * The pitivi package gives entries with no obvious Unicode characters, like
   usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.
   I think that we should at least strongly recommend that if a name looks ASCII
   then it should be ASCII.

 * Lastly, there seems to be only a single package that ships non-Unicode filenames,
   non-free/ooohg with for instance 13_Afr dcol.gif.

Requiring that all file and directory names are encoded in Unicode and
preferably in ASCII would therefore make only one package RC-buggy.  Requiring
all-ASCII would be also possible with a bit more work, but I am not sure that it
would be worth the effort, as most of the current examples above do not require
specialised fonts.  Altogether, there seems to be a good self-discipline.
However, if there are ways to test the following automatically, maybe we should
consider requesting that what is displayed ASCII should be ASCII.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Russ Allbery
Roger Leigh  writes:

> Could we remove the non-UTF-8 locales which /do/ have complete UTF-8
> coverage and replace them with aliases?  That would at least achieve the
> transition for the vast majority of locales.

I don't think this is a good idea.  There are a lot of legacy ISO 8859-1
or KOI8-R or SJIS documents out there and people who work with those
documents may prefer to continue to operate in that locale.

-- 
Russ Allbery (r...@debian.org)   





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Bill Allombert
On Tue, Mar 05, 2013 at 10:45:35AM +, Roger Leigh wrote:
> Which locales don't currently have charsets mapping to Unicode?
> 
> Could we remove the non-UTF-8 locales which /do/ have complete
> UTF-8 coverage and replace them with aliases?  That would at least
> achieve the transition for the vast majority of locales.

The way locales work, this would prevent people from reading text files written
under such an encoding.  Not an option, I think.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Roger Leigh
On Tue, Mar 05, 2013 at 10:17:58AM +0100, Thomas Preud'homme wrote:
> Le mardi 5 mars 2013 01:16:52, Bill Allombert a écrit :
> > On Tue, Mar 05, 2013 at 12:06:06AM +, Roger Leigh wrote:
> > > We have defaulted to UTF-8 locales for over a decade now.  Unless
> > > there are compelling reasons not to use UTF-8 locales, maybe we
> > > could perhaps consider retiring them and having everything be
> > > UTF-8 by default at this point. If we do require this in
> > > userspace, then the naming restrictions could also be enforced
> > > in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
> > > filename creation.
> > 
> > > My understanding is that we are supporting some character sets that are
> > > still not included in Unicode.
> 
> Forgive me if I missed something but it seems to me that even if we are
> supporting only charsets included in Unicode, people could have files created
> with another distribution / OS not encoded in UTF-8. So I don't think it's
> possible / desirable to deny opening non-UTF-8 filenames in the kernel.

For opening, this is necessary for backward compatibility.  For
/creation/, we could certainly mandate UTF-8 for the addition of
new files, which is why I qualified with O_CREAT.  The same would
apply for other syscalls which create file paths (e.g. mknod, mkdir,
bind).  This would permit UTF-8 to be enforced going forward while
retaining a means for users to migrate broken naming to UTF-8.
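
The distinction between opening and creating can be sketched in userspace.  This is only an illustration of the idea, not kernel code; the wrapper name is hypothetical, and it assumes that names are passed around as raw bytes.

```python
import os


def create_utf8_only(path: bytes, mode: int = 0o644) -> int:
    """Create a new file, refusing names that are not valid UTF-8.

    Existing files with legacy names can still be opened with a plain
    os.open(); only creation goes through this check.
    """
    try:
        path.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"refusing to create non-UTF-8 name: {path!r}") from exc
    # O_EXCL makes this a pure "create" call: it fails if the file exists.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
```

A caller would get a file descriptor back for a valid name and a ValueError for something like b"caf\xe9.txt", while files that already carry such a name stay readable and can be renamed.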

Which locales don't currently have charsets mapping to Unicode?

Could we remove the non-UTF-8 locales which /do/ have complete
UTF-8 coverage and replace them with aliases?  That would at least
achieve the transition for the vast majority of locales.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-05 Thread Thomas Preud'homme
Le mardi 5 mars 2013 01:16:52, Bill Allombert a écrit :
> On Tue, Mar 05, 2013 at 12:06:06AM +, Roger Leigh wrote:
> > We have defaulted to UTF-8 locales for over a decade now.  Unless
> > there are compelling reasons not to use UTF-8 locales, maybe we
> > could perhaps consider retiring them and having everything be
> > UTF-8 by default at this point. If we do require this in
> > userspace, then the naming restrictions could also be enforced
> > in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
> > filename creation.
> 
> My understanding is that we are supporting some character sets that are
> still not included in Unicode.

Forgive me if I missed something but it seems to me that even if we are
supporting only charsets included in Unicode, people could have files created
with another distribution / OS not encoded in UTF-8. So I don't think it's
possible / desirable to deny opening non-UTF-8 filenames in the kernel.

> 
> Cheers,

Best regards,

Thomas




Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-04 Thread Bill Allombert
On Tue, Mar 05, 2013 at 12:06:06AM +, Roger Leigh wrote:
> On Sat, Mar 02, 2013 at 01:24:55PM +0100, Bill Allombert wrote:
> > I would like to see examples of UTF-8 filenames in source packages that are
> > not bugs and do not cause issues with some users before allowing them in
> > policy.  Policy still allows the use of non-UTF-8 locales.

Hu I meant binary packages.

> We have defaulted to UTF-8 locales for over a decade now.  Unless
> there are compelling reasons not to use UTF-8 locales, maybe we
> could perhaps consider retiring them and having everything be
> UTF-8 by default at this point. 

My understanding is that we are supporting some character sets that are still
not included in Unicode.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-04 Thread Roger Leigh
On Sat, Mar 02, 2013 at 01:24:55PM +0100, Bill Allombert wrote:
> I would like to see examples of UTF-8 filenames in source packages that are
> not bugs and do not cause issues with some users before allowing them in
> policy.  Policy still allows the use of non-UTF-8 locales.

We have defaulted to UTF-8 locales for over a decade now.  Unless
there are compelling reasons not to use UTF-8 locales, maybe we
could perhaps consider retiring them and having everything be
UTF-8 by default at this point.  If we do require this in
userspace, then the naming restrictions could also be enforced
in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
filename creation.  This would bring some much needed sanity to
filename handling, so it's a wider issue than just what's
permitted in packages.

WRT the point about allowing non-UTF-8 filenames for purposes
such as testsuites, if we require UTF-8 across the board, such
tests become unnecessary ;-)


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-02 Thread Guillem Jover
Hi!

On Sun, 2013-02-24 at 11:54:01 +0900, Charles Plessy wrote:
> This could be done by an addition like the following, after section 10.9
> (Permissions and owners).  The wording is still a bit clumsy; also, I am not
> sure if "installed" includes files created by maintainer scripts (which would
> be the intent here).  I named the section "File names", and not "File name
> character set", in case we would add other restrictions (such as length) in the
> future.

To make the installed situation pretty clear, it might make sense to
say something along the lines: «the files that have been created after
the binary package is "Installed"».

> +  
> +   File names
> +
> +   
> + The name of the files installed by binary packages must be encoded in
> + UTF-8 and should be restricted to ASCII unless there is a justified
> + need for using other characters.
> +   
> +  
> 
> Some packages do not comply with the above.  Given the pace of the releases
> of the Policy, I am not sure that it is worth having first a should and then
> a must, if you or somebody else would have the time to tackle the issue
> after the Wheezy release.

I'd second something like this, but I'd first like us to consider if
we really want any non-ASCII characters in filenames. Currently on sid
there do not appear to be many such filenames (64 from my check, if
that's not bogus):

  $ LC_ALL=C zgrep '[^[:print:]]' \
ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
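
For those who prefer a script to a one-liner, roughly the same count can be sketched in Python.  It assumes that each Contents line is a path followed by whitespace and a package field; unlike the grep above, it looks only for bytes outside ASCII, and only in the path field.  The function name is hypothetical and the sample lines are made up.

```python
def non_ascii_paths(lines):
    """Yield the path field of each Contents line whose path has a non-ASCII byte.

    `lines` is any iterable of bytes objects, one per line.
    """
    for line in lines:
        fields = line.rsplit(None, 1)  # path may contain spaces; package is last
        if len(fields) != 2:
            continue  # blank or malformed line
        path = fields[0]
        if not path.isascii():
            yield path


# Made-up sample lines; the package names are placeholders.
sample = [
    b"bin/ls\tutils/example-tools\n",
    "usr/share/dict/bokmål\ttext/example-dict\n".encode("utf-8"),
    b"\n",
]
print(sum(1 for _ in non_ascii_paths(sample)))
```

With a real index, the lines could come from iterating over gzip.open(path, "rb"), which keeps the names as bytes so that names that are not valid UTF-8 are counted rather than causing a decoding error.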

> By the way, how about directories ?

This is a matter of terminology, directories are also filenames, and
part of pathnames, which point to a directory instead of a file. I
don't see why we'd want to exclude directories from filenames.

Thanks,
Guillem





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-02 Thread Guillem Jover
On Sat, 2013-02-23 at 13:31:32 +0900, Charles Plessy wrote:
> Le Thu, Feb 21, 2013 at 03:48:15PM +0100, Bill Allombert a écrit :
>  - Is there anybody following the preparation of the FHS 3.0 or the LSB, who
>could tell us if a broader guideline on name encoding for files distributed
>in core directories is under discussion there ?

I think the new FHS version is currently stalled, so I'd not expect
any update in the near future.

Thanks,
Guillem





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-02 Thread Bill Allombert
On Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne wrote:
> On Sat, Feb 23, 2013 at 01:31:32PM +0900, Charles Plessy wrote:
> >  - There are here and there discussions raising possible corner cases
> >where distributing files with a name not representable in UTF-8 might
> >be justified, for instance in test suites.
> 
> Even though the general argument is correct, the particular example
> probably applies to source packages in most cases. We don't control
> source packages (unless we repack them), so I think they should not be
> covered by a filename encoding policy.

Agreed.

> >  - Similar discussion also took place in #99933.  I wonder about merging this
> >    bug (#701081) and #99933.
> 
> I stumbled upon this bug before reporting this one and decided that the
> issues were sufficiently separate from each other to warrant a new bug
> number. I did not read the full bug log and therefore did not discover
> that its scope widened to filenames as well. The discussion found
> therein clearly is valuable. I still think that separating bugs for
> filename encoding and file content encoding is a good idea, because
> those issues can be solved independently. That said merging also makes
> sense to point to the rest of the discussion. In the latter case, please
> select a better summary message.
> 
> I have to admit, that I am slightly in favour of just copying Fedora's
> approach. Making distributions more compatible with each other seems
> like a worthwhile thing to do.

I would like to see examples of UTF-8 filenames in source packages that are not
bugs and do not cause issues with some users before allowing them in policy.
Policy still allows the use of non-UTF-8 locales.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-03-02 Thread Charles Plessy
Le Sun, Feb 24, 2013 at 11:54:01AM +0900, Charles Plessy a écrit :
> Le Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne a écrit :
> > 
> > I have to admit, that I am slightly in favour of just copying Fedora's
> > approach. Making distributions more compatible with each other seems
> > like a worthwhile thing to do.
 
> This could be done by an addition like the following, after section 10.9
> 
> +  
> +   File names
> +
> +   
> + The name of the files installed by binary packages must be encoded in
> + UTF-8 and should be restricted to ASCII unless there is a justified
> + need for using other characters.
> +   
> +  
 
> By the way, how about directories ?

Related to this, I just found the following in
/usr/share/doc/dpkg-dev/triggers.txt.gz:

  Because of the restriction on trigger names, it is not possible to
  declare a file trigger for a directory whose name contains whitespace,
  i18n characters, etc.  Such a trigger should not be necessary.

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-23 Thread Charles Plessy
On Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne wrote:
> 
> I have to admit, that I am slightly in favour of just copying Fedora's
> approach. Making distributions more compatible with each other seems
> like a worthwhile thing to do.

This could be done by an addition like the following, after section 10.9
(Permissions and owners).  The wording is still a bit clumsy.  Also, I am not
sure whether "installed" includes files created by maintainer scripts (which
would be the intent here).  I named the section "File names", and not "File name
character set", in case we add other restrictions (such as length) in the
future.

+  
+   File names
+
+   
+ The name of the files installed by binary packages must be encoded in
+ UTF-8 and should be restricted to ASCII unless there is a justified
+ need for using other characters.
+   
+  
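
As a rough illustration of the wording above (not part of the proposed patch),
the "must" and "should" levels could be checked mechanically along these lines.
The function name check_names and the report keys are hypothetical; file names
are taken as raw bytes, as they would appear in a package's file list.

```python
from typing import Dict, Iterable, List


def check_names(names: Iterable[bytes]) -> Dict[str, List[bytes]]:
    """Sort file names into violations of the proposed "must" and "should".

    "must":   the name is not valid UTF-8.
    "should": the name is valid UTF-8 but uses non-ASCII characters, so a
              justification would be expected.
    """
    report: Dict[str, List[bytes]] = {"must": [], "should": []}
    for raw in names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            report["must"].append(raw)
            continue
        if not raw.isascii():
            report["should"].append(raw)
    return report
```

Pure ASCII names end up in neither list, since ASCII is a subset of UTF-8.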

Some packages do not comply with the above.  Given the pace of Policy
releases, I am not sure that it is worth having first a "should" and then
a "must", provided that you or somebody else has the time to tackle the issue
after the Wheezy release.

By the way, how about directories?

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-22 Thread Helmut Grohne
Thanks for your comments.

On Sat, Feb 23, 2013 at 01:31:32PM +0900, Charles Plessy wrote:
>  - There are here and there discussions raising possible corner cases
>where distributing files with a name not representable in UTF-8 might
>be justified, for instance in test suites.

Even though the general argument is correct, the particular example
probably applies to source packages in most cases. We don't control
source packages (unless we repack them), so I think they should not be
covered by a filename encoding policy.

>  - Similar discussion also took place in #99933.  I wonder about merging this
>bug (#701081) and #99933.

I stumbled upon #99933 before reporting this bug and decided that the
issues were sufficiently separate from each other to warrant a new bug
number. I did not read the full bug log and therefore did not discover
that its scope had widened to filenames as well. The discussion found
therein is clearly valuable. I still think that separate bugs for
filename encoding and file content encoding are a good idea, because
those issues can be solved independently. That said, merging also makes
sense as a way to point to the rest of the discussion. In the latter case,
please select a better summary message.

I have to admit that I am slightly in favour of just copying Fedora's
approach. Making distributions more compatible with each other seems
like a worthwhile thing to do.

Helmut





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-22 Thread Charles Plessy
On Thu, Feb 21, 2013 at 03:48:15PM +0100, Bill Allombert wrote:
> On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> > 
> > It would be nice if some common ground concerning filename encoding
> > could be reached. The options range from a rather restrictive definition
> > of acceptable characters via requiring filenames to be representable in
> > US-ASCII to mandating a particular encoding (such as UTF-8). This could
> > be first introduced as a SHOULD and later turned into a MUST.
> > 
> > Personally I do not really care about what the precise restriction is as
> > long as it permits a mechanical transformation to unicode.
> 
> I raised a similar issue in 
> http://lists.debian.org/debian-policy/2011/03/msg00212.html
> In most case, 8bit chars in filename are bugs.

Hello everybody,

Quick notes, in no particular order:

 - Discussions here and there have raised possible corner cases
   where distributing files with a name not representable in UTF-8 might
   be justified, for instance in test suites.

 - Fedora's policy is: "filenames that contain non-ASCII characters must be
   encoded as UTF-8. Since there's no way to note which encoding the filename
   is in, using the same encoding for all filenames is the best way to ensure
   users can read the filenames properly. If upstream ships filenames that are
   not encoded in UTF-8 you can use a utility like convmv (from the convmv
   package) to convert the filename in your %install section."

 - POSIX.1-2008, section 3.276 (Portable Filename Character Set), mentions:

   The set of characters from which portable filenames are constructed.
   
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   a b c d e f g h i j k l m n o p q r s t u v w x y z
   0 1 2 3 4 5 6 7 8 9 . _ -
   
   The last three characters are the <period>, <underscore>, and <hyphen>
   characters, respectively.

   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276

 - Similar discussion also took place in #99933.  I wonder about merging this
   bug (#701081) and #99933.

 - Is anybody following the preparation of the FHS 3.0 or the LSB who
   could tell us whether a broader guideline on name encoding for files
   distributed in core directories is under discussion there?

Altogether, I think that it would be useful to have a policy on filename 
encoding.
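
To make two of the notes above concrete, here is a minimal sketch.  It tests a
single path component against the POSIX portable filename character set, and
computes the UTF-8 form of a name whose current encoding is known, which is
the kind of conversion the Fedora text delegates to convmv.  The helper names
are hypothetical, and unlike convmv the sketch renames nothing on disk; the
source encoding must be supplied by the caller, since it cannot be detected
from the name itself.

```python
import string

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable(component: str) -> bool:
    """True if a non-empty path component uses only portable characters."""
    return bool(component) and all(c in PORTABLE for c in component)


def reencode_name(raw: bytes, source_encoding: str) -> bytes:
    """Return the UTF-8 form of a name currently stored in source_encoding."""
    return raw.decode(source_encoding).encode("utf-8")
```

For instance, reencode_name(b"caf\xe9", "iso-8859-1") gives b"caf\xc3\xa9".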

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-21 Thread Bill Allombert
On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> Package: debian-policy
> Severity: wishlist
> 
> Apparently the debian-policy currently says nothing about the characters
> used in filenames contained in binary packages. Most packages use common
> sense and only use a small subset of US-ASCII. In Debian sid main most
> filenames can be represented using the following subset of US-ASCII
> characters (written as a regular expression):
> 
>   [][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]
> 
> The number of exceptions is about 200 contained in about 50 binary
> packages. In those packages some filenames are not representable as
> UTF-8 (for example aspell-is) and others don't make any sense in
> ISO-8859-15 (for example ca-certificates).
> 
> It would be nice if some common ground concerning filename encoding
> could be reached. The options range from a rather restrictive definition
> of acceptable characters via requiring filenames to be representable in
> US-ASCII to mandating a particular encoding (such as UTF-8). This could
> be first introduced as a SHOULD and later turned into a MUST.
> 
> Personally I do not really care about what the precise restriction is as
> long as it permits a mechanical transformation to unicode.

I raised a similar issue in 
http://lists.debian.org/debian-policy/2011/03/msg00212.html
In most cases, 8-bit characters in filenames are bugs.

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 





Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-21 Thread Helmut Grohne
Package: debian-policy
Severity: wishlist

Apparently the debian-policy currently says nothing about the characters
used in filenames contained in binary packages. Most packages use common
sense and only use a small subset of US-ASCII. In Debian sid main most
filenames can be represented using the following subset of US-ASCII
characters (written as a regular expression):

[][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]

The number of exceptions is about 200 contained in about 50 binary
packages. In those packages some filenames are not representable as
UTF-8 (for example aspell-is) and others don't make any sense in
ISO-8859-15 (for example ca-certificates).
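
For anyone wanting to repeat this kind of survey, a minimal sketch of the
check follows.  It uses the character class above in negated form, with the
two brackets escaped for readability, and reports which characters of a path
fall outside it.  The function name is hypothetical and the paths are assumed
to be already decoded to text; names that are not valid UTF-8 would need to be
handled separately before this step.

```python
import re
from typing import List

# Negation of the character class from the report: matches any single
# character that is NOT in the commonly used subset.
UNCOMMON = re.compile(r"[^\]\[a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]")


def uncommon_characters(path: str) -> List[str]:
    """Return the distinct characters of path outside the subset, sorted."""
    return sorted(set(UNCOMMON.findall(path)))
```

An empty list means the path fits entirely within the subset.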

It would be nice if some common ground concerning filename encoding
could be reached. The options range from a rather restrictive definition
of acceptable characters via requiring filenames to be representable in
US-ASCII to mandating a particular encoding (such as UTF-8). This could
be first introduced as a SHOULD and later turned into a MUST.

Personally I do not really care about what the precise restriction is as
long as it permits a mechanical transformation to Unicode.

Helmut

