Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-22 Thread Charles Plessy
On Thu, Feb 21, 2013 at 03:48:15PM +0100, Bill Allombert wrote:
 On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
  
  It would be nice if some common ground concerning filename encoding
  could be reached. The options range from a rather restrictive definition
  of acceptable characters via requiring filenames to be representable in
  US-ASCII to mandating a particular encoding (such as UTF-8). This could
  be first introduced as a SHOULD and later turned into a MUST.
  
  Personally I do not really care about what the precise restriction is as
  long as it permits a mechanical transformation to unicode.
 
 I raised a similar issue in 
 http://lists.debian.org/debian-policy/2011/03/msg00212.html
 In most cases, 8-bit characters in filenames are bugs.

Hello everybody,

quick notes in random order:

 - Discussions here and there have raised possible corner cases where
   distributing files with a name not representable in UTF-8 might be
   justified, for instance in test suites.

 - Fedora's policy is: filenames that contain non-ASCII characters must be
   encoded as UTF-8. Since there's no way to note which encoding the filename
   is in, using the same encoding for all filenames is the best way to ensure
   users can read the filenames properly. If upstream ships filenames that are
   not encoded in UTF-8 you can use a utility like convmv (from the convmv
   package) to convert the filename in your %install section.

 - POSIX.1-2008, section 3.276 (Portable Filename Character Set), mentions:

   The set of characters from which portable filenames are constructed.
   
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   a b c d e f g h i j k l m n o p q r s t u v w x y z
   0 1 2 3 4 5 6 7 8 9 . _ -
   
   The last three characters are the period, underscore, and hyphen
   characters, respectively.
   
   
   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276

 - Similar discussion also took place in #99933.  I wonder about merging this
   bug (#701081) and #99933.

 - Is anybody following the preparation of the FHS 3.0 or the LSB who could
   tell us whether a broader guideline on name encoding for files distributed
   in core directories is under discussion there?

Altogether, I think that it would be useful to have a policy on filename 
encoding.
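
To make the options raised in this thread concrete (the POSIX portable
set, US-ASCII, UTF-8), here is a minimal Python sketch that sorts a single
path component into the strictest class it satisfies. The function name
classify_component and the class labels are made up for illustration; this
is not an existing lintian or dpkg check, only one way such a test could
look.

```python
import re

# POSIX.1-2008, section 3.276: A-Z, a-z, 0-9, period, underscore, hyphen.
PORTABLE = re.compile(rb"[A-Za-z0-9._-]+")


def classify_component(name: bytes) -> str:
    """Return the strictest class a single path component falls into.

    The labels ("portable", "ascii", "utf-8", "undecodable") are
    hypothetical names chosen for this sketch.
    """
    if PORTABLE.fullmatch(name):
        return "portable"
    try:
        name.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding rejects malformed byte sequences.
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "undecodable"


if __name__ == "__main__":
    for sample in (b"README.txt", b"hello world", b"caf\xc3\xa9", b"caf\xe9"):
        print(sample, classify_component(sample))
```

Anything classified as "undecodable" is what a UTF-8 rule would flag; a
rule based on the portable set would additionally flag the "ascii" and
"utf-8" classes.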

Have a nice weekend,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


--
To UNSUBSCRIBE, email to debian-policy-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130223043132.gb1...@falafel.plessy.net



Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

2013-02-22 Thread Helmut Grohne
Thanks for your comments.

On Sat, Feb 23, 2013 at 01:31:32PM +0900, Charles Plessy wrote:
  - Discussions here and there have raised possible corner cases where
    distributing files with a name not representable in UTF-8 might be
    justified, for instance in test suites.

Even though the general argument is correct, the particular example
probably applies to source packages in most cases. We don't control
source packages (unless we repack them), so I think they should not be
covered by a filename encoding policy.

  - Similar discussion also took place in #99933.  I wonder about merging this
    bug (#701081) and #99933.

I stumbled upon this bug before reporting this one and decided that the
issues were sufficiently separate from each other to warrant a new bug
number. I did not read the full bug log and therefore did not discover
that its scope widened to filenames as well. The discussion found
therein clearly is valuable. I still think that separating bugs for
filename encoding and file content encoding is a good idea, because
those issues can be solved independently. That said, merging also makes
sense as a way to point to the rest of the discussion. In the latter
case, please select a better summary message.

I have to admit that I am slightly in favour of just copying Fedora's
approach. Making distributions more compatible with each other seems
like a worthwhile thing to do.
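
For illustration, the renaming step that Fedora's text delegates to convmv
could be sketched in Python as below. This is not how convmv is
implemented. Because a filename carries no encoding marker, the source
encoding has to be known in advance; the parameter assumed_encoding and
its latin-1 default are assumptions of this sketch, as is the function
name.

```python
def to_utf8_name(name: bytes, assumed_encoding: str = "latin-1") -> bytes:
    """Return name re-encoded as UTF-8.

    Names that are already valid UTF-8 are returned unchanged. Anything
    else is interpreted in assumed_encoding, which the packager must
    know; it cannot be detected reliably from the bytes alone.
    """
    try:
        name.decode("utf-8")
        return name
    except UnicodeDecodeError:
        return name.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    print(to_utf8_name(b"caf\xe9"))      # Latin-1 input
    print(to_utf8_name(b"caf\xc3\xa9"))  # already UTF-8
```

A real conversion would then rename the file on disk, for example with
os.rename on bytes paths, during the package build.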

Helmut


-- 
Archive: http://lists.debian.org/20130223070209.ga18...@alf.mars