Re: UTF-8 in jessie

2013-08-12 Thread Charles Plessy
On Tue, Aug 13, 2013 at 08:12:24AM +0200, Christian PERRIER wrote:
> Quoting Charles Plessy (ple...@debian.org):
> 
> > About display by GUIs, I think that we should have a system to install all the
> > fonts necessary to display languages that we support at the installation.
> 
> 
> Such as tasksel and its language tasks? :-)
> 
> In short, we already have that. However, we need people to maintain
> that, namely to decide what fonts should be installed when a given
> language is chosen at install time.

Hi Christian,

what I am proposing is a task that installs all languages.  I did a bit of
research earlier, and it is not as simple as installing all the existing tasks:
on my computer, the result was that some browsers started to display Japanese
text with simplified Chinese glyphs.

http://bugs.debian.org/702050

Unfortunately, I did not get an answer.  Feedback is very welcome.

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130813063123.ge6...@falafel.plessy.net



Re: UTF-8 in jessie

2013-08-12 Thread Christian PERRIER
Quoting Charles Plessy (ple...@debian.org):

> About display by GUIs, I think that we should have a system to install all the
> fonts necessary to display languages that we support at the installation.


Such as tasksel and its language tasks? :-)

In short, we already have that. However, we need people to maintain
that, namely to decide what fonts should be installed when a given
language is chosen at install time.

New translators contributing to D-I are usually asked to do this and,
therefore, most languages that are not Latin-based will trigger the
installation of a font package that is suitable for them.

I also try to move the maintenance of such font packages under the
(large) umbrella of the pkg-fonts maintenance team, as the maintenance
of font packages is usually very loose.





Re: UTF-8 in jessie (debhelper and BOM)

2013-08-12 Thread Osamu Aoki
Hi,

UTF-8 is indeed a good goal in principle.

(I agree, but I am struggling to update package documentation, since
Japanese is known to be tough: ISO-2022-JP/EUC-JP/Shift_JIS/... are all in
use, and mixed EUC/Shift_JIS content can easily be confused with Latin-1.)

But I do not understand goal #5.  Why "MUST"?  Do you have a rationale?

On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote:
> On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote:
> > I propose the following sub-goals:
...
> > 4. all text files should be encoded in UTF-8

Yes.  But it would be nice to have some support from dh_installdocs :-)

> 5. All programs consuming UTF8 Text must understand a BOM.
  

I agree with it as a "SHOULD", but should we really state "MUST"?

After all, a BOM has no value in UTF-8 except to upset some programs.
See Wikipedia page: http://en.wikipedia.org/wiki/Byte_order_mark

 | The Unicode Standard permits the BOM in UTF-8, but does not require
 | or recommend its use. Byte order has no meaning in UTF-8 ...
(pointer to the Unicode document is listed there.)

If it is only at the very start of a file, it is relatively easy.  But there
is text data with bogus BOMs in the middle of the content.  Should programs
understand those too, to be safe?

FYI: I recently had a problem with PO files containing lots of BOMs inside
the text, which broke a XeTeX run.  (Please note that the TeX family of
programs has more elaborate character support than Unicode-only UTF-8.
I would rather have XeTeX ...)  To me, a program to filter out such BOMs
would be nice.  But we should not shoot a good UTF-8 program for stupid
BOM-containing UTF-8 data.
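
A filter of the kind wished for here could look roughly like the following
Python sketch.  The name `strip_boms` is invented for this illustration, and
the sketch assumes its input is valid UTF-8.  It removes every encoded U+FEFF,
the leading one as well as stray ones inside the content; whether dropping
the inner ones is always desirable is a judgment call, since U+FEFF was also
used as a zero-width no-break space.

```python
import codecs


def strip_boms(data: bytes) -> bytes:
    """Remove every UTF-8 encoded U+FEFF from data (hypothetical helper)."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.  In valid UTF-8
    # these bytes can only appear together as the encoding of U+FEFF, so a
    # plain byte replacement does not damage neighbouring characters.
    return data.replace(codecs.BOM_UTF8, b"")


sample = codecs.BOM_UTF8 + b'msgid "a"\n' + codecs.BOM_UTF8 + b'msgstr "b"\n'
cleaned = strip_boms(sample)
```

Working on bytes keeps the filter independent of the locale of whoever runs
it.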

Osamu






Bug#719556: ITP: logcat-color -- a colorful alternative to "adb logcat"

2013-08-12 Thread Luke Faraone
Package: wnpp
Severity: wishlist
Owner: Luke Faraone 

* Package name: logcat-color
  Version : 0.5
  Upstream Author : Marshall Culpepper 
* URL : https://github.com/marshall/logcat-color
* License : Apache-2.0
  Programming Lang: Python
  Description : a colorful alternative to "adb logcat"

This package is designed to be used in conjunction with the Android
"adb" utility to view logs on an Android device or emulator.

logcat-color is highly configurable and is compatible with upstream
logcat's command-line flags.





Re: Switching default dpkg-source compressor for V2+ formats to xz

2013-08-12 Thread Mike Hommey
On Tue, Aug 13, 2013 at 03:52:51AM +0200, Guillem Jover wrote:
> Hi!
> 
> I'd like to switch the default dpkg-source compressor to xz for V2+
> (not for V1) source formats, as suggested by Ansgar Burchardt in [0].
> 
>   [0] 
> 
> After having switched the default dpkg-deb compressor to xz in 1.17.0,
> it only makes sense to update the new source formats too, more so when
> an increasing number of packages are getting switched manually. And
> given that there should be less of an issue wrt compatibility with
> other systems, compared to .deb packages.
> 
> So if there's no strong opposition, this would probably happen around
> dpkg 1.17.3 or thereabouts.

Is there actually much to gain from this, since it is only going to
affect -debian.tar.*?

Mike





Switching default dpkg-source compressor for V2+ formats to xz

2013-08-12 Thread Guillem Jover
Hi!

I'd like to switch the default dpkg-source compressor to xz for V2+
(not for V1) source formats, as suggested by Ansgar Burchardt in [0].

  [0] 

After having switched the default dpkg-deb compressor to xz in 1.17.0,
it only makes sense to update the new source formats too, more so when
an increasing number of packages are getting switched manually. And
given that there should be less of an issue wrt compatibility with
other systems, compared to .deb packages.

So if there's no strong opposition, this would probably happen around
dpkg 1.17.3 or thereabouts.

Thanks,
Guillem





Re: UTF-8 in jessie

2013-08-12 Thread Charles Plessy
On Mon, Aug 12, 2013 at 03:55:03PM +0200, Adam Borowski wrote:
> On Mon, Aug 12, 2013 at 09:58:30AM +0200, Niels Thykier wrote:
> > For the record, there is a Lintian tag for this now[1], which suggests
> > only a handful of packages violate this.
> > 
> > >  - Recommend ASCII when possible.
> > >  - Require ASCII for files in /bin, /sbin, /usr/bin, /usr/sbin and
> > >    /usr/games.
> > 
> > Requiring ASCII for files in $PATH should be trivial to implement as a
> > separate tag.  I suppose the ASCII requirement could also be implemented
> > as a pedantic check or so.  Regardless, patches welcome.  :)
> 
> I disagree here: I'd want to remove any need for that recommendation
> instead.  You might have a point about files in $PATH, though.

On Mon, Aug 12, 2013 at 03:56:48PM +, Thorsten Glaser wrote:
> 
> I disagree with requiring ASCII for $PATH though…

Hi Adam, Thorsten, and everybody,

To my knowledge, in Unstable there is currently no filename in the PATH that is
not encoded in plain ASCII.  The rationale for codifying this practice into a
requirement is to ensure that on multi-user systems, the administrator and the
users will not encounter commands that they cannot display or cannot type.

For file names outside the PATH, the recommendation to use ASCII when possible
should not be interpreted in an overly restrictive way: there are also good
reasons for using UTF-8 characters that are not in ASCII.
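
As a rough illustration of how the requirement for the PATH could be checked
on an installed system, the Python sketch below lists entries in the PATH
directories whose names contain non-ASCII characters.  The helper names
`non_ascii_names` and `non_ascii_commands` are invented for this example and
are not part of Lintian or any other Debian tool.

```python
import os


def non_ascii_names(names):
    """Return the names that contain at least one non-ASCII character."""
    return [name for name in names if not name.isascii()]


def non_ascii_commands(path=None):
    """Return full paths of non-ASCII entries found in PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        try:
            entries = os.listdir(directory)
        except OSError:
            # Missing or unreadable directory: nothing to report.
            continue
        for name in non_ascii_names(entries):
            found.append(os.path.join(directory, name))
    return found
```

Calling `non_ascii_commands()` without arguments inspects the current PATH;
an empty result would match the observation above about Unstable.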

See http://bugs.debian.org/701081 for further discussion.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 20:14:30 +0100, Dmitrijs Ledkovs wrote:
> What about locales though?
> 
> * C.utf8 locale should be always available
> * C.utf8 locale should be the default/fallback locale
> * utf8 locale variants should be default / available / preferred
> (where appropriate)

If scripts intend to use LC_ALL=C.UTF-8 to force everything to
the standard locale with UTF-8 support, then glibc should be
modified to treat C.UTF-8 like C w.r.t. $LANGUAGE. I mean:

xvii% LANGUAGE=fr_FR LC_ALL=C.UTF-8 cp
cp: opérande de fichier manquant
Saisissez « cp --help » pour plus d'informations.
xvii% LANGUAGE=fr_FR LC_ALL=C cp  
cp: missing file operand
Try 'cp --help' for more information.

Both should have output in English.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 17:58:20 +0200, Adam Borowski wrote:
> On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote:
> > 5. All programs consuming UTF8 Text must understand a BOM.
> 
> I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose
> once you standardize on UTF8.  They might help with exchange with a minority
> of Windows programs, at a cost on our side.  Windows hardly does plain text:
> most of that is MSVC/etc sources, but then, the C/C++ standards explicitly
> forbid junk in places other than comments.  Most other languages expect a
> hashbang on Unix, which makes BOMs impossible.

I think that the BOM has more drawbacks than advantages. It could
be useful only if there were an API to handle it correctly and
transparently, and if the current APIs (open(), fopen(), etc.)
were no longer used. Basically this means that one would need a
new OS. This would also mean that a BOM could be seen as some
kind of metadata used by the new API, and having the charset in
the metadata would actually make the BOM completely useless.

> Other reasons:
> * concatenating files adds a misplaced BOM
> * taking stuff from the middle loses them
> * tools like grep, patch, etc pick and insert lots of individual lines
> * tools that don't care about encodings would need to learn about them
> * files that appear the same will have a different hash due to presence or
>   absence of an invisible character that can appear/disappear with no
>   explicit request on the user's part
> * with UTF-8, we're 95% there.  For BOMs, there's almost no support.

This would also affect regexps, e.g. "^foo" on the first line of a file.
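
The effect on an anchored pattern can be seen in a small Python sketch.  The
sample strings are made up; the only assumption is that the first line was
decoded with the plain "utf-8" codec, which keeps a leading U+FEFF, rather
than with "utf-8-sig", which strips it.

```python
import re

raw_first_line = b"\xef\xbb\xbffoo bar\n"

# Plain UTF-8 decoding keeps the BOM as an invisible first character.
kept = raw_first_line.decode("utf-8")
# The BOM-aware codec drops one leading BOM, if present.
stripped = raw_first_line.decode("utf-8-sig")

matches_with_bom = re.match(r"^foo", kept) is not None
matches_without_bom = re.match(r"^foo", stripped) is not None
```

With the BOM left in place, "^foo" does not match although the line looks
like it starts with "foo".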

> So I'm strongly against producing BOMs.  As for accepting them, there's
> little that can break so it would be mostly ok... but certainly not as
> a "must" clause.

Agreed.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Bug#719526: ITP: attic -- deduplicating backup program

2013-08-12 Thread Clint Adams
Package: wnpp
Severity: wishlist
Owner: Clint Adams 

* Package name: attic
  Version : 0.7
  Upstream Author : Jonas Borgström 
* URL : https://pythonhosted.org/Attic/
* License : BSD-3-clause
  Programming Lang: Python
  Description : deduplicating backup program

 Attic is a deduplicating backup program written in Python. The main
 goal of Attic is to provide an efficient and secure way to backup
 data. The data deduplication technique used makes Attic suitable for
 daily backups since only actual changes are stored.


This ITP is blocked by #719525.





Re: UTF-8 in jessie

2013-08-12 Thread Dmitrijs Ledkovs
On 12 August 2013 01:51, Adam Borowski  wrote:
> On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote:
> I propose the following sub-goals:
>
> 1. all programs should, in their default configuration, accept UTF-8 input
>    and pass it through uncorrupted.  Having to manually specify encoding
>    is acceptable only in a programmatic interface, GUI/std{in,out,err}/
>    command line/plain files should work with nothing but LC_CTYPE.
>
> 2. all GUI/curses/etc programs should be able to display UTF-8 output where
>    appropriate
>
> 3. all file names must be valid UTF-8
>
> 4. all text files should be encoded in UTF-8
>

What about locales though?

* C.utf8 locale should be always available
* C.utf8 locale should be the default/fallback locale
* utf8 locale variants should be default / available / preferred
(where appropriate)

(this is rough idea, adjust above as appropriate & feasible at this
point in time)

Regards,

Dmitrijs.





Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 15:16:59 +0200, Adam Borowski wrote:
> On Mon, Aug 12, 2013 at 12:50:35PM +0200, Vincent Lefevre wrote:
> > On 2013-08-12 02:51:52 +0200, Adam Borowski wrote:
> > > Detecting non-UTF files is easy:
> > > * false positives are impossible
> > > * false negatives are extremely unlikely: combinations of letters that would
> > >   happen to match a valid utf character don't happen naturally, and even if
> > >   they did, every single combination in the file tested would need to match
> > >   valid utf.
> > 
> > Not that unlikely, and it is rather annoying that Firefox (and
> > therefore Iceweasel) gets this wrong due to an ambiguity with TIS-620.
> > IMHO, in case of ambiguity, UTF-8 should always be preferred by
> > default (applications could have options to change the preferences).
> 
> That's the opposite of what I'm talking about: it is hard to reliably detect
> ancient encodings, because they tend to assign a character to every possible
> bit stream.  On the other hand, only certain combinations of bytes with the
> 8th bit set are valid UTF-8, and thus it is possible to detect UTF-8 with
> good accuracy.  It is obviously trivial to fool such detection deliberately,
> but such combinations don't happen in real languages, and thus if something
> validates as UTF-8, it is safe to assume it indeed is.

I don't know the exact cause that makes Firefox recognize some files
as TIS-620 instead of UTF-8, but it is fooled, and not deliberately.

> > > On the other hand, detecting text files is hard.
> > 
> > Deciding whether a file is a text file may be hard even for a human.
> > What about text files with ANSI control sequences?
> 
> Same as, say, a Word97 document: not text for my purposes.  It might be
> just coloured plain text, but there is no generic way to handle that.

I think I've already seen such files as distributed text files
(documentation), or perhaps there were just backspace characters
to get bold (x\bx) and underline (x\b_). The less utility can
handle them.
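
For illustration, the backspace conventions mentioned here can be undone with
a few regular expressions.  This is only a sketch: `strip_overstrike` is a
made-up name, it handles a single overstrike per character, and it is not
how less itself is implemented.

```python
import re

BOLD = re.compile(r"(.)\x08\1")          # x\bx: same character struck twice
UNDERLINE_AFTER = re.compile(r"(.)\x08_")   # x\b_: underscore struck second
UNDERLINE_BEFORE = re.compile(r"_\x08(.)")  # _\bx: underscore struck first


def strip_overstrike(text: str) -> str:
    """Return text with simple bold/underline overstrikes removed."""
    text = BOLD.sub(r"\1", text)
    text = UNDERLINE_BEFORE.sub(r"\1", text)
    text = UNDERLINE_AFTER.sub(r"\1", text)
    return text


plain = strip_overstrike("N\x08NA\x08AM\x08ME\x08E")
```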

> > I think better questions could be: why do you want to regard a file as
> > text? For what purpose(s)? For the "all shipped text files in UTF-8"
> > rule only?
> 
> A shipped config file will have some settings the user may edit and comments
> he may read.  Being able to see what's going on is a prerequisite here.

However some config files may be byte-oriented (like procmailrc, AFAIK).

> HTML can include http-equiv which takes care of rendering, but editing is
> still a problem.  And if you edit it, or, say, fill in some fields from a
> database, you risk data loss.  If everything is UTF-8 end-to-end, this risk
> goes away.  (I do care about plain text more, though.)

You may still have NFC/NFD problems (this is also true for filenames).
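
A short Python sketch of the NFC/NFD issue, using the standard `unicodedata`
module: both forms below are valid UTF-8 and render identically, yet they
differ as strings and as bytes, so a byte-wise comparison of file names or
text can still fail.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as U+00E9
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' + U+0301

same_text = composed == decomposed
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Normalising both sides to one form makes them comparable again.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```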

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote:
> 5. All programs consuming UTF8 Text must understand a BOM.

I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose
once you standardize on UTF8.  They might help with exchange with a minority
of Windows programs, at a cost on our side.  Windows hardly does plain text:
most of that is MSVC/etc sources, but then, the C/C++ standards explicitly
forbid junk in places other than comments.  Most other languages expect a
hashbang on Unix, which makes BOMs impossible.

Other reasons:
* concatenating files adds a misplaced BOM
* taking stuff from the middle loses them
* tools like grep, patch, etc pick and insert lots of individual lines
* tools that don't care about encodings would need to learn about them
* files that appear the same will have a different hash due to presence or
  absence of an invisible character that can appear/disappear with no
  explicit request on the user's part
* with UTF-8, we're 95% there.  For BOMs, there's almost no support.

So I'm strongly against producing BOMs.  As for accepting them, there's
little that can break so it would be mostly ok... but certainly not as
a "must" clause.
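
Two of the points above, concatenation and hashing, can be reproduced with a
few lines of Python.  The file contents are invented for the example; the
concatenation simply mimics what `cat first second` would write.

```python
import codecs
import hashlib

BOM = codecs.BOM_UTF8  # the bytes EF BB BF

first = BOM + b"alpha\n"
second = BOM + b"beta\n"

# Byte-wise concatenation leaves the second BOM in the middle of the result.
combined = first + second
stray_bom_offset = combined.find(BOM, len(BOM))

# Even a BOM-aware decoder only drops the leading BOM.
decoded = combined.decode("utf-8-sig")

# Same visible text, different digest.
digest_with_bom = hashlib.sha256(BOM + b"alpha\n").hexdigest()
digest_without_bom = hashlib.sha256(b"alpha\n").hexdigest()
```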


-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ





Re: Bug#719323: ITP: jackson-core -- fast and powerful JSON library for Java

2013-08-12 Thread Stephen Nelson
On Sun, Aug 11, 2013 at 1:52 PM, Wolodja Wentland  wrote:
> I am not sure if it makes sense to adapt this naming scheme for source
> packages and I would personally very much prefer to use the name used by
> upstream. There are also quite a number of packages maintained by pkg-java
> that simply use the name used by upstream.
>
> The binary packages will, naturally, follow the naming conventions of pkg-java
> and this particular source package builds two binary packages, namely:
>
> * libjackson2-core-java
> * libjackson2-core-java-doc
>
> Please let me know if that is in line with your expectations and have a nice
> day!
> --
> Wolodja 
>
> 4096R/CAF14EFC
> 081C B7CD FF04 2BA9 94EA  36B2 8B7F 7D30 CAF1 4EFC

Hi Wolodja,

Thanks for the clarification. That sounds fine as it builds binaries
with the expected name. Please excuse my confusion of package name !=
binary name.

Thanks

Stephen



-- 
Stephen Nelson

T: 07595 300729
E: step...@eccostudio.com





Re: UTF-8 in jessie

2013-08-12 Thread Thorsten Glaser
Florian Lohoff <f...@zz.de> writes:

> 5. All programs consuming UTF8 Text must understand a BOM.

The kernel doesn’t, start there:

tglase@tglase:~$ mksh -c 'print '\''\ufeff#!/bin/sh\necho foo'\' >x; chmod +x x; ./x
./x: line 1: #!/bin/sh: No such file or directory
foo

That’s running GNU bash, with bash as /bin/sh for testing, which deviates
from my normal setup of running mksh… because I fixed mksh to support this
(and the MirBSD kernel, too).
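
The Linux kernel recognises a script by the two bytes "#!" at the very start
of the file, so a BOM in front of them hides the interpreter line, and the
calling shell then falls back to running the file itself, as in the
transcript above.  A checker for this situation could be sketched in Python
as follows; `shebang_status` is a hypothetical helper name.

```python
import codecs


def shebang_status(head: bytes) -> str:
    """Classify the first bytes of a file (hypothetical helper)."""
    if head.startswith(b"#!"):
        return "shebang"
    if head.startswith(codecs.BOM_UTF8 + b"#!"):
        # The interpreter line is there, but hidden behind EF BB BF.
        return "bom-before-shebang"
    return "no-shebang"


status = shebang_status("\ufeff#!/bin/sh\necho foo\n".encode("utf-8"))
```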


I disagree with requiring ASCII for $PATH though…

bye,
//mirabilos





Re: UTF-8 in jessie

2013-08-12 Thread Florian Lohoff
On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote:
> I propose the following sub-goals:
> 
> 1. all programs should, in their default configuration, accept UTF-8 input
>    and pass it through uncorrupted.  Having to manually specify encoding
>    is acceptable only in a programmatic interface, GUI/std{in,out,err}/
>    command line/plain files should work with nothing but LC_CTYPE.
> 
> 2. all GUI/curses/etc programs should be able to display UTF-8 output where
>    appropriate
> 
> 3. all file names must be valid UTF-8
> 
> 4. all text files should be encoded in UTF-8

5. All programs consuming UTF8 Text must understand a BOM.

Flo
-- 
Florian Lohoff f...@zz.de




Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 09:58:30AM +0200, Niels Thykier wrote:
> For the record, there is a Lintian tag for this now[1], which suggests
> only a handful of packages violate this.
> 
> >  - Recommend ASCII when possible.
> >  - Require ASCII for files in /bin, /sbin, /usr/bin, /usr/sbin and
> >    /usr/games.
> 
> Requiring ASCII for files in $PATH should be trivial to implement as a
> separate tag.  I suppose the ASCII requirement could also be implemented
> as a pedantic check or so.  Regardless, patches welcome.  :)

I disagree here: I'd want to remove any need for that recommendation
instead.  You might have a point about files in $PATH, though.
 
> > About display by GUIs, I think that we should have a system to install all the
> > fonts necessary to display languages that we support at the installation.

Could be good, yeah.  At least something basic for every valid Unicode
character.

On the other hand, for me at least CJK doesn't functionally differ from
mojibake.  Which can lead to problems: at the DebConf 11 C&W party, the best
stuff came in a bottle marked only in Japanese, and thus I'll have to find
which one it was the hard way today :p

Jokes aside, enough of Unicode consists of line drawing, symbols and images
like 💩 U+1F4A9 PILE OF POO[2] that are readable by everyone with appropriate
fonts that we might as well just go for 100% coverage by default.  Disk
space is cheap even on the weakest of today's phones, while complex packaging
with moving parts has a serious maintenance cost.


> [1] http://lintian.debian.org/tags/file-name-is-not-valid-UTF-8.html

[2] apt-get install ttf-ancient-fonts
Yeah, aptly named.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ





Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 12:50:35PM +0200, Vincent Lefevre wrote:
> On 2013-08-12 02:51:52 +0200, Adam Borowski wrote:
> > Detecting non-UTF files is easy:
> > * false positives are impossible
> > * false negatives are extremely unlikely: combinations of letters that would
> >   happen to match a valid utf character don't happen naturally, and even if
> >   they did, every single combination in the file tested would need to match
> >   valid utf.
> 
> Not that unlikely, and it is rather annoying that Firefox (and
> therefore Iceweasel) gets this wrong due to an ambiguity with TIS-620.
> IMHO, in case of ambiguity, UTF-8 should always be preferred by
> default (applications could have options to change the preferences).

That's the opposite of what I'm talking about: it is hard to reliably detect
ancient encodings, because they tend to assign a character to every possible
bit stream.  On the other hand, only certain combinations of bytes with the
8th bit set are valid UTF-8, and thus it is possible to detect UTF-8 with
good accuracy.  It is obviously trivial to fool such detection deliberately,
but such combinations don't happen in real languages, and thus if something
validates as UTF-8, it is safe to assume it indeed is.
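
The validation argument can be tried out with a strict decode in Python.
This is a minimal sketch; `is_valid_utf8` is a name chosen for the example,
and the third sample is exactly the kind of deliberately crafted legacy text
that fools the check.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text in UTF-8 and in ISO-8859-1.
utf8_sample = "na\u00efve caf\u00e9".encode("utf-8")
latin1_sample = "na\u00efve caf\u00e9".encode("latin-1")

# Crafted ISO-8859-1 text (A-tilde, copyright sign) whose bytes C3 A9
# happen to form a valid UTF-8 sequence.
crafted_latin1 = "\u00c3\u00a9".encode("latin-1")

results = [is_valid_utf8(s) for s in (utf8_sample, latin1_sample, crafted_latin1)]
```

Pure ASCII input also validates, so a check like this only says something
about files that contain bytes with the 8th bit set.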
 
> > On the other hand, detecting text files is hard.
> 
> Deciding whether a file is a text file may be hard even for a human.
> What about text files with ANSI control sequences?

Same as, say, a Word97 document: not text for my purposes.  It might be
just coloured plain text, but there is no generic way to handle that.
Binary formats go more into subgoal 1 of my proposal: arbitrary Unicode
input that matches your syntax should be accepted, and go out uncorrupted
(not the same as unmodified).
 
> > One could use location: like, declaring stuff in /etc/ and
> > /usr/share/doc/ to be text unless proven otherwise, but that's an
> > incomplete hack. Only hashbangs can be considered reliable, but
> > scripts are not where most documentation goes.
> > 
> > Also, should HTML be considered text or not?  Updating http-equiv is not
> > rocket surgery, detecting HTML with fancy extensions can be.
> 
> I think better questions could be: why do you want to regard a file as
> text? For what purpose(s)? For the "all shipped text files in UTF-8"
> rule only?

A shipped config file will have some settings the user may edit and comments
he may read.  Being able to see what's going on is a prerequisite here.

A perl/python/etc script is something our kind of folks often edit and/or
read.

A plain text file ships no encoding information, thus it can be neither
rendered nor edited comfortably if the encoding is different from the system
one.

HTML can include http-equiv which takes care of rendering, but editing is
still a problem.  And if you edit it, or, say, fill in some fields from a
database, you risk data loss.  If everything is UTF-8 end-to-end, this risk
goes away.  (I do care about plain text more, though.)
 
> What about examples whose purpose is to have a file in a charset
> different from UTF-8?

Well, we don't convert those :)

I don't expect a package with a test suite that includes charset stuff to
make such an error by itself, but if there's a need, we could add a syntax
for exclusions.  For example, writing "verbatim" in the charset field.

> > 4a. perl and pod
> > 
> > Considering perl to be text raises one more issue: pod.  By perl's design,
> > pod without a specified encoding is considered to be ISO-8859-1, even if
> > the file contains "use utf8;".  This is surprising, and many authors use
> > UTF-8 like everywhere else, leading to obvious results ("man gdm3" for one
> > example).  Thus, there should be a tool (preferably the one mentioned
> > above) that checks perl files for pod with undeclared encoding, and raises
> > alarm if the file contains any bytes with high bit set.  If a conversion
> > encoding is specified, such a declaration could be added automatically.
> 
> Yes, undeclared encoding when not ASCII should be regarded as a bug.

And if it's declared but not UTF-8, I'd convert it at package build time.
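
The check discussed in 4a could be sketched in Python roughly as follows.
This is an assumption-laden illustration, not an existing tool:
`pod_needs_encoding` is an invented name, pod is detected only by a line
starting with "=" plus a letter, and high-bit bytes are looked for in the
whole file, as the proposal words it, rather than in the pod sections only.

```python
import re

# A pod command paragraph starts with '=' followed by a letter, e.g. =head1.
POD_COMMAND = re.compile(rb"^=[A-Za-z]", re.MULTILINE)
# An explicit declaration such as "=encoding utf8".
POD_ENCODING = re.compile(rb"^=encoding\s+\S+", re.MULTILINE)


def pod_needs_encoding(source: bytes) -> bool:
    """Return True if pod is present, undeclared, and the file is not ASCII."""
    has_pod = POD_COMMAND.search(source) is not None
    declares_encoding = POD_ENCODING.search(source) is not None
    has_high_bytes = any(byte >= 0x80 for byte in source)
    return has_pod and not declares_encoding and has_high_bytes


undeclared = b"#!/usr/bin/perl\n=head1 NAME\n\ncaf\xc3\xa9\n\n=cut\n"
declared = b"=encoding utf8\n\n=head1 NAME\n\ncaf\xc3\xa9\n\n=cut\n"
ascii_only = b"=head1 NAME\n\ncafe\n\n=cut\n"
```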

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ





Bug#719491: ITP: morris -- Nine men's morris game for the gnome desktop

2013-08-12 Thread Miriam Ruiz
Package: wnpp
Severity: wishlist
Owner: Miriam Ruiz 

* Package name: morris
  Version : 0.2
  Upstream Author : Dirk Farin 
* URL : http://nine-mens-morris.net/
* License : GPL-3+
  Programming Lang: C++
  Description : Nine men's morris game for the gnome desktop

 Morris is an implementation of the board game "Nine Men's Morris".
 Sometimes simply called Mills, Morris, Merrills, or Mühle in German.
 This implementation supports not only the standard game, but also
 several rule-variants and different boards. The game supports a 
 strong computer player which learns from past games played.





ITP: erfa -- Image visualization and access to catalogs and data for astronomy

2013-08-12 Thread Ole Streicher
Package: wnpp
Severity: wishlist
Owner: Ole Streicher 
X-Debbugs-Cc: debian-devel@lists.debian.org, debian-scie...@lists.debian.org

* Package name: erfa
  Version : None released yet
  Upstream Author : IAU SOFA Board
* URL : https://github.com/liberfa/erfa
* License : BSD3
  Programming Lang: C
  Description : Image visualization and access to catalogs and data
for astronomy
 ERFA is a C library containing key algorithms for astronomy, and is
 based on the SOFA library published by the International Astronomical
 Union (IAU).
 .
 It is intended to replicate the functionality of SOFA (aside from
 possible bugfixes in ERFA that have not yet been included in SOFA),
 but is licensed under a three-clause BSD license to enable its
 compatibility with a wide range of open source licenses. Permission
 for this release has been obtained from the SOFA board.

The intention of this package is to replace the iausofa_c package. Since
the prefix of all functions changed from iau_ to erfa_, the dependent
packages need to be patched to use erfa. These are the packages
"starlink-pal" and "python-astropy".

Best

Ole








Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 02:51:52 +0200, Adam Borowski wrote:
> Detecting non-UTF files is easy:
> * false positives are impossible
> * false negatives are extremely unlikely: combinations of letters that would
>   happen to match a valid utf character don't happen naturally, and even if
>   they did, every single combination in the file tested would need to match
>   valid utf.

Not that unlikely, and it is rather annoying that Firefox (and
therefore Iceweasel) gets this wrong due to an ambiguity with TIS-620.
IMHO, in case of ambiguity, UTF-8 should always be preferred by
default (applications could have options to change the preferences).

Bug reports:
  https://bugzilla.mozilla.org/show_bug.cgi?id=760050
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719481

> On the other hand, detecting text files is hard.

Deciding whether a file is a text file may be hard even for a human.
What about text files with ANSI control sequences?

> The best tool so far, "file", makes so many errors it's useless for
> this purpose.

Yes.

> One could use location: like, declaring stuff in /etc/ and
> /usr/share/doc/ to be text unless proven otherwise, but that's an
> incomplete hack. Only hashbangs can be considered reliable, but
> scripts are not where most documentation goes.
> 
> Also, should HTML be considered text or not?  Updating http-equiv is not
> rocket surgery, detecting HTML with fancy extensions can be.

I think better questions could be: why do you want to regard a file as
text? For what purpose(s)? For the "all shipped text files in UTF-8"
rule only?

What about examples whose purpose is to have a file in a charset
different from UTF-8?

> 4a. perl and pod
> 
> Considering perl to be text raises one more issue: pod.  By perl's design,
> pod without a specified encoding is considered to be ISO-8859-1, even if
> the file contains "use utf8;".  This is surprising, and many authors use
> UTF-8 like everywhere else, leading to obvious results ("man gdm3" for one
> example).  Thus, there should be a tool (preferably the one mentioned
> above) that checks perl files for pod with undeclared encoding, and raises
> alarm if the file contains any bytes with high bit set.  If a conversion
> encoding is specified, such a declaration could be added automatically.

Yes, undeclared encoding when not ASCII should be regarded as a bug.
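
A rough sketch of such a check, in Python. This is an assumption about
how the tool could work, not an existing Lintian or perl check, and the
name pod_lacks_encoding is hypothetical. It flags a file that contains
pod commands, has no "=encoding" line, and contains at least one byte
with the high bit set.

```python
import re

# A pod command is a line starting with "=" followed by a letter,
# e.g. "=head1" or "=cut".
POD_COMMAND = re.compile(rb"^=[a-zA-Z]\w*", re.MULTILINE)
POD_ENCODING = re.compile(rb"^=encoding\s+\S+", re.MULTILINE)


def pod_lacks_encoding(source: bytes) -> bool:
    """Return True if the file has pod, non-ASCII bytes, and no =encoding."""
    has_pod = POD_COMMAND.search(source) is not None
    declares_encoding = POD_ENCODING.search(source) is not None
    has_high_bytes = any(byte >= 0x80 for byte in source)
    return has_pod and not declares_encoding and has_high_bytes
```

The sketch looks at the whole file rather than only at the pod sections,
which matches the "any bytes with high bit set" wording but may be
stricter than necessary.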

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20130812105035.ga28...@xvii.vinc17.org



Bug#719480: ITP: libmodule-path-perl -- module to get the full path to a locally installed Perl module

2013-08-12 Thread Florian Schlichting
Package: wnpp
Severity: wishlist
Owner: Florian Schlichting 

* Package name: libmodule-path-perl
  Version : 0.09
  Upstream Author : Neil Bowers 
* URL : http://search.cpan.org/dist/Module-Path/
* License : GPL-1+, Artistic
  Programming Lang: Perl
  Description : module to get the full path to a locally installed Perl module

Module::Path provides a single function, module_path(), which will find
where a module is installed locally. It works by looking in all the
directories in @INC for an appropriately named file, returning the full
path when found and undef otherwise.
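
The lookup described above can be sketched as follows. This is a Python
analogue written for illustration only, not the Module::Path source; the
inc argument stands in for Perl's @INC, and None plays the role of undef.

```python
import os
from typing import Iterable, Optional


def module_path(module: str, inc: Iterable[str]) -> Optional[str]:
    """Return the full path of the first matching .pm file, else None."""
    # "Foo::Bar" maps to the relative file name "Foo/Bar.pm".
    relative = os.path.join(*module.split("::")) + ".pm"
    for directory in inc:
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    return None
```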


Archive: http://lists.debian.org/20130812101007.10858.75097.reportbug@thinkpad



Re: UTF-8 in jessie

2013-08-12 Thread Niels Thykier
On 2013-08-12 04:18, Charles Plessy wrote:
> On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote:
>>
>> I would like to propose full UTF-8 support.  I don't mean here full
>> support for all of Unicode's finer points, merely complete eradication of
>> mojibake.
> 
> Hi Adam,
> 

Hi,

> this is a great goal.  Here are two comments.
> 
> There is a related issue opened on the Policy,
> where we propose the following:
> 
>  - Require UTF-8 for the names of all files and directories installed by binary packages.

For the record, there is a Lintian tag for this now[1], which suggests
that only a handful of packages violate this.

>  - Recommend ASCII when possible.
>  - Require ASCII for files in /bin, /sbin, /usr/bin, /usr/sbin and /usr/games.
> 

Requiring ASCII for files in $PATH should be trivial to implement as a
separate tag.  I suppose the ASCII requirement could also be implemented
as a pedantic check or so.  Regardless, patches welcome.  :)
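
As a starting point, the two checks could look roughly like this. It is
a Python sketch for illustration, not Lintian code (Lintian is written
in Perl), and the function name and messages are made up. It takes a
package-relative file name as bytes and only treats files placed
directly in the listed directories as subject to the ASCII rule.

```python
from typing import List

# Directories from the proposal, relative to the package root.
ASCII_ONLY_DIRS = (b"bin", b"sbin", b"usr/bin", b"usr/sbin", b"usr/games")


def name_problems(path: bytes) -> List[str]:
    """Return a list of naming problems for one installed file."""
    problems = []
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("name is not valid UTF-8")
    # Split "usr/bin/foo" into the directory "usr/bin" and the name "foo".
    directory, _, name = path.strip(b"/").rpartition(b"/")
    if directory in ASCII_ONLY_DIRS and not name.isascii():
        problems.append("non-ASCII name in a $PATH directory")
    return problems
```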

> About display by GUIs, I think that we should have a system to install all the
> fonts necessary to display languages that we support at the installation.
> 
> Have a nice Debconf !
> 

~Niels

[1] http://lintian.debian.org/tags/file-name-is-not-valid-UTF-8.html


Archive: http://lists.debian.org/520895a6.1090...@thykier.net



Re: Bug#719211: ITP: lnav -- nurses-based log file viewer

2013-08-12 Thread Salvatore Bonaccorso
Hi Chris

On Sat, Aug 10, 2013 at 06:42:13PM +1200, Chris Bannister wrote:
> On Fri, Aug 09, 2013 at 11:44:11AM +0200, Salvatore Bonaccorso wrote:
> > Package: wnpp
> > Severity: wishlist
> > Owner: Salvatore Bonaccorso 
> > 
> > * Package name: lnav
> >   Version : 0.5.0
> >   Upstream Author : Timothy Stack 
> > * URL : http://tstack.github.io/lnav/
> > * License : BSD
> >   Programming Lang: C++
> >   Description : nurses-based log file viewer
> 
> root@tal:~# apt-cache search ncurses | wc -l
> 147
> root@tal:~# apt-cache search nurses | wc -l
> 0
> root@tal:~#

Thanks for spotting this typo. I will fix this in the short
description of the package (not yet ready to upload).

Regards,
Salvatore


Archive: http://lists.debian.org/20130812072832.GA4917@eldamar.local