Bug#588990: libc-bin: iconv -l doesn't indicate aliases

2010-07-30 Thread Neil Mayhew

 On 2010-07-26 9:15 PM Aurelien Jarno wrote:

> On Wed, Jul 14, 2010 at 11:53:56AM +0200, Aurelien Jarno wrote:
> > You have to be more specific about the problem. I don't see any
> > change between the glibc-based version and the eglibc-based version
> > besides a few more supported encodings.
> >
> > glibc and eglibc don't differ on the iconv code.


I checked, and it seems I was confusing GNU libiconv
(http://www.gnu.org/software/libiconv/) with the glibc/eglibc
implementation of iconv.


GNU libiconv outputs the following from iconv -l, for example:

ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4

This makes it clear which names are equivalent. The glibc/eglibc iconv
just prints each name on a separate line. If it were possible to provide
the libiconv behaviour, perhaps through an additional option to iconv,
that would be helpful.
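
As a rough illustration of the grouped format being asked for, the sketch below folds a flat list of names into one line per encoding. The name pairs are copied from the libiconv listing above; the function group_aliases, the PAIRS table, and the choice of the first name on each line as the group key are assumptions made for this example and are not part of either iconv implementation.

```python
# (name, group key) pairs copied from the libiconv listing quoted above.
# Treating the first name on each line as the group key is an assumption;
# how a real iconv stores its alias table is not shown here.
PAIRS = [
    ("ISO-10646-UCS-2", "ISO-10646-UCS-2"),
    ("UCS-2", "ISO-10646-UCS-2"),
    ("CSUNICODE", "ISO-10646-UCS-2"),
    ("UCS-2BE", "UCS-2BE"),
    ("UNICODE-1-1", "UCS-2BE"),
    ("UNICODEBIG", "UCS-2BE"),
    ("CSUNICODE11", "UCS-2BE"),
    ("UCS-2LE", "UCS-2LE"),
    ("UNICODELITTLE", "UCS-2LE"),
    ("ISO-10646-UCS-4", "ISO-10646-UCS-4"),
    ("UCS-4", "ISO-10646-UCS-4"),
    ("CSUCS4", "ISO-10646-UCS-4"),
]


def group_aliases(pairs):
    """Return one line per group: the key first, then its aliases."""
    groups = {}  # dicts keep insertion order in Python 3.7+
    for name, key in pairs:
        aliases = groups.setdefault(key, [])
        if name != key:
            aliases.append(name)
    return [" ".join([key] + aliases) for key, aliases in groups.items()]


if __name__ == "__main__":
    for line in group_aliases(PAIRS):
        print(line)
```

Run as a script, this prints the four grouped lines in the same shape as the libiconv listing.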


The bigger issue, however, is that glibc's iconv doesn't document what 
the various encoding names mean, *anywhere*. Something like CP1149 can 
be Googled and found in places like Wikipedia, but a name like UNICODE 
is very ambiguous, and odd names like CSUNICODE don't return anything 
very obvious in Google searches. In fact, the best description I found 
was in the documentation for an entirely different library, recode
(http://www.delorie.com/gnu/docs/recode/recode_30.html). I think
(e)glibc should provide its own documentation and not rely on other sources.


GNU libiconv is slightly better, because its iconv -l output
explains what CSUNICODE means by showing that it's the same as a
well-defined, unambiguous encoding (ISO-10646-UCS-2).


However, neither library explains byte order anywhere. I can get BE or
LE by specifying it explicitly in the encoding name, but typically I
need native byte order, and I don't want to have to do a runtime test for
endianness and then add it to the encoding name. How was I supposed to
know that UCS-2 means native byte order rather than some canonical
ordering such as big-endian? iconv implementations actually differ on
this. On Mac OS X on Intel, with either the system iconv or the MacPorts
version of GNU libiconv, UCS-2 means big-endian:


$ echo -ne '\xe2\x80\xa2' | iconv -f utf-8 -t ucs-2 | xxd
000: 2022

Running the same on Linux returns:
000: 2220

So if it's interpreted differently by different libraries, even though 
they all implement the same standard, shouldn't the behaviour on Linux 
be documented somewhere?
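
To make the two results above concrete, here is a minimal sketch of the same character encoded in both byte orders, together with the runtime endianness test mentioned earlier. It uses Python's own codecs rather than iconv: utf-16-be and utf-16-le stand in for UCS-2BE and UCS-2LE, which is only valid here because U+2022 lies in the Basic Multilingual Plane. The helper native_ucs2_name is hypothetical.

```python
import sys

# U+2022 (BULLET); its UTF-8 form is the e2 80 a2 used in the command above.
BULLET = b"\xe2\x80\xa2".decode("utf-8")


def native_ucs2_name():
    """Return the explicit-endian name matching this machine's byte order."""
    return "UCS-2LE" if sys.byteorder == "little" else "UCS-2BE"


def encode_bmp(text, name):
    """Encode BMP-only text under an explicit-endian UCS-2 name.

    Assumption: Python's UTF-16 codecs are used as stand-ins, since they
    produce the same bytes as UCS-2 for characters below U+10000.
    """
    codec = {"UCS-2BE": "utf-16-be", "UCS-2LE": "utf-16-le"}[name]
    return text.encode(codec)


if __name__ == "__main__":
    print(encode_bmp(BULLET, "UCS-2BE").hex())  # 2022
    print(encode_bmp(BULLET, "UCS-2LE").hex())  # 2220
    print(native_ucs2_name())
```

The name returned by native_ucs2_name could be passed to iconv -t in place of the bare ucs-2, which sidesteps the ambiguity at the cost of the runtime test.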



> Any news about that?


Sorry for the delay. My email address forwards to gmail, which put both 
of your messages in the spam folder :-( Normally, gmail's spam detection 
is excellent so I don't bother to check it very often.


--Neil



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#588990: libc-bin: iconv -l doesn't indicate aliases

2010-07-30 Thread Neil Mayhew

 On 2010-07-30 6:14 PM Neil Mayhew wrote:
> Different iconv implementations actually differ on this. On Mac OS X
> on Intel with either the system iconv or the MacPorts version of GNU
> libiconv, UCS-2 actually means big-endian.


I just built libiconv on Linux, and there too it treats UCS-2 as
big-endian, even on a little-endian machine.


It may be right, too. I just found this on the Unicode Consortium web site:


Q: What does Unicode conformance require?
A: Chapter 3, Conformance discusses this in detail. Here's a very 
informal version:

...
If you don't know, assume big-endian.


http://www.unicode.org/faq/basic_q.html#11
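
A minimal sketch of what "assume big-endian" could look like for a decoder, using Python's codecs rather than iconv. The FAQ excerpt above states only the fallback; checking for a byte order mark first is an assumption added here, and decode_utf16_assume_be is a hypothetical helper name.

```python
import codecs


def decode_utf16_assume_be(data):
    """Decode 16-bit Unicode bytes, falling back to big-endian.

    Assumption: a leading byte order mark, if present, decides the order;
    otherwise the data is treated as big-endian, as the FAQ suggests.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No byte order mark: nothing says otherwise, so assume big-endian.
    return data.decode("utf-16-be")


if __name__ == "__main__":
    # The bytes 20 22 with no mark are read as U+2022 under this rule.
    print(decode_utf16_assume_be(b"\x20\x22") == "\u2022")
```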

--Neil






Bug#588990: libc-bin: iconv -l doesn't indicate aliases

2010-07-26 Thread Aurelien Jarno
On Wed, Jul 14, 2010 at 11:53:56AM +0200, Aurelien Jarno wrote:
> tag 58899 + moreinfo
> thanks
>
> On Tue, Jul 13, 2010 at 10:18:33PM -0600, Neil Mayhew wrote:
> > Package: libc-bin
> > Version: 2.11.2-2
> > Severity: normal
> > Tags: upstream
> >
> > Previously, before the switch from glibc to eglibc, iconv -l would show all
> > the aliases for an encoding on the same line as the encoding. Now every
> > encoding, whether primary or an alias, is on a separate line.
>
> You have to be more specific about the problem. I don't see any change
> between the glibc-based version and the eglibc-based version besides a few
> more supported encodings.
>
> > POSIX doesn't specify the format of the output of iconv -l, but the previous
> > behavior was helpful, and I understand that eglibc is trying to be closely
> > compatible with glibc.
>
> glibc and eglibc don't differ on the iconv code.

Any news about that?

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#588990: libc-bin: iconv -l doesn't indicate aliases

2010-07-14 Thread Aurelien Jarno
tag 58899 + moreinfo
thanks

On Tue, Jul 13, 2010 at 10:18:33PM -0600, Neil Mayhew wrote:
> Package: libc-bin
> Version: 2.11.2-2
> Severity: normal
> Tags: upstream
>
> Previously, before the switch from glibc to eglibc, iconv -l would show all
> the aliases for an encoding on the same line as the encoding. Now every
> encoding, whether primary or an alias, is on a separate line.

You have to be more specific about the problem. I don't see any change
between the glibc-based version and the eglibc-based version besides a few
more supported encodings.

> POSIX doesn't specify the format of the output of iconv -l, but the previous
> behavior was helpful, and I understand that eglibc is trying to be closely
> compatible with glibc.
 

glibc and eglibc don't differ on the iconv code.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#588990: libc-bin: iconv -l doesn't indicate aliases

2010-07-13 Thread Neil Mayhew
Package: libc-bin
Version: 2.11.2-2
Severity: normal
Tags: upstream


Previously, before the switch from glibc to eglibc, iconv -l would show all
the aliases for an encoding on the same line as the encoding. Now every
encoding, whether primary or an alias, is on a separate line.

POSIX doesn't specify the format of the output of iconv -l, but the previous
behavior was helpful, and I understand that eglibc is trying to be closely
compatible with glibc.


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (900, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686-bigmem (SMP w/2 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- Configuration Files:
/etc/ld.so.conf.d/libc.conf changed [not included]

-- no debconf information


