Bug#468209: msgfmt: no documentation of the --endianness option

2011-11-10 Thread Jakub Wilk

* Santiago Vila sanv...@unex.es, 2011-11-03, 18:38:
With the advent of multi-arch, such behavior has become a problem. If 
a package is marked as Multi-Arch: same all the files (including 
*.mo) have to be identical across all architectures.


Hmm, why do they have to be identical?

Is it not enough that both types of systems (big and little endian) are 
able to read and use both types of .mo files, as seems to be the 
case?


If .mo files are usable everywhere, regardless of their endianness, I 
would say that the multi-arch requirement is not reasonable.


Multi-Arch: same makes it possible for users to install a package for 
more than one architecture at the same time. If files with the same name 
are not identical across architectures, the package manager has to resolve 
the conflict somehow, and it does so by simply aborting the installation, 
e.g. like this:

| # apt-get install -qq libavahi-common-data:powerpc
| (Reading database ... 59644 files and directories currently installed.)
| Unpacking libavahi-common-data:powerpc (from .../libavahi-common-data_0.6.30-5_powerpc.deb) ...
| dpkg: error processing /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb (--unpack):
|  './usr/share/locale/he/LC_MESSAGES/avahi.mo' is different from the same file on the system
| configured to not write apport reports
| dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
| Errors were encountered while processing:
|  /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb
| E: Sub-process /usr/bin/dpkg returned an error code (1)

Does it make things clear?
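
The kind of check that trips here can be approximated with a small sketch. This is not dpkg's actual logic; the function names and the two-directory layout are assumptions made for illustration. It lists files that exist under the same relative path in two unpacked package trees but differ in content:

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def conflicting_files(tree_a, tree_b):
    """Return relative paths present in both trees whose contents differ.

    Hypothetical helper: each tree stands for one architecture's unpacked
    copy of the same Multi-Arch: same package.
    """
    tree_a, tree_b = Path(tree_a), Path(tree_b)
    conflicts = []
    for a in sorted(p for p in tree_a.rglob("*") if p.is_file()):
        rel = a.relative_to(tree_a)
        b = tree_b / rel
        if b.is_file() and file_digest(a) != file_digest(b):
            conflicts.append(rel.as_posix())
    return conflicts
```

Running it over the amd64 and powerpc unpacks of a package would be expected to report any .mo files built with different byte orders.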

Note that the problem would affect only a tiny minority of packages: 
Multi-Arch: same is useful mainly for shared libraries, and they rarely 
come with translations.


--
Jakub Wilk



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#468209: msgfmt: no documentation of the --endianness option

2011-11-10 Thread Santiago Vila

On 10/11/11 12:34, Jakub Wilk wrote:

* Santiago Vila sanv...@unex.es, 2011-11-03, 18:38:

With the advent of multi-arch, such behavior has become a problem. If
a package is marked as Multi-Arch: same all the files (including
*.mo) have to be identical across all architectures.


Hmm, why do they have to be identical?

Is it not enough that both types of systems (big and little endian)
are able to read and use both types of .mo files, as seems to be
the case?

If .mo files are usable everywhere, regardless of their endianness, I
would say that the multi-arch requirement is not reasonable.


Multi-Arch: same makes it possible for users to install a package for
more than one architecture at the same time. If files with the same name
are not identical across architectures, the package manager has to resolve
the conflict somehow, and it does so by simply aborting the installation,
e.g. like this:
| # apt-get install -qq libavahi-common-data:powerpc
| (Reading database ... 59644 files and directories currently installed.)
| Unpacking libavahi-common-data:powerpc (from .../libavahi-common-data_0.6.30-5_powerpc.deb) ...
| dpkg: error processing /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb (--unpack):
|  './usr/share/locale/he/LC_MESSAGES/avahi.mo' is different from the same file on the system
| configured to not write apport reports
| dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
| Errors were encountered while processing:
|  /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb
| E: Sub-process /usr/bin/dpkg returned an error code (1)

Does it make things clear?


Yes, I now see what the problem is, but I don't see that making every 
.mo file always little endian again is the best solution. We could 
also tell dpkg somehow that differing files in /usr/share/locale are OK 
in this case.



Note that the problem would affect only a tiny minority of packages:
Multi-Arch: same is useful mainly for shared libraries, and they rarely
come with translations.


In that case, making those packages depend on another Arch: all 
package containing just the translations would solve the issue, would it 
not?


(For the record, I happen to maintain a library containing translations, 
and I have always seen it as an anomaly; this would force me to do 
what I feel is the right thing.)







Bug#468209: msgfmt: no documentation of the --endianness option

2011-11-03 Thread Jakub Wilk

* Santiago Vila sanv...@unex.es, 2008-02-28, 12:10:
msgfmt does support --endianness {big|little} from the source in 
gettext-tools/src/msgfmt.c


neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (little endian), revision 0, 14 messages
neil@dwarf:po$ msgfmt --endianness big cs.po
neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (big endian), revision 0, 14 messages
neil@dwarf:po$

Please document this in the manpage and ask upstream if it can also be 
output in the --help output.


(endianness is important when crossbuilding packages containing PO 
files.)


No, it is not. Binary .mo files as used by libc and gettext are always
little endian, regardless of the machine architecture.


Hmm, this doesn't seem to be the case (anymore?). As far as I can see, 
msgfmt produces files with native endianness.
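
For anyone wanting to check this without file(1), here is a minimal sketch. The helper name is made up for illustration; it relies only on the .mo magic number 0x950412de, which is stored in the byte order the catalog was written in:

```python
import struct

MO_MAGIC = 0x950412DE  # first 32-bit word of every GNU .mo file


def mo_byte_order(header):
    """Return 'little' or 'big' for a .mo image, or None if the magic is absent.

    `header` is the beginning of the file as bytes (at least 4 bytes).
    """
    if len(header) < 4:
        return None
    if struct.unpack("<I", header[:4])[0] == MO_MAGIC:
        return "little"
    if struct.unpack(">I", header[:4])[0] == MO_MAGIC:
        return "big"
    return None
```

Usage would be along the lines of `mo_byte_order(open("messages.mo", "rb").read(4))`.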


With the advent of multi-arch, such behavior has become a problem. If a 
package is marked as Multi-Arch: same all the files (including *.mo) 
have to be identical across all architectures.


Either the --endianness option should be documented (so that M-A:same 
packages could use it when needed), or msgfmt should produce little-endian 
files even on big-endian architectures. Please tell me if I should reopen 
this bug, or rather file a new one requesting little-endian output 
everywhere.


--
Jakub Wilk






Bug#468209: msgfmt: no documentation of the --endianness option

2011-11-03 Thread Santiago Vila

On 03/11/11 16:24, Jakub Wilk wrote:

* Santiago Vila sanv...@unex.es, 2008-02-28, 12:10:

msgfmt does support --endianness {big|little} from the source in
gettext-tools/src/msgfmt.c

neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (little endian), revision 0, 14 messages
neil@dwarf:po$ msgfmt --endianness big cs.po
neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (big endian), revision 0, 14 messages
neil@dwarf:po$

Please document this in the manpage and ask upstream if it can also
be output in the --help output.

(endianness is important when crossbuilding packages containing PO
files.)


No, it is not. Binary .mo files as used by libc and gettext are always
little endian, regardless of the machine architecture.


Hmm, this doesn't seem to be the case (anymore?). As far as I can see,
msgfmt produces files with native endianness.


I didn't know, but yes, that seems to be the case now. I've just checked 
by running file * on a locale directory on my old powerpc.



With the advent of multi-arch, such behavior has become a problem. If a
package is marked as Multi-Arch: same all the files (including *.mo)
have to be identical across all architectures.


Hmm, why do they have to be identical?

Is it not enough that both types of systems (big and little endian) are 
able to read and use both types of .mo files, as seems to be the case?


If .mo files are usable everywhere, regardless of their endianness, I 
would say that the multi-arch requirement is not reasonable.
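
To illustrate the "readable everywhere" point, here is a minimal sketch using Python's standard gettext module rather than libc (so it says nothing about libc's own code path). The `build_mo` writer is a hypothetical helper that emits a tiny catalog without a hash table in either byte order; both images load and translate identically even though their bytes differ:

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412DE


def build_mo(messages, byteorder):
    """Serialize {msgid: msgstr} (bytes) into a minimal .mo image, no hash table."""
    fmt = "<" if byteorder == "little" else ">"
    keys = sorted(messages)
    ids = b"".join(k + b"\0" for k in keys)
    strs = b"".join(messages[k] + b"\0" for k in keys)
    n = len(keys)
    orig_table = 7 * 4                 # header is seven 32-bit words
    trans_table = orig_table + 8 * n   # each table entry is (length, offset)
    ids_start = trans_table + 8 * n
    strs_start = ids_start + len(ids)
    out = struct.pack(fmt + "7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    pos = ids_start
    for k in keys:
        out += struct.pack(fmt + "2I", len(k), pos)
        pos += len(k) + 1
    pos = strs_start
    for k in keys:
        out += struct.pack(fmt + "2I", len(messages[k]), pos)
        pos += len(messages[k]) + 1
    return out + ids + strs


catalog = {
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    b"hello": b"hola",
}
little = build_mo(catalog, "little")
big = build_mo(catalog, "big")

# Both byte orders are accepted by the reader.
results = [
    gettext.GNUTranslations(io.BytesIO(image)).gettext("hello")
    for image in (little, big)
]
```

The two images carry the same translations but are not byte-identical, which is exactly what a file-by-file comparison would object to.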



Either the --endianness option should be documented (so that M-A:same
packages could use it when needed), or msgfmt should produce little-endian
files even on big-endian architectures. Please tell me if I should reopen
this bug, or rather file a new one requesting little-endian output
everywhere.


I would prefer a new bug because the rationale for considering it a 
bug would be quite different. Previously the rationale given was 
performance, but no figures were ever shown.


However, I'm happy to discuss this in this old report first, at 
least until I really understand the nature of the new bug.







Bug#468209: msgfmt: no documentation of the --endianness option

2008-08-03 Thread Bruno Haible
Hi,

While reading
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=468209
I have to agree with most of what Santiago said.

Neil Williams said:
 On big endian systems, the CPU wastes time converting the endianness at
 loadtime which is important for embedded devices.

Do you have any figures? In my opinion, the non-native endianness costs a
few CPU cycles at every non-cached gettext() invocation, but nothing at load
time. The thing that costs at load time is when the locale encoding and the
PO file encoding don't match: e.g. if the PO file was in ISO-8859-2 and the
.mo file is used in a UTF-8 locale.
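
The per-lookup cost in question amounts to a 32-bit byte swap whenever a length or offset field is read from a foreign-endian catalog. A minimal sketch of that operation follows; the function names are made up for illustration and this is not gettext's code:

```python
import sys


def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return (
        (value & 0x000000FF) << 24
        | (value & 0x0000FF00) << 8
        | (value & 0x00FF0000) >> 8
        | (value & 0xFF000000) >> 24
    )


def read_u32(buf, offset, foreign):
    """Read a 32-bit field in host order, swapping it if the file is foreign-endian."""
    value = int.from_bytes(buf[offset:offset + 4], sys.byteorder)
    return swap32(value) if foreign else value
```

On most CPUs the swap is a single instruction or a handful of shifts and masks, which matches the "few CPU cycles" estimate above.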

 Please document this in the manpage and ask upstream if it can also be
 output in the --help output.
 ...
 No, I don't see that this needs to be forwarded upstream, this is an
 issue within Debian - primarily within the manpage as far as this bug
 report is concerned.

It would be wrong to document some option in Debian that is not documented
upstream. The upstream maintainer can at any moment withdraw this option,
change its syntax, make it dump core etc., without notice (no word about it
in the NEWS file).

Bruno







Bug#468209: msgfmt: no documentation of the --endianness option

2008-03-07 Thread Michelle Konzack
Hello Neil,

On 2008-02-29 07:27:47, Neil Williams wrote:
 Source packages that put .mo files into an Arch:all binary are buggy -
 implementing that fix will involve lots of work in Debian to handle the
 increase in package numbers but leave that to me - I'll sort out the
 mass bug filing(s) when other TDeb support is implemented elsewhere in
 Debian after Lenny. (In fact, these bugs will simply disappear anyway
 because the implementation of TDebs means that *no* other package in
 Debian would contain any .mo files, all .mo files would only exist in
 TDeb packages which will be Architecture:any.)

Does this mean that if I write a program (Bash script + Xdialog) which
is currently Arch:all, I will have to build it for all 12 architectures
after your change?

This would mean that my 15 MByte source would produce 12 binary packages
of 15 MByte each, where ONLY the 40 kByte .mo file would be different...

I do not know whether this is really desirable, even if I understand
your problem quite well (I am currently fighting with ARM, MIPS and SH CPUs).

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
# Debian GNU/Linux Consultant #
Michelle Konzack   Apt. 917                  ICQ #328449886
+49/177/9351947    50, rue de Soultz         MSN LinuxMichi
+33/6/61925193     67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Bug#468209: msgfmt: no documentation of the --endianness option

2008-02-29 Thread Santiago Vila
On Fri, 29 Feb 2008, Neil Williams wrote:

 .mo files *are* architecture dependent and should be handled as such.
 Just because 'it happens to work' right now does not mean it is the
 correct way to handle .mo files. 

I'm curious: Do you plan to do the same with PCM .wav files?
(They are always little-endian, like .mo files).

I can agree that "it works" is not always a good reason to do things
in a certain way, but what you are proposing is a change in something
which is a de facto standard, for very little benefit (saving some CPU
cycles). When the cost of something is very high and the benefit is
very small, the natural thing to do is to keep things as they are.

Anyway, I can forward your suggestion upstream if you insist, but I don't
plan to deviate from upstream gettext if the authors reject your suggestion.






Bug#468209: msgfmt: no documentation of the --endianness option

2008-02-29 Thread Neil Williams
On Fri, 29 Feb 2008 11:51:19 +0100 (CET)
Santiago Vila [EMAIL PROTECTED] wrote:

 On Fri, 29 Feb 2008, Neil Williams wrote:
 
  .mo files *are* architecture dependent and should be handled as such.
  Just because 'it happens to work' right now does not mean it is the
  correct way to handle .mo files. 
 
 I'm curious: Do you plan to do the same with PCM .wav files?

Not for Emdebian - we are unlikely to be handling .wav files due to
storage constraints.

However, if there are any other situations where Arch:all files have
endian problems, I will be pursuing those to provide an Arch:any
mechanism if there is a role for such files on embedded devices. .wav
is simply too large for embedded, .gsm is more likely or .ogg.

So the answer to your question, really, is YES. I intend to seek
correct endianness support for any binary format in Debian that does
not already implement it and for which there is a logical reason to
need that format on an embedded device where endianness conversions are
a significant issue.

Music isn't that much of a problem - if a user has to wait a few
hundred clock cycles more to hear a music track, it is not a big issue.

What does matter is if a user has to wait several dozen clock cycles
*every time any application loads* merely to get the .mo content into
the correct presentation. This is a *MAJOR* usability issue. It will
make the entire OS appear slow in any translated locale.

If Debian wants to be able to support embedded and low resource
machines, Debian has to accept that there will be changes needed to
enable such support.

 (They are always little-endian, like .mo files).
 
 I can agree that "it works" is not always a good reason to do things
 in a certain way, but what you are proposing is a change in something
 which is a de facto standard, for very little benefit (saving some CPU
 cycles). 

Please remember this is for embedded devices. It works now only because
Debian really isn't the Universal OS - lots of parts of Debian are
simply wrong for embedded usage which is why it is taking so long for
Emdebian to make progress.

Nobody cares about a few dozen clock cycles on a dual core GHz amd64 -
but it becomes a quite noticeable delay on an iPAQ. The delay is
repeated every single time any application is started outside the C
locale.

All I'm asking for here is *documentation* that this is how it needs to
be done for certain situations so that others do not make the same
mistake that both you and I have done - assuming that the current
Debian method of Arch:all for .mo is acceptable. It is not, sadly.

 When the cost of something is very high and the benefit is
 very small, the natural thing to do is to keep things as they are.

The cost of this change is irrelevant because the cost of implementing
TDebs is already high and this change does not make that any higher.
Whether TDebs are Arch:all or Arch:any makes no difference to the
amount of work required to implement TDebs in Debian. It merely adds a
tiny amount of work for the buildd network.

The benefits of TDebs (installation sizes, separate translator uploads,
faster translation updates etc.) far, far outweigh the temporary work
of getting them implemented in Debian. It is ludicrous that Debian
insists on installing over 250 MB of *unused and unusable* .mo files
in a default GNOME installation when any one locale needs just 9 MB or
less. (Check the size of your /usr/share/locale/ directory and compare
that with the collected size of the few languages that you actually
speak. Granted that will be a few more than me but I doubt anyone can
speak/read all of the 90+ languages installed by default in Debian.
I'd be surprised if anyone would require more than half a dozen.) 
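
The size comparison suggested above can be scripted. The following is a minimal sketch with hypothetical helper names; the default root is the /usr/share/locale directory mentioned in the text, and no particular result is implied:

```python
import os


def locale_sizes(root="/usr/share/locale"):
    """Map each locale directory under root to the total bytes of its .mo files."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir():
            continue
        total = 0
        for dirpath, _dirs, files in os.walk(entry.path):
            for name in files:
                path = os.path.join(dirpath, name)
                if name.endswith(".mo") and os.path.isfile(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


def share_used(sizes, wanted):
    """Return (bytes for the wanted locales, bytes for all locales)."""
    return sum(sizes.get(loc, 0) for loc in wanted), sum(sizes.values())
```

For example, `share_used(locale_sizes(), ["en_GB", "de"])` gives the bytes used by the two named locales next to the total for everything installed.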

 Anyway, I can forward your suggestion upstream if you insist, 

No, I don't see that this needs to be forwarded upstream, this is an
issue within Debian - primarily within the manpage as far as this bug
report is concerned.

All I want is for the manpage of msgfmt to explain that --endianness
{big|little} *is* supported, *why* it is supported and why it is
important for certain situations.

 but I don't
 plan to deviate from upstream gettext if the authors reject your suggestion.

The upstream authors already *explicitly support* endianness because the
option is part of the source code for msgfmt!
(gettext-tools/src/msgfmt.c)

I think upstream may have a better grasp of the issue than you may
imagine because other non-Debian embedded developments can use
--endianness.

Whether or not anything in gettext-Debian changes, I will implement
TDebs using --endianness in calls to msgfmt. All I'm asking is that the
reasons that I have set out for this are clearly explained in the
manpage in Debian so that other developers do not waste time believing
that the current method used for powerful desktop machines is in any
way appropriate for low resource units.

The option exists, it works and the manpage should document it - just
as with any application in Debian. The 

Bug#468209: msgfmt: no documentation of the --endianness option

2008-02-28 Thread Neil Williams
reopen 468209
quit

On Thu, 28 Feb 2008 12:10:23 +0100 (CET)
Santiago Vila [EMAIL PROTECTED] wrote:

 On Wed, 27 Feb 2008, Neil Williams wrote:
  msgfmt does support --endianness {big|little} from the source in
  gettext-tools/src/msgfmt.c
  
  neil@dwarf:po$ file messages.mo
  messages.mo: GNU message catalog (little endian), revision 0, 14 messages
  neil@dwarf:po$ msgfmt --endianness big cs.po
  neil@dwarf:po$ file messages.mo
  messages.mo: GNU message catalog (big endian), revision 0, 14 messages
  neil@dwarf:po$
  
  Please document this in the manpage and ask upstream if it can also be
  output in the --help output.
  
  (endianness is important when crossbuilding packages containing PO
  files.)
 
 No, it is not. Binary .mo files as used by libc and gettext are always
 little endian, regardless of the machine architecture.

Not necessarily. That is just how gettext currently works but it is not
how gettext *must* work, hence the --endianness option in the source
code.

 Otherwise it
 would be impossible for us to have Architecture: all packages
 like util-linux-locales containing just binary .mo files.
 

Actually, endianness *IS* important because we should not be *having*
Architecture:all packages that contain .mo files - that was
demonstrated during my talk on TDebs at Fosdem.

On big endian systems, the CPU wastes time converting the endianness at
loadtime which is important for embedded devices.

All packages containing .mo files should be Arch:any and this is
something I will be fixing during the course of TDeb development in
Debian. All the other questions that arise from this (increased package
numbers, extra builds, repository implications, userspace controls and
cache sizes) have all got solutions that are currently working in
Emdebian and which are due to be applied to Debian (probably after
Lenny).

TDebs might also need to drop the hash table in the .mo, again
discussed at Fosdem, but I'm currently working on whether that is
necessary and whether it has positive or negative consequences on the
use of .mo files on embedded devices.

I started working on TDebs for Emdebian thinking exactly the same way,
that .mo files were immune to other problems of endianness etc. (The
slides at Fosdem claimed that TDeb packages would be Arch:all until
the question and answer section of the talk). They are not Arch:all. It
is just that msgfmt defaults to little unless --endianness is specified,
irrespective of the build machine architecture.

In some ways, this is a bug but documenting the --endianness option
allows others to not make the same mistake again.

.mo files *are* architecture dependent and should be handled as such.
Just because 'it happens to work' right now does not mean it is the
correct way to handle .mo files. 

Source packages that put .mo files into an Arch:all binary are buggy -
implementing that fix will involve lots of work in Debian to handle the
increase in package numbers but leave that to me - I'll sort out the
mass bug filing(s) when other TDeb support is implemented elsewhere in
Debian after Lenny. (In fact, these bugs will simply disappear anyway
because the implementation of TDebs means that *no* other package in
Debian would contain any .mo files, all .mo files would only exist in
TDeb packages which will be Architecture:any.)

-- 

Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/




Bug#468209: msgfmt: no documentation of the --endianness option

2008-02-27 Thread Neil Williams
Package: gettext
Version: 0.17-2
Severity: wishlist

msgfmt does support --endianness {big|little} from the source in
gettext-tools/src/msgfmt.c

neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (little endian), revision 0, 14 messages
neil@dwarf:po$ msgfmt --endianness big cs.po
neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (big endian), revision 0, 14 messages
neil@dwarf:po$

Please document this in the manpage and ask upstream if it can also be
output in the --help output.

(endianness is important when crossbuilding packages containing PO
files.)

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages gettext depends on:
ii  gettext-base  0.17-2 GNU Internationalization utilities
ii  libc6 2.7-8  GNU C Library: Shared libraries
ii  libgomp1  4.3-20080219-1 GCC OpenMP (GOMP) support library

Versions of packages gettext recommends:
ii  lynx  2.8.6-2Text-mode WWW Browser
ii  wget  1.10.2-3   retrieves files from the web

-- no debconf information

