Bug#468209: msgfmt: no documentation of the --endianness option
* Santiago Vila sanv...@unex.es, 2011-11-03, 18:38:
>> With the advent of multi-arch, such behavior has become a problem. If a package is marked as Multi-Arch: same all the files (including *.mo) have to be identical across all architectures.
>
> Hmm, why do they have to be identical? Is it not enough that both types of systems (big and little endian) are able to read and use both types of .mo files, as seems to be the case? If .mo files are usable everywhere, regardless of their endianness, I would say that the multi-arch requirement is not reasonable.

Multi-Arch: same makes it possible for users to install a package for more than one architecture at the same time. If files with the same name are not identical across architectures, the package manager has to resolve the conflict somehow, and it does so by simply aborting the installation, e.g. like this:

| # apt-get install -qq libavahi-common-data:powerpc
| (Reading database ... 59644 files and directories currently installed.)
| Unpacking libavahi-common-data:powerpc (from .../libavahi-common-data_0.6.30-5_powerpc.deb) ...
| dpkg: error processing /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb (--unpack):
| './usr/share/locale/he/LC_MESSAGES/avahi.mo' is different from the same file on the system
| configured to not write apport reports
| dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
| Errors were encountered while processing:
| /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb
| E: Sub-process /usr/bin/dpkg returned an error code (1)

Does it make things clear?

Note that the problem would affect only a tiny minority of packages: Multi-Arch: same is useful mainly for shared libraries, and they rarely come with translations.

--
Jakub Wilk

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
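For readers who want to check a pair of builds before uploading, the following is a minimal sketch of the kind of comparison that matters here: it lists paths that exist in two unpacked package trees but whose contents differ. It is not dpkg's implementation; the function names and the idea of comparing two unpacked trees are assumptions made for illustration only.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def conflicting_paths(tree_a, tree_b):
    """List relative paths present in both trees whose contents differ.

    tree_a and tree_b are hypothetical directories holding the unpacked
    contents of the same package built for two architectures.
    """
    conflicts = []
    for root, _dirs, files in os.walk(tree_a):
        for name in files:
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            # Only shared paths can conflict; files unique to one tree are fine.
            if os.path.isfile(path_b) and file_digest(path_a) != file_digest(path_b):
                conflicts.append(rel)
    return sorted(conflicts)
```

Any .mo file reported by such a check would be a candidate for the installation failure shown in the log above.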
Bug#468209: msgfmt: no documentation of the --endianness option
On 10/11/11 12:34, Jakub Wilk wrote:
> * Santiago Vila sanv...@unex.es, 2011-11-03, 18:38:
>>> With the advent of multi-arch, such behavior has become a problem. If a package is marked as Multi-Arch: same all the files (including *.mo) have to be identical across all architectures.
>>
>> Hmm, why do they have to be identical? Is it not enough that both types of systems (big and little endian) are able to read and use both types of .mo files, as seems to be the case? If .mo files are usable everywhere, regardless of their endianness, I would say that the multi-arch requirement is not reasonable.
>
> Multi-Arch: same makes it possible for users to install a package for more than one architecture at the same time. If files with the same name are not identical across architectures, the package manager has to resolve the conflict somehow, and it does so by simply aborting the installation, e.g. like this:
>
> | # apt-get install -qq libavahi-common-data:powerpc
> | (Reading database ... 59644 files and directories currently installed.)
> | Unpacking libavahi-common-data:powerpc (from .../libavahi-common-data_0.6.30-5_powerpc.deb) ...
> | dpkg: error processing /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb (--unpack):
> | './usr/share/locale/he/LC_MESSAGES/avahi.mo' is different from the same file on the system
> | configured to not write apport reports
> | dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
> | Errors were encountered while processing:
> | /var/cache/apt/archives/libavahi-common-data_0.6.30-5_powerpc.deb
> | E: Sub-process /usr/bin/dpkg returned an error code (1)
>
> Does it make things clear?

Yes, I now see what the problem is, but I don't see that making every .mo file always little endian again is the best solution. We could also tell dpkg somehow that different files in /usr/share/locale are OK in this case.

> Note that the problem would affect only a tiny minority of packages: Multi-Arch: same is useful mainly for shared libraries, and they rarely come with translations.

In that case, making those packages depend on another Arch: all package containing just the translations would solve the issue, would it not?

(For the record, I happen to maintain a library containing translations, and I have always seen it as an anomaly; this would force me to do what I feel is the right thing.)
Bug#468209: msgfmt: no documentation of the --endianness option
* Santiago Vila sanv...@unex.es, 2008-02-28, 12:10:
>> msgfmt does support --endianness {big|little} from the source in gettext-tools/src/msgfmt.c
>>
>> neil@dwarf:po$ file messages.mo
>> messages.mo: GNU message catalog (little endian), revision 0, 14 messages
>> neil@dwarf:po$ msgfmt --endianness big cs.po
>> neil@dwarf:po$ file messages.mo
>> messages.mo: GNU message catalog (big endian), revision 0, 14 messages
>> neil@dwarf:po$
>>
>> Please document this in the manpage and ask upstream if it can also be output in the --help output.
>>
>> (endianness is important when crossbuilding packages containing PO files.)
>
> No, it is not. Binary .mo files as used by libc and gettext are always little endian, regardless of the machine architecture.

Hmm, this doesn't seem to be the case (anymore?). As far as I can see, msgfmt produces files with native endianness.

With the advent of multi-arch, such behavior has become a problem. If a package is marked as Multi-Arch: same all the files (including *.mo) have to be identical across all architectures.

Either the --endianness option should be documented (so that M-A:same packages could use it when needed), or msgfmt should produce little-endian files even on big-endian architectures.

Please tell me whether I should reopen this bug, or rather file a new one requesting little-endian output everywhere.

--
Jakub Wilk
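As an illustration of the first option (a Multi-Arch: same package pinning the byte order itself), here is a hedged sketch of a build helper. The --endianness spelling is taken from the transcript quoted in this thread and -o is msgfmt's output-file option; the helper names are hypothetical, and since the option is described in this thread as undocumented, its behaviour may differ between gettext versions.

```python
import subprocess


def msgfmt_command(po_path, mo_path, endianness="little"):
    """Build the msgfmt argument list with an explicit byte order.

    The '--endianness' flag is used exactly as quoted in the bug report;
    treat its availability as an assumption about the installed gettext.
    """
    if endianness not in ("big", "little"):
        raise ValueError("endianness must be 'big' or 'little'")
    return ["msgfmt", "--endianness", endianness, "-o", mo_path, po_path]


def compile_catalog(po_path, mo_path, endianness="little"):
    """Run msgfmt; raises CalledProcessError if msgfmt fails."""
    subprocess.run(msgfmt_command(po_path, mo_path, endianness), check=True)
```

Calling compile_catalog("cs.po", "cs.mo") on every architecture would then be expected to yield the same byte order regardless of the build machine, provided the installed msgfmt accepts the flag.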
Bug#468209: msgfmt: no documentation of the --endianness option
On 03/11/11 16:24, Jakub Wilk wrote:
> * Santiago Vila sanv...@unex.es, 2008-02-28, 12:10:
>>> msgfmt does support --endianness {big|little} from the source in gettext-tools/src/msgfmt.c
>>>
>>> neil@dwarf:po$ file messages.mo
>>> messages.mo: GNU message catalog (little endian), revision 0, 14 messages
>>> neil@dwarf:po$ msgfmt --endianness big cs.po
>>> neil@dwarf:po$ file messages.mo
>>> messages.mo: GNU message catalog (big endian), revision 0, 14 messages
>>> neil@dwarf:po$
>>>
>>> Please document this in the manpage and ask upstream if it can also be output in the --help output.
>>>
>>> (endianness is important when crossbuilding packages containing PO files.)
>>
>> No, it is not. Binary .mo files as used by libc and gettext are always little endian, regardless of the machine architecture.
>
> Hmm, this doesn't seem to be the case (anymore?). As far as I can see, msgfmt produces files with native endianness.

I didn't know, but yes, that seems to be the case now. I've just checked by running file * on a locale directory on my old powerpc.

> With the advent of multi-arch, such behavior has become a problem. If a package is marked as Multi-Arch: same all the files (including *.mo) have to be identical across all architectures.

Hmm, why do they have to be identical? Is it not enough that both types of systems (big and little endian) are able to read and use both types of .mo files, as seems to be the case? If .mo files are usable everywhere, regardless of their endianness, I would say that the multi-arch requirement is not reasonable.

> Either the --endianness option should be documented (so that M-A:same packages could use it when needed), or msgfmt should produce little-endian files even on big-endian architectures.
>
> Please tell me whether I should reopen this bug, or rather file a new one requesting little-endian output everywhere.

I would prefer a new bug, because the rationale for considering it a bug would be quite different. Previously the argument was about performance, but no figures were ever shown.

However, I'm happy to discuss this in this old report first, at least until I really understand the nature of the new bug.
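The point that readers accept either byte order while the files themselves still differ can be illustrated with a small sketch. It hand-builds a minimal catalog in both byte orders (no hash table) and loads each with Python's gettext module, which checks the magic number in both orders. The build_mo and translate helpers are hypothetical names written for this example, and Python's reader stands in here for libc or libintl, whose behaviour is not being demonstrated.

```python
import gettext
import io
import struct

MO_MAGIC = 0x950412DE  # GNU .mo magic number, written in the file's own byte order


def build_mo(entries, byteorder):
    """Serialise {msgid: msgstr} into a minimal .mo image without a hash table.

    byteorder is '<' for little endian or '>' for big endian.
    """
    # The empty msgid carries the catalog metadata; entries are sorted by msgid.
    items = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    items.update(entries)
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in items.items())

    n = len(pairs)
    orig_table = 7 * 4                # header is seven 32-bit words
    trans_table = orig_table + n * 8  # each table entry is (length, offset)
    strings_start = trans_table + n * 8

    blob = b""
    index = []
    for msgid, _ in pairs:
        index.append((len(msgid), strings_start + len(blob)))
        blob += msgid + b"\0"
    for _, msgstr in pairs:
        index.append((len(msgstr), strings_start + len(blob)))
        blob += msgstr + b"\0"

    # magic, revision, count, msgid table, msgstr table, hash size, hash offset
    out = struct.pack(byteorder + "7I", MO_MAGIC, 0, n,
                      orig_table, trans_table, 0, strings_start)
    for length, offset in index:
        out += struct.pack(byteorder + "2I", length, offset)
    return out + blob


def translate(mo_bytes, msgid):
    """Look up msgid in an in-memory catalog using Python's gettext reader."""
    return gettext.GNUTranslations(io.BytesIO(mo_bytes)).gettext(msgid)
```

With this sketch, build_mo({"hello": "hola"}, "<") and build_mo({"hello": "hola"}, ">") both translate "hello" correctly, yet the two byte strings are not equal, which is exactly the situation that trips the Multi-Arch: same check.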
Bug#468209: msgfmt: no documentation of the --endianness option
Hi,

While reading http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=468209 I have to agree with most of what Santiago said.

Neil Williams said:
> On big endian systems, the CPU wastes time converting the endianness at loadtime which is important for embedded devices.

Do you have any figures? In my opinion, the non-native endianness costs a few CPU cycles at every non-cached gettext() invocation, but nothing at load time. The thing that costs at load time is when the locale encoding and the PO file encoding don't match: e.g. if the PO file was in ISO-8859-2 and the .mo file is used in a UTF-8 locale.

> Please document this in the manpage and ask upstream if it can also be output in the --help output.
> ...
> No, I don't see that this needs to be forwarded upstream, this is an issue within Debian - primarily within the manpage as far as this bug report is concerned.

It would be wrong to document some option in Debian that is not documented upstream. The upstream maintainer can at any moment withdraw this option, change its syntax, make it dump core etc., without notice (no word about it in the NEWS file).

Bruno
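To make the "few CPU cycles" remark concrete: reading a catalog written in the other byte order comes down to reversing the bytes of each 32-bit length or offset field that is consulted. The following sketch shows only that operation; the function names are hypothetical and nothing here measures actual cost on any device.

```python
import struct


def swap32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def read_u32(buf, offset, file_is_little_endian):
    """Read one 32-bit field from buf in the byte order the file was written in."""
    fmt = "<I" if file_is_little_endian else ">I"
    return struct.unpack_from(fmt, buf, offset)[0]
```

Applied to the .mo magic number, swap32(0x950412DE) gives 0xDE120495, which is the value a reader sees when it opens a catalog written in the opposite byte order.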
Bug#468209: msgfmt: no documentation of the --endianness option
Hello Neil,

On 2008-02-29 07:27:47, Neil Williams wrote:
> Source packages that put .mo files into an Arch:all binary are buggy - implementing that fix will involve lots of work in Debian to handle the increase in package numbers but leave that to me - I'll sort out the mass bug filing(s) when other TDeb support is implemented elsewhere in Debian after Lenny.
>
> (In fact, these bugs will simply disappear anyway because the implementation of TDebs means that *no* other package in Debian would contain any .mo files, all .mo files would only exist in TDeb packages which will be Architecture:any.)

Does this mean that if I have written a program (Bash script + Xdialog) which is currently Arch:all, I would, under your proposal, have to build it for all 12 architectures?

That would mean that my 15 MByte source would produce 12 binary packages of 15 MByte each, where ONLY the 40 kByte .mo file would differ...

I do not know whether this is really desirable, even if I understand your problem quite well (I am currently fighting with ARM, MIPS and SH CPUs).

Thanks, Greetings and nice Day

Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant
Bug#468209: msgfmt: no documentation of the --endianness option
On Fri, 29 Feb 2008, Neil Williams wrote:
> .mo files *are* architecture dependent and should be handled as such. Just because 'it happens to work' right now does not mean it is the correct way to handle .mo files.

I'm curious: Do you plan to do the same with PCM .wav files? (They are always little-endian, like .mo files).

I can agree that "it works" is not always a good reason to do things in a certain way, but what you are proposing is a change in something which is a de facto standard, for very little benefit (saving some cpu cycles).

When the cost of something is very high and the benefit is very small, the natural thing to do is to keep things as they are.

Anyway, I can forward your suggestion upstream if you insist, but I don't plan to deviate from upstream gettext if the authors reject your suggestion.
Bug#468209: msgfmt: no documentation of the --endianness option
On Fri, 29 Feb 2008 11:51:19 +0100 (CET) Santiago Vila [EMAIL PROTECTED] wrote:
> On Fri, 29 Feb 2008, Neil Williams wrote:
>> .mo files *are* architecture dependent and should be handled as such. Just because 'it happens to work' right now does not mean it is the correct way to handle .mo files.
>
> I'm curious: Do you plan to do the same with PCM .wav files?

Not for Emdebian - we are unlikely to be handling .wav files due to storage constraints. However, if there are any other situations where Arch:all files have endian problems, I will be pursuing those to provide an Arch:any mechanism if there is a role for such files on embedded devices. .wav is simply too large for embedded; .gsm or .ogg is more likely.

So the answer to your question, really, is YES. I intend to seek correct endianness support for any binary format in Debian that does not already implement it and for which there is a logical reason to need that format on an embedded device where endianness conversions are a significant issue.

Music isn't that much of a problem - it is not a big issue if a user has to wait a few hundred clock cycles more to hear a music track. What does matter is if a user has to wait several dozen clock cycles *every time any application loads* merely to get the .mo content into the correct presentation. This is a *MAJOR* usability issue. It will make the entire OS appear slow in any translated locale. If Debian wants to be able to support embedded and low resource machines, Debian has to accept that there will be changes needed to enable such support.

> (They are always little-endian, like .mo files). I can agree that "it works" is not always a good reason to do things in a certain way, but what you are proposing is a change in something which is a de facto standard, for very little benefit (saving some cpu cycles).

Please remember this is for embedded devices.
It works now only because Debian really isn't the Universal OS - lots of parts of Debian are simply wrong for embedded usage, which is why it is taking so long for Emdebian to make progress. Nobody cares about a few dozen clock cycles on a dual core GHz amd64 - but it becomes a quite noticeable delay on an iPAQ. The delay is repeated every single time any application is started outside the C locale.

All I'm asking for here is *documentation* that this is how it needs to be done for certain situations, so that others do not make the same mistake that both you and I have made - assuming that the current Debian method of Arch:all for .mo is acceptable. It is not, sadly.

> When the cost of something is very high and the benefit is very small, the natural thing to do is to keep things as they are.

The cost of this change is irrelevant because the cost of implementing TDebs is already high and this change does not make that any higher. Whether TDebs are Arch:all or Arch:any makes no difference to the amount of work required to implement TDebs in Debian. It merely adds a tiny amount of work for the buildd network. The benefits of TDebs (installation sizes, separate translator uploads, faster translation updates etc.) far, far outweigh the temporary work of getting them implemented in Debian.

It is ludicrous that Debian insists on installing over 250 MB of *unused and unusable* .mo files in a default GNOME installation when any one locale needs just 9 MB or less. (Check the size of your /usr/share/locale/ directory and compare that with the collected size of the few languages that you actually speak. Granted that will be a few more than me but I doubt anyone can speak/read all of the 90+ languages installed by default in Debian. I'd be surprised if anyone would require more than half a dozen.)
> Anyway, I can forward your suggestion upstream if you insist,

No, I don't see that this needs to be forwarded upstream, this is an issue within Debian - primarily within the manpage as far as this bug report is concerned. All I want is for the manpage of msgfmt to explain that --endianness {big|little} *is* supported, *why* it is supported and why it is important for certain situations.

> but I don't plan to deviate from upstream gettext if the authors reject your suggestion.

The upstream authors already *explicitly support* endianness because the option is part of the source code for msgfmt! (gettext-tools/src/msgfmt.c) I think upstream may have a better grasp of the issue than you may imagine because other non-Debian embedded developments can use --endianness.

Whether or not anything in gettext-Debian changes, I will implement TDebs using --endianness in calls to msgfmt. All I'm asking is that the reasons that I have set out for this are clearly explained in the manpage in Debian so that other developers do not waste time believing that the current method used for powerful desktop machines is in any way appropriate for low resource units. The option exists, it works and the manpage should document it - just as with any application in Debian. The
Bug#468209: msgfmt: no documentation of the --endianness option
reopen 468209
quit

On Thu, 28 Feb 2008 12:10:23 +0100 (CET) Santiago Vila [EMAIL PROTECTED] wrote:
> On Wed, 27 Feb 2008, Neil Williams wrote:
>> msgfmt does support --endianness {big|little} from the source in gettext-tools/src/msgfmt.c
>>
>> neil@dwarf:po$ file messages.mo
>> messages.mo: GNU message catalog (little endian), revision 0, 14 messages
>> neil@dwarf:po$ msgfmt --endianness big cs.po
>> neil@dwarf:po$ file messages.mo
>> messages.mo: GNU message catalog (big endian), revision 0, 14 messages
>> neil@dwarf:po$
>>
>> Please document this in the manpage and ask upstream if it can also be output in the --help output.
>>
>> (endianness is important when crossbuilding packages containing PO files.)
>
> No, it is not. Binary .mo files as used by libc and gettext are always little endian, regardless of the machine architecture.

Not necessarily. That is just how gettext currently works but it is not how gettext *must* work, hence the --endianness option in the source code.

> Otherwise it would be impossible for us to have Architecture: all packages like util-linux-locales containing just binary .mo files.

Actually, endianness *IS* important because we should not be *having* Architecture:all packages that contain .mo files - that was demonstrated during my talk on TDebs at Fosdem.

On big endian systems, the CPU wastes time converting the endianness at loadtime which is important for embedded devices. All packages containing .mo files should be Arch:any and this is something I will be fixing during the course of TDeb development in Debian.

All the other questions that arise from this (increased package numbers, extra builds, repository implications, userspace controls and cache sizes) have solutions that are currently working in Emdebian and which are due to be applied to Debian (probably after Lenny).
TDebs might also need to drop the hash table in the .mo, again discussed at Fosdem, but I'm currently working on whether that is necessary and whether it has positive or negative consequences on the use of .mo files on embedded devices.

I started working on TDebs for Emdebian thinking exactly the same way, that .mo files were immune to other problems of endianness etc. (The slides at Fosdem claimed that TDeb packages would be Arch:all until the question and answer section of the talk). They are not Arch:all. It is just that msgfmt defaults to little unless --endianness is specified, irrespective of the build machine architecture. In some ways, this is a bug but documenting the --endianness option allows others to not make the same mistake again.

.mo files *are* architecture dependent and should be handled as such. Just because 'it happens to work' right now does not mean it is the correct way to handle .mo files.

Source packages that put .mo files into an Arch:all binary are buggy - implementing that fix will involve lots of work in Debian to handle the increase in package numbers but leave that to me - I'll sort out the mass bug filing(s) when other TDeb support is implemented elsewhere in Debian after Lenny.

(In fact, these bugs will simply disappear anyway because the implementation of TDebs means that *no* other package in Debian would contain any .mo files, all .mo files would only exist in TDeb packages which will be Architecture:any.)

--
Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
Bug#468209: msgfmt: no documentation of the --endianness option
Package: gettext
Version: 0.17-2
Severity: wishlist

msgfmt does support --endianness {big|little} from the source in gettext-tools/src/msgfmt.c

neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (little endian), revision 0, 14 messages
neil@dwarf:po$ msgfmt --endianness big cs.po
neil@dwarf:po$ file messages.mo
messages.mo: GNU message catalog (big endian), revision 0, 14 messages
neil@dwarf:po$

Please document this in the manpage and ask upstream if it can also be output in the --help output.

(endianness is important when crossbuilding packages containing PO files.)

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages gettext depends on:
ii  gettext-base  0.17-2          GNU Internationalization utilities
ii  libc6         2.7-8           GNU C Library: Shared libraries
ii  libgomp1      4.3-20080219-1  GCC OpenMP (GOMP) support library

Versions of packages gettext recommends:
ii  lynx          2.8.6-2         Text-mode WWW Browser
ii  wget          1.10.2-3        retrieves files from the web

-- no debconf information
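The information that file reports in the transcript above can also be read directly from the first three 32-bit words of a catalog (magic number, revision, number of strings). The sketch below is a rough equivalent for illustration; the function names are hypothetical, and the revision word is returned as stored, without interpretation.

```python
import struct

MO_MAGIC = 0x950412DE  # GNU .mo magic number, written in the file's own byte order


def describe_mo(data):
    """Return (endianness, revision, message_count) for the start of a .mo image."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes of header")
    for name, prefix in (("little", "<"), ("big", ">")):
        magic, revision, count = struct.unpack_from(prefix + "3I", data, 0)
        if magic == MO_MAGIC:
            return name, revision, count
    raise ValueError("not a GNU message catalog")


def describe_mo_file(path):
    """Read only the header words from a file on disk and describe them."""
    with open(path, "rb") as fh:
        return describe_mo(fh.read(12))
```

For a catalog like the second one in the transcript, describe_mo_file("messages.mo") would be expected to return ("big", 0, 14).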