Re: new proposal: Translating Debian packages' descriptions
On Thu, Sep 13, 2001 at 01:26:07PM -0500, Steve Langasek wrote:
> On Thu, 13 Sep 2001, Michael Bramer wrote:
> > > The end-user tools would never have to deal with outdated
> > > translations if the ".mo" file is assembled ahead of time in a
> > > central location. Match up the translations, insert them into the
> > > distilled .po file using the package name/version as the key, and
> > > you're done.
> >
> > one point:
> > I get package-1.2 from a 'commercial German distributor' with a
> > German translation. Then I look (with apt-cache show) at the
> > description of package-1.2.1 from security.debian.org.
> > I don't see the German description...
> >
> > other point:
> > I have gnome-base-1.2 from HelixCode and show the description of
> > gnome-base-1.2 from Debian... But the descriptions of the two
> > packages are not the same.
> > (I know our package management doesn't support this case, but it
> > may well occur in practice...)
>
> This is an interesting point, but the solution is simple. If
> package+version can't be used as a key to uniquely identify packages
> in the dpkg status database, then we key on whatever /is/ used by
> this database.

IMHO apt uses a 'long' pointer, like:
  security.debian.org_dists_potato_updates_main_binary-i386_Package_version
But yes, this doesn't need the size of a normal description.

> > next point:
> > You have several sources in your apt sources list: an older CD-ROM
> > with 2.1 and 2.2_r0, an up-to-date 2.2_r3 from ftp.debian.org, and
> > testing and sid from http.debian.org.
> > With this you have all descriptions many times in the .mo file
> > (like package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1
> > from 2.2_r3 and package-1.5 from testing and sid...)
>
> If the descriptions remain constant for the packages, you're correct
> that there would be duplication. But should we not optimize for the
> common case? Most users won't keep /old/ sources in the list; few
> will even have testing, unstable, and stable in the list at the same
> time.

No. Having stable and testing together is more and more common. With
pins you can install everything from stable and some packages from
testing/unstable. I have given some talks about this feature, and users
do use it once they know about it. Believe me.

> > > I'm not suggesting replacing the format that translators will
> > > work with. I'm just disagreeing that standard .mo files are the
> > > best solution to be integrated into dpkg and apt.
> > ...
> > > More direct lookups. Smaller .po files. Better integration with
> > > existing tools, instead of grafting a new arm onto our existing
> > > /var/lib/dpkg structure.
> >
> > Yes, .mo files are not the best thing. That is your point and you
> > are right.
> > But this is a separate problem and we can solve it in parallel.
> >
> > I propose this:
> >  - use an unchanged gettext in dpkg now and get the thing rolling.
> >  - change gettext to use an optional 'md5sum-like' thing for
> >    lookups (save the translation with the md5sum of the original
> >    text as key).
>
> For the goal of getting support for translated descriptions into
> Debian as soon as possible, I think use of unmodified gettext is a
> reasonable choice.

Yes. We should reach a conclusion with the dpkg and apt developers.
Only with a conclusion (maybe without gettext) can our users use these
translations.

Regards
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
The optimist believes we live in the best of all possible worlds.
The pessimist fears that this is true.
Re: new proposal: Translating Debian packages' descriptions
Hi Michael,

On Thu, 13 Sep 2001, Michael Bramer wrote:
> > The end-user tools would never have to deal with outdated
> > translations if the ".mo" file is assembled ahead of time in a
> > central location. Match up the translations, insert them into the
> > distilled .po file using the package name/version as the key, and
> > you're done.
>
> one point:
> I get package-1.2 from a 'commercial German distributor' with a
> German translation. Then I look (with apt-cache show) at the
> description of package-1.2.1 from security.debian.org.
> I don't see the German description...
>
> other point:
> I have gnome-base-1.2 from HelixCode and show the description of
> gnome-base-1.2 from Debian... But the descriptions of the two
> packages are not the same.
> (I know our package management doesn't support this case, but it
> may well occur in practice...)

This is an interesting point, but the solution is simple. If
package+version can't be used as a key to uniquely identify packages in
the dpkg status database, then we key on whatever /is/ used by this
database.

> next point:
> You have several sources in your apt sources list: an older CD-ROM
> with 2.1 and 2.2_r0, an up-to-date 2.2_r3 from ftp.debian.org, and
> testing and sid from http.debian.org.
> With this you have all descriptions many times in the .mo file
> (like package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1
> from 2.2_r3 and package-1.5 from testing and sid...)

If the descriptions remain constant for the packages, you're correct
that there would be duplication. But should we not optimize for the
common case? Most users won't keep /old/ sources in the list; few will
even have testing, unstable, and stable in the list at the same time.
And if we use a file format other than .mo files, it's also possible to
design a system of indirect lookups that /still/ edges out standard
gettext lookups for efficiency.

> > I'm not suggesting replacing the format that translators will work
> > with. I'm just disagreeing that standard .mo files are the best
> > solution to be integrated into dpkg and apt.
> ...
> > More direct lookups. Smaller .po files. Better integration with
> > existing tools, instead of grafting a new arm onto our existing
> > /var/lib/dpkg structure.
>
> Yes, .mo files are not the best thing. That is your point and you
> are right.
> But this is a separate problem and we can solve it in parallel.
>
> I propose this:
>  - use an unchanged gettext in dpkg now and get the thing rolling.
>  - change gettext to use an optional 'md5sum-like' thing for
>    lookups (save the translation with the md5sum of the original
>    text as key).

For the goal of getting support for translated descriptions into Debian
as soon as possible, I think use of unmodified gettext is a reasonable
choice.

Steve Langasek
postmodern programmer
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 11, 2001 at 12:05:25PM -0500, Steve Langasek wrote:
> On Tue, 11 Sep 2001, Martin Quinson wrote:
> > On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> > - an output mechanism, including the fallback to the original if
> >   the translation is outdated. You have either to rewrite msgfmt to
> >   do this job at the previous step, or design a new function in
> >   dpkg, apt, grep-dctrl, and every program using the translated
> >   descriptions.
>
> The end-user tools would never have to deal with outdated
> translations if the ".mo" file is assembled ahead of time in a
> central location. Match up the translations, insert them into the
> distilled .po file using the package name/version as the key, and
> you're done.

one point:
I get package-1.2 from a 'commercial German distributor' with a German
translation. Then I look (with apt-cache show) at the description of
package-1.2.1 from security.debian.org.
I don't see the German description...

other point:
I have gnome-base-1.2 from HelixCode and show the description of
gnome-base-1.2 from Debian... But the descriptions of the two packages
are not the same.
(I know our package management doesn't support this case, but it may
well occur in practice...)

next point:
You have several sources in your apt sources list: an older CD-ROM with
2.1 and 2.2_r0, an up-to-date 2.2_r3 from ftp.debian.org, and testing
and sid from http.debian.org.
With this you have all descriptions many times in the .mo file (like
package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1 from
2.2_r3 and package-1.5 from testing and sid...)

And with the pin feature of apt, more and more people have more sources
in sources.list...

> > If you change any tool of the gettext mechanism, you lose
> > advantages from the translator's point of view, like the
> > compendium, containing standard translations for reuse, or
> > user-friendly tools like kbabel for translating (including the
> > ispell support, which is implemented in kbabel, and some others).
>
> I'm not suggesting replacing the format that translators will work
> with. I'm just disagreeing that standard .mo files are the best
> solution to be integrated into dpkg and apt.
...
> More direct lookups. Smaller .po files. Better integration with
> existing tools, instead of grafting a new arm onto our existing
> /var/lib/dpkg structure.

Yes, .mo files are not the best thing. That is your point and you are
right.
But this is a separate problem and we can solve it in parallel.

I propose this:
 - use an unchanged gettext in dpkg now and get the thing rolling.
 - change gettext to use an optional 'md5sum-like' thing for lookups
   (save the translation with the md5sum of the original text as key).

All programs with a lot of text can use such .mo files and benefit from
them.

Please don't reinvent the wheel; improve the old wheel and make it
chubbier.

Regards
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"Quite a few companies, none of which use Linux, would find themselves
floundering at first if the Linux machines were switched off."
  -- Matthias Peick
Re: new proposal: Translating Debian packages' descriptions
On Tue, 11 Sep 2001, Martin Quinson wrote:
> On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> > Only if the implementation is poor. The accuracy of a translation
> > can be verified in the process of assembling the file that is to be
> > made available to user machines (whether that file is Packages.gz,
> > or debian-descs.mo, or whatever). Obviously the /inputs/ used to
> > create this file must include mappings of English string ->
> > translated string, but these mappings need not be retained in the
> > output file. We only need to make sure once that the translation is
> > up-to-date, not every time the user runs dpkg, because each version
> > of each package can have only one untranslated description
> > associated with it -- it's a unique key, by definition.
>
> > If nothing else, perhaps you would consider that a .mo file
> > containing [untranslated string -> translation] mappings will on
> > average be almost twice as large as a .mo file containing
> > [(package name,version) -> translation] mappings. :)
>
> The problem is that you won't have to do a little wheel engineering,
> but a lot of it. Think about what you will have to design:
> - the extracting tool, control -> po file
>   OK, that's true for all solutions ;) I'm working on a patch against
>   gettext so that it can handle text following rfc822.

As you say, this is common to all solutions.

> - a mechanism to help the translator find which texts have to be
>   translated in the po file.
>   With your solution, the translator will face something like
>   msgid "dpkg-1.9"
>   msgstr ""

Not at all. I never suggested that anyone, translator or maintainer,
would directly manipulate a .po file that looks like this. The .po
files should look exactly as you would expect them to. It's only
/after/ these .po files have been submitted to the archive that they
would be automatically processed and matched up with actual packages in
the archive, so that the resulting .mo file (or file in another format)
contains only the relevant translations.

> - a mechanism to produce the mo file, or whatever. If you stick to
>   the po format, you can reuse msgfmt, though.

So, msgfmt is one option, yes. Other solutions to parse and merge text
would not be difficult to implement.

> - an output mechanism, including the fallback to the original if the
>   translation is outdated. You have either to rewrite msgfmt to do
>   this job at the previous step, or design a new function in dpkg,
>   apt, grep-dctrl, and every program using the translated
>   descriptions.

The end-user tools would never have to deal with outdated translations
if the ".mo" file is assembled ahead of time in a central location.
Match up the translations, insert them into the distilled .po file
using the package name/version as the key, and you're done.

> If you change any tool of the gettext mechanism, you lose advantages
> from the translator's point of view, like the compendium, containing
> standard translations for reuse, or user-friendly tools like kbabel
> for translating (including the ispell support, which is implemented
> in kbabel, and some others).

I'm not suggesting replacing the format that translators will work
with. I'm just disagreeing that standard .mo files are the best
solution to be integrated into dpkg and apt.

> For what gain?
> One lookup less? But gettext is cached and well optimized. I think
> the change and redesign are too much, given the small speedup you can
> expect...
> Smaller resulting po files? Come on, the woody+1 release will come on
> 6 CDs or more, and you are speaking about saving a few MB... These
> data will compress well, like any natural text, so that is a minor
> problem, in my view.

More direct lookups. Smaller .po files. Better integration with
existing tools, instead of grafting a new arm onto our existing
/var/lib/dpkg structure.

Steve Langasek
postmodern programmer
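
The central assembly step described above (check each translation once,
then publish only [(package name,version) -> translation]) could be
sketched roughly as follows. This is a minimal illustration, not an
existing archive tool: the function names, the in-memory dictionaries
and the sample packages are all assumptions made for the example.

```python
def assemble_catalog(packages, translations):
    """Build the file shipped to user machines.

    packages:     iterable of (name, version, english_description)
    translations: dict mapping an English description to its translation
                  (the translators' input; it is NOT shipped as-is)

    Returns a dict keyed by (name, version). A package whose current
    English description has no matching translation is simply left out,
    so an outdated translation never reaches the end-user tools.
    """
    catalog = {}
    for name, version, english in packages:
        translated = translations.get(english)
        if translated is not None:
            catalog[(name, version)] = translated
    return catalog


def describe(catalog, english_index, name, version):
    """One direct lookup at run time, falling back to the English text."""
    key = (name, version)
    return catalog.get(key, english_index[key])


# Hypothetical sample data: the description changed in version 1.2.1,
# but only the 1.2 text has been translated so far.
packages = [
    ("foo", "1.2", "bar of foo"),
    ("foo", "1.2.1", "bar of foo, now with security fixes"),
]
translations = {"bar of foo": "trans of bar of foo"}

english_index = {(name, version): text for name, version, text in packages}
catalog = assemble_catalog(packages, translations)
```

Here `describe(catalog, english_index, "foo", "1.2")` returns the
translation, while the lookup for version 1.2.1 falls back to the
English text because no up-to-date translation was available when the
catalog was assembled.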
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> On Tue, 4 Sep 2001, Michael Bramer wrote:
> > On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > > > I don't know enough about gettext - am I assuming correctly
> > > > that in the .mo file, the English translation is replaced with
> > > > a checksum or similar, so you do not need to store the complete
> > > > English translation?
> > >
> > > Gettext normally uses the entire untranslated string as the key
> > > in the .mo file. This has many advantages when dealing with
> > > translation of strings in programs, where the untranslated string
> > > is actually present in the program source, and this is a big
> > > reason the GNU project favors gettext over catgets systems found
> > > on other Unices. It makes less sense in the case of package
> > > descriptions, however, because we're effectively doing two
> > > lookups -- first to find the English description in Packages.gz
> > > using the package name and version as a key, then to find the
> > > translated description in the .mo file using the English
> > > description as a key.
> >
> > Yes, you need two lookups: first in the package db (normally in
> > memory) and then (if LANG is set) a second lookup with gettext.
> >
> > But this is not a big problem, or is it?
>
> It casts doubt on the argument that gettext is a good solution here.
> Just because gettext is the optimal solution for translation of
> messages within programs does not mean it's the best solution for
> package translations. I'm personally willing to do a little
> wheel-engineering if it leads to a more elegant result.
>
> > If you put only the translated text in the db, and you don't use
> > the English text as key (as gettext does), you may get outdated
> > translations.
>
> Only if the implementation is poor. The accuracy of a translation can
> be verified in the process of assembling the file that is to be made
> available to user machines (whether that file is Packages.gz, or
> debian-descs.mo, or whatever). Obviously the /inputs/ used to create
> this file must include mappings of English string -> translated
> string, but these mappings need not be retained in the output file.
> We only need to make sure once that the translation is up-to-date,
> not every time the user runs dpkg, because each version of each
> package can have only one untranslated description associated with
> it -- it's a unique key, by definition.
>
> If nothing else, perhaps you would consider that a .mo file
> containing [untranslated string -> translation] mappings will on
> average be almost twice as large as a .mo file containing
> [(package name,version) -> translation] mappings. :)

The problem is that you won't have to do a little wheel engineering,
but a lot of it. Think about what you will have to design:

- the extracting tool, control -> po file
  OK, that's true for all solutions ;) I'm working on a patch against
  gettext so that it can handle text following rfc822.

- a mechanism to help the translator find which texts have to be
  translated in the po file.
  With your solution, the translator will face something like
  msgid "dpkg-1.9"
  msgstr ""
  and how will they find what text they have to translate? Most of the
  translators I know are running the stable version of Debian because
  they are not as geeky as maintainers.

- a mechanism to produce the mo file, or whatever. If you stick to the
  po format, you can reuse msgfmt, though.

- an output mechanism, including the fallback to the original if the
  translation is outdated. You have either to rewrite msgfmt to do this
  job at the previous step, or design a new function in dpkg, apt,
  grep-dctrl, and every program using the translated descriptions.

If you change any tool of the gettext mechanism, you lose advantages
from the translator's point of view, like the compendium, containing
standard translations for reuse, or user-friendly tools like kbabel for
translating (including the ispell support, which is implemented in
kbabel, and some others).

For what gain?
One lookup less? But gettext is cached and well optimized. I think the
change and redesign are too much, given the small speedup you can
expect...
Smaller resulting po files? Come on, the woody+1 release will come on 6
CDs or more, and you are speaking about saving a few MB... These data
will compress well, like any natural text, so that is a minor problem,
in my view.

Bye, Mt.
-- 
An azerty keyboard is worth two.
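
The first item in the list above, an extracting tool going from control
stanzas to a po file, might look something like the sketch below. It is
not the gettext patch mentioned in the message; it is a standalone
illustration that assumes simplified rfc822-style stanzas (only
Package: and Description: fields, stanzas separated by blank lines).
The function names and sample stanzas are hypothetical. The package
name is written as a "#." extracted comment so that the translator
still sees the real English text as msgid.

```python
def parse_descriptions(control_text):
    """Parse simplified control stanzas into {package: description}."""
    result = {}
    for stanza in control_text.strip().split("\n\n"):
        name, lines, in_description = None, [], False
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
                in_description = False
            elif line.startswith("Description:"):
                lines = [line.split(":", 1)[1].strip()]
                in_description = True
            elif in_description and line.startswith(" "):
                # Continuation line of the long description.
                lines.append(line.strip())
            else:
                in_description = False
        if name is not None:
            result[name] = "\n".join(lines)
    return result


def po_string(text):
    """Render a (possibly multi-line) string as PO string literals."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    parts = escaped.split("\n")
    literals = [f'"{part}\\n"' for part in parts[:-1]]
    literals.append(f'"{parts[-1]}"')
    return "\n".join(literals)


def make_po_template(control_text):
    """Emit one untranslated PO entry per package description."""
    entries = []
    for name, description in parse_descriptions(control_text).items():
        entries.append(
            f"#. Package: {name}\n"
            f"msgid {po_string(description)}\n"
            f'msgstr ""'
        )
    return "\n\n".join(entries) + "\n"


# Hypothetical sample input.
CONTROL = """\
Package: foo
Description: bar of foo
 first line

Package: foo2
Description: headline of foo2
 first section of foo2
"""

template = make_po_template(CONTROL)
```

Printing `template` gives ordinary po entries that tools such as kbabel
could open; a real extractor would also need a po header entry and
handling for the remaining control fields.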
Re: new proposal: Translating Debian packages' descriptions
Nick Phillips <[EMAIL PROTECTED]> wrote:
> >What is the size of all this? Ok. we have now in sid/main/i386 (see
> >[2]) 7000 Packages and the descriptions of all this packages is
> >2660993 bytes big. We get a description size per package of 384 bytes.
> >With gzip we will get (maybe) 130 bytes.
>
> This is not directed at this comment in particular, but many many of
> the posts in these threads that I've been reading seem to be overly
> worried about size. Stop and think about it. If you're going to have
> translations, they will take up space, somewhere. That's just life.

Thinking about compression of the data: rather than mixing text in
different languages in one package's control.tar.gz, it might be better
to keep one file per language.

Compression mechanisms have a better chance of compressing data that
consists of the same kind of text, and Japanese text compresses better
when it is all Japanese text.

So a control.tar.gz in each package is likely to end up bigger in total
than having one language.tar.gz for each language.

And longer files have better compression ratios.

regards,
junichi
-- 
[EMAIL PROTECTED] http://www.netfort.gr.jp/~dancer
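
A rough way to check the compression argument above is to compare many
small mixed-language archives against one larger archive per language.
The sketch below is only an illustration under stated assumptions: it
uses zlib as a stand-in for control.tar.gz and language.tar.gz, the
descriptions are made-up and far more repetitive than real ones (so the
gap is exaggerated), and German stands in for the second language.

```python
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8")))


# Hypothetical sample data: 50 packages, each with two descriptions.
names = [f"pkg{i}" for i in range(50)]
english = {
    n: f"{n}: a tool for handling files and directories on the system"
    for n in names
}
german = {
    n: f"{n}: ein Werkzeug zum Bearbeiten von Dateien und Verzeichnissen"
    for n in names
}

# Layout 1: every package carries its own small mixed-language blob.
per_package_total = sum(
    compressed_size(english[n] + "\n" + german[n]) for n in names
)

# Layout 2: one blob per language, holding all packages' descriptions.
per_language_total = sum(
    compressed_size("\n".join(texts[n] for n in names))
    for texts in (english, german)
)

print("per-package layout: ", per_package_total, "bytes")
print("per-language layout:", per_language_total, "bytes")
```

The per-language layout wins here both because each compressed stream
has fixed overhead and because a longer single-language stream gives
the compressor more repeated material to reference. Real descriptions
vary much more, so the actual saving would have to be measured on the
archive itself.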
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> > > Gettext normally uses the entire untranslated string as the key
> > > in the .mo file. This has many advantages when dealing with
> > > translation of strings in programs, where the untranslated string
> > > is actually present in the program source, and this is a big
> > > reason the GNU project favors gettext over catgets systems found
> > > on other Unices. It makes less sense in the case of package
> > > descriptions, however, because we're effectively doing two
> > > lookups -- first to find the English description in Packages.gz
> > > using the package name and version as a key, then to find the
> > > translated description in the .mo file using the English
> > > description as a key.
> >
> > Yes, you need two lookups: first in the package db (normally in
> > memory) and then (if LANG is set) a second lookup with gettext.
> >
> > But this is not a big problem, or is it?
>
> It casts doubt on the argument that gettext is a good solution here.
> Just because gettext is the optimal solution for translation of
> messages within programs does not mean it's the best solution for
> package translations. I'm personally willing to do a little
> wheel-engineering if it leads to a more elegant result.

Yes, maybe other techniques like catgets, or custom implementations,
perform better in this special case. But is such a solution more
elegant? Other programs also have 'non-static' output and use gettext.

I don't see a real problem with gettext. Gettext works; you don't have
to think about it. It is like a VW Käfer (Beetle): it keeps running,
and running, and running.

IMHO it is elegant if a clean patch of only -9/+30 lines gets you
translated descriptions in dpkg. Reuse the code.

If a re-engineered wheel gives us a real advantage, do the
re-engineering. I have no problem with a special wheel if we need it.

> > If you put only the translated text in the db, and you don't use
> > the English text as key (as gettext does), you may get outdated
> > translations.
>
> Only if the implementation is poor. The accuracy of a translation can
> be verified in the process of assembling the file that is to be made
> available to user machines (whether that file is Packages.gz, or
> debian-descs.mo, or whatever). Obviously the /inputs/ used to create
> this file must include mappings of English string -> translated
> string, but these mappings need not be retained in the output file.
> We only need to make sure once that the translation is up-to-date,
> not every time the user runs dpkg, because each version of each
> package can have only one untranslated description associated with
> it -- it's a unique key, by definition.
>
> If nothing else, perhaps you would consider that a .mo file
> containing [untranslated string -> translation] mappings will on
> average be almost twice as large as a .mo file containing
> [(package name,version) -> translation] mappings. :)

Yes, you are right on all points. With this in mind, I propose to use
.po files only in the source.

I also use this [(package name,version) -> translation] relation (to
make a .po file from a Description-XX and a Packages file). This is
possible if we don't use gettext.

Regards
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"A computer is, after all, a high-speed idiot."
  -- Jens Dittmar in de.comp.os.unix.linux.newusers
Re: new proposal: Translating Debian packages' descriptions
On Wed, Sep 05, 2001 at 01:40:47AM +0200, Richard Atterer wrote:
> On Tue, Sep 04, 2001 at 08:55:32PM +0200, Michael Bramer wrote:
> > On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote:
> > > > 2.) get the .po/.mo files on the system
> > > [snip]
> > > >If we don't want this process on the client all the time, we can
> > > >produce Descriptions-XX.po files and the client only has to
> > > >download this file and save it in the right dir. But this file
> > > >will include the original description, and so it has double the
> > > >size and download time.
> > >
> > > I don't know enough about gettext - am I assuming correctly that
> > > in the .mo file, the English translation is replaced with a
> > > checksum or similar, so you do not need to store the complete
> > > English translation?
> >
> > See the gettext info page.
> [snip]
> > You see, the original description is in the .mo file.
>
> OK, then why do you say above "But [the .po] will include the
> original description, and so it has double the size"? Doubled size
> compared to what? Packages-XX?

Oh, I will explain. We can do two things:

 1.) Description-XX file
     This file contains only the package name and the description (and
     maybe the version), like:
       !Package: foo
       !Description: trans of bar of foo
       ! trans of first line
       !
       !Package: foo2
       !Description: trans of headline of foo2
       ! trans of first section of foo2
     You see, this file does not contain the original description. Apt
     must combine this file with the normal Packages file and, using
     the original description from the Packages file, build a .po file.
     (The .po file must contain the original description.)

 2.) Description-XX.po file
     This is a normal po file. It contains the original description
     together with the translation, like:
       !msgid "bar of foo\n"
       !      "first line"
       !msgstr "trans of bar of foo\n"
       !       "trans of first line"
       !
       !msgid "headline of foo2\n"
       !      "first section of foo2"
       !msgstr "trans of headline of foo2\n"
       !       "trans of first section of foo2"
     You see, this file has double the size. But the client does not
     have to generate the .po file. It only has to copy this file to
     the right location.

Do you understand it now?

> > If dpkg uses gettext, dpkg shows the translation of all texts in
> > the .mo file. And if you use apt-get update, you have the
> > translations of all packages (from the apt sources) in the .mo
> > file.
>
> Right, the Descriptions-XX.po.gz needs to contain all translations.
> Sorry, I mixed things up.
>
> (BTW, wouldn't it make sense to represent the English translation
> only with a checksum in Descriptions-XX? We'd save a lot of space...)

Yes, I do this in the ddts. But gettext doesn't work like this.

If you use gettext (like other programs), you must have a .po file,
build a .mo from it, and simply use that.

You can go your own way (with checksums, etc.) and not use gettext. But
then you have to reinvent the wheel, with all the gettext features.

And in Descriptions-XX we don't need a checksum; we have the package
name and maybe the version. With these we can match the translation to
the original description.

> What do you think of my main point: Since we already have an override
> facility with the Descriptions-XX.po.gz, why should we bother
> introducing another override mechanism which modifies the
> control.tar.gz? OK, dpkg --info will not work until the maintainer
> catches up, but most people use dselect or "apt-cache show".

If we use my proposal, the information in control.tar.gz will only be
used by dpkg --info and by katie to produce the Description-XX[.po]
file.

All other output uses gettext and no special files from dpkg, control,
etc.

Regards
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"Every use of Linux is a proper use of Linux."
  -- John "Maddog" Hall, Keynote at the Linux Kongress in Cologne
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 08:55:32PM +0200, Michael Bramer wrote:
> On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote:
> > > 2.) get the .po/.mo files on the system
> > [snip]
> > >If we don't want this process on the client all the time, we can
> > >produce Descriptions-XX.po files and the client only has to
> > >download this file and save it in the right dir. But this file
> > >will include the original description, and so it has double the
> > >size and download time.
> >
> > I don't know enough about gettext - am I assuming correctly that in
> > the .mo file, the English translation is replaced with a checksum
> > or similar, so you do not need to store the complete English
> > translation?
>
> See the gettext info page.
[snip]
> You see, the original description is in the .mo file.

OK, then why do you say above "But [the .po] will include the original
description, and so it has double the size"? Doubled size compared to
what? Packages-XX?

> > > put the translation as a normal .po file in the
> > > /usr/share/desc-trans//desc-trans.d/ dir. Done.
> > >
> > > This doesn't need any extra work on dpkg etc.
> >
> > Actually, I think this is completely sufficient. Let the
> > maintainer include updated translations at his convenience in new
> > uploads, and use the override mechanism for the
> > Descriptions-XX.mo.gz files until he has done so.
>
> Descriptions-XX.mo.gz? not Descriptions-XX.po.gz?

Er, maybe .po. As I said, I'm not really a gettext expert.

> If dpkg uses gettext, dpkg shows the translation of all texts in the
> .mo file. And if you use apt-get update, you have the translations of
> all packages (from the apt sources) in the .mo file.

Right, the Descriptions-XX.po.gz needs to contain all translations.
Sorry, I mixed things up.

(BTW, wouldn't it make sense to represent the English translation only
with a checksum in Descriptions-XX? We'd save a lot of space...)

What do you think of my main point: Since we already have an override
facility with the Descriptions-XX.po.gz, why should we bother
introducing another override mechanism which modifies the
control.tar.gz? OK, dpkg --info will not work until the maintainer
catches up, but most people use dselect or "apt-cache show".

All the best,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer     |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ ´` ¯
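
The checksum idea raised in this message (store only a checksum of the
English text next to each translation) can be sketched as below. This
is not how gettext works, as the thread itself points out; it is a
minimal standalone illustration in which md5 is used as the checksum
and every function name and sample string is an assumption. A useful
side effect is that a changed English description no longer matches any
stored key, so the lookup falls back to the untranslated text instead
of showing an outdated translation.

```python
import hashlib


def description_key(original):
    """Return the md5 hex digest of the untranslated description."""
    return hashlib.md5(original.encode("utf-8")).hexdigest()


def build_catalog(pairs):
    """Map md5(original) -> translation for (original, translation) pairs.

    The English text itself is not stored, only its 32-character digest.
    """
    return {description_key(orig): trans for orig, trans in pairs}


def translate(catalog, original):
    """Look up a translation; return the original text if none matches."""
    return catalog.get(description_key(original), original)


# Hypothetical sample entry.
catalog = build_catalog([
    ("bar of foo\nfirst line", "trans of bar of foo\ntrans of first line"),
])

current = translate(catalog, "bar of foo\nfirst line")
changed = translate(catalog, "bar of foo\nfirst line, reworded")
```

In this sketch `current` holds the translation, while `changed` is the
reworded English text returned unchanged, because its digest is not in
the catalog.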
Re: new proposal: Translating Debian packages' descriptions
On Tue, 4 Sep 2001, Michael Bramer wrote:
> On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > > I don't know enough about gettext - am I assuming correctly that
> > > in the .mo file, the English translation is replaced with a
> > > checksum or similar, so you do not need to store the complete
> > > English translation?
>
> > Gettext normally uses the entire untranslated string as the key in
> > the .mo file. This has many advantages when dealing with
> > translation of strings in programs, where the untranslated string
> > is actually present in the program source, and this is a big reason
> > the GNU project favors gettext over catgets systems found on other
> > Unices. It makes less sense in the case of package descriptions,
> > however, because we're effectively doing two lookups -- first to
> > find the English description in Packages.gz using the package name
> > and version as a key, then to find the translated description in
> > the .mo file using the English description as a key.
>
> Yes, you need two lookups: first in the package db (normally in
> memory) and then (if LANG is set) a second lookup with gettext.
>
> But this is not a big problem, or is it?

It casts doubt on the argument that gettext is a good solution here.
Just because gettext is the optimal solution for translation of
messages within programs does not mean it's the best solution for
package translations. I'm personally willing to do a little
wheel-engineering if it leads to a more elegant result.

> If you put only the translated text in the db, and you don't use the
> English text as key (as gettext does), you may get outdated
> translations.

Only if the implementation is poor. The accuracy of a translation can
be verified in the process of assembling the file that is to be made
available to user machines (whether that file is Packages.gz, or
debian-descs.mo, or whatever). Obviously the /inputs/ used to create
this file must include mappings of English string -> translated string,
but these mappings need not be retained in the output file. We only
need to make sure once that the translation is up-to-date, not every
time the user runs dpkg, because each version of each package can have
only one untranslated description associated with it -- it's a unique
key, by definition.

If nothing else, perhaps you would consider that a .mo file containing
[untranslated string -> translation] mappings will on average be almost
twice as large as a .mo file containing [(package name,version) ->
translation] mappings. :)

Steve Langasek
postmodern programmer
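
The size argument in the last paragraph can be made concrete with a
small back-of-the-envelope sketch. It only counts the bytes of keys and
values in two in-memory mappings; it does not model the real .mo file
layout (headers, hash table, alignment), and the sample entries and the
"name_version" key format are assumptions made for the example.

```python
# Hypothetical sample entries:
# (package, version, English description, translation)
entries = [
    ("foo", "1.2",
     "bar of foo\nfirst line",
     "trans of bar of foo\ntrans of first line"),
    ("foo2", "0.9",
     "headline of foo2\nfirst section of foo2",
     "trans of headline of foo2\ntrans of first section of foo2"),
]


def payload_size(mapping):
    """Total UTF-8 bytes of all keys and values in the mapping."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in mapping.items()
    )


# gettext-style: the whole untranslated string is the key.
by_text = {english: trans for _, _, english, trans in entries}

# Alternative: a short (package name, version) key.
by_package = {
    f"{name}_{version}": trans for name, version, _, trans in entries
}

print("keyed by English text:   ", payload_size(by_text), "bytes")
print("keyed by package/version:", payload_size(by_package), "bytes")
```

With descriptions and translations of similar length, the text-keyed
mapping carries roughly twice the payload, since every English
description is stored once more as a key; the package/version key adds
only a few bytes per entry.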
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > I don't know enough about gettext - am I assuming correctly that in
> > the .mo file, the English translation is replaced with a checksum or
> > similar, so you do not need to store the complete English translation?
>
> Gettext normally uses the entire untranslated string as the key in the .mo
> file. This has many advantages when dealing with translation of strings in
> programs, where the untranslated string is actually present in the program
> source, and this is a big reason the GNU project favors gettext over catgets
> systems found on other Unices. It makes less sense in the case of package
> descriptions, however, because we're effectively doing two lookups -- first to
> find the English description in Packages.gz using the package name and version
> as a key, then to find the translated description in the .mo file using the
> English description as a key.

Yes, you need two lookups: first in the package db (normally in memory) and then (if LANG is set) a second lookup with gettext. But this is not a big problem, or is there a problem?

If you put only the translated text in the db and don't use the English text as the key (as gettext does), you may get an outdated translation. And an untranslated text is better than a wrong translation.

> > For the binary package, I don't know... - Gnome and KDE do include all
> > translations, and I think it's easier to handle. Additionally, disc
> > space is really cheap these days, so maybe it would be better just to
> > include all the descriptions, too.
>
> I think it does belong in the binary package; if not, I'm not sure why we
> would want it in the source package at all. I believe translated descriptions
> have just as much reason for inclusion in the binary package's control file
> (or in a functional equivalent) as the rest of the informational stuff that's
> in there.
>
> If translated Description: fields in binary packages are not important, then
> why do we currently have the untranslated Description: in the control file?

Yes. If we add the translations to the source package, we should also add them to the normal .deb.

Regards,
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
"Deeds left undone often entail an astonishing lack of consequences." -- S.J. Lec
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote: > > I think it's very important to have the translations in the *source* > > package. > > Also agreed. Why? Each system will usually only require one language per package. The rest, as far as any particular system is concerned, is just bloat. It would be more powerful, flexible to keep translations physically (as they are logically) separate. Granted, there may be cases where it is useful to *be able* to put translations into source/binary packages, but I believe that in most cases, it will be more convenient and more useful to keep them separate. Keeping them separate reduces space required in any particular archive, or on any particular system. It reduces maintainer workload, reduces the translators' reliance on maintainers, increases accountability (by allowing each uploaded item to be signed by the relevant maintainer/translator), reduces unnecessary upload/downloads, increases flexibility (for example allowing multiple different sources of any particular language, or makes it easy to provide unofficial translation archives), it makes more sense given a potentially arbitrary number of translated languages... As I said elsewhere, think of the archive as a big database (which it is). Then think about how you would/should normalize the data. When you ship bits around outside the database, it may be useful to be able to encapsulate several related records from the database into one object. Fine, but that doesn't mean that you should screw up your database and do it that way all the time. > > For the binary package, I don't know... - Gnome and KDE do include all > > translations, and I think it's easier to handle. Additionally, disc > > space is really cheap these days, so maybe it would be better just to > > include all the descriptions, too. 
Gnome and KDE include the translations because they know that that's the only way they can ensure that everyone who distributes their packages distributes the translations. They are designing their systems in the absense of any other good working way of doing it *right now*. The fact that they do that in no way implies that Debian should also do that. > I think it does belong in the binary package; if not, I'm not sure why we > would want it in the source package at all. No reason at all. It doesn't make sense to have it in either, in most cases. > I believe translated descriptions > have just as much reason for inclusion in the binary package's control file > (or in a functional equivalent) as the rest of the informational stuff that's > in there. No they don't. Translations are not providing extra information. They are providing the same information in multiple different ways. > If translated Description: fields in binary packages are not important, then > why do we currently have the untranslated Description: in the control file? Because you need *a* description, and in the past there has only been one description, so there was no reason to normalise it out into a different object. You don't, however, need 15, and when there are 15 or however many, it makes sense to normalise them out. Cheers, Nick -- Nick Phillips -- [EMAIL PROTECTED] If you think last Tuesday was a drag, wait till you see what happens tomorrow!
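To make the normalisation argument concrete, here is a small sketch using Python's standard `sqlite3` module. The table and column names (`package`, `description`, `lang`) and the helper `describe` are hypothetical and chosen only for illustration; the real archive is not stored this way. The point is that adding a language adds rows to a separate table and leaves the package record untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE package (
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        PRIMARY KEY (name, version)
    );
    -- Descriptions are normalised out: one row per (package, language).
    CREATE TABLE description (
        name    TEXT NOT NULL,
        version TEXT NOT NULL,
        lang    TEXT NOT NULL,
        text    TEXT NOT NULL,
        PRIMARY KEY (name, version, lang),
        FOREIGN KEY (name, version) REFERENCES package (name, version)
    );
    """
)
conn.execute("INSERT INTO package VALUES ('hello', '1.0-1')")
conn.executemany(
    "INSERT INTO description VALUES (?, ?, ?, ?)",
    [
        ("hello", "1.0-1", "en", "friendly greeter"),
        ("hello", "1.0-1", "de", "freundlicher Begruesser"),
    ],
)


def describe(name, version, lang):
    """Return the description in `lang`, falling back to English."""
    row = conn.execute(
        "SELECT text FROM description "
        "WHERE name = ? AND version = ? AND lang IN (?, 'en') "
        "ORDER BY lang = 'en' LIMIT 1",  # requested language sorts first
        (name, version, lang),
    ).fetchone()
    return row[0] if row else None
```

A client that only wants German would fetch only the `lang = 'de'` rows, which corresponds to downloading one translation section of the archive rather than every language bundled into every package.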
Re: new proposal: Translating Debian packages' descriptions
Hello Richard, On Tue, 4 Sep 2001, Richard Atterer wrote: > On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote: > > 1.) use all the time _gettext_! > I agree, otherwise we'd just have to keep re-inventing the wheel. > > 2.) get the .po/.mo files on the system > [snip] > >If we don't like this process on the client all the time, we can > >produce Descriptions-XX.po files and the clinet must only > >download this file and save this in the right dir. But this file > >will include the orignal description and with this it has the > >double size and download time. > I don't know enough about gettext - am I assuming correctly that in > the .mo file, the English translation is replaced with a checksum or > similar, so you do not need to store the complete English translation? Gettext normally uses the entire untranslated string as the key in the .mo file. This has many advantages when dealing with translation of strings in programs, where the untranslated string is actually present in the program source, and this is a big reason the GNU project favors gettext over catgets systems found on other Unices. It makes less sense in the case of package descriptions, however, because we're effectively doing two lookups -- first to find the English description in Packages.gz using the package name and version as a key, then to find the translated description in the .mo file using the English description as a key. > >check this calculation: > > If in all sources are only one desription with 130 (geziped) > > bytes of description we get 1 MByte per languages. If we use po > > files in the source (see below), we get 2 MBytes per languages > > And all deb packages have only one description with 130 (geziped) > > bytes. This make 10 MByte per languages. If we store the > > description as po file, we will use 20 MByte per languges. > > 11/22 MByte per languages, with only 10 languages we will get > > 110/220 MBytes. 
> Hm, this sounds a lot, but note that relative to the complete archive > size, it's only an increase of about 1%. A well-invested way of using > this disc space, IMHO. Agreed. Spending 1% archive space for the benefit of the 80%+ of our userbase who doesn't speak English natively is not prohibitive. > I think it's very important to have the translations in the *source* > package. Also agreed. > For the binary package, I don't know... - Gnome and KDE do include all > translations, and I think it's easier to handle. Additionally, disc > space is really cheap these days, so maybe it would be better just to > include all the descriptions, too. I think it does belong in the binary package; if not, I'm not sure why we would want it in the source package at all. I believe translated descriptions have just as much reason for inclusion in the binary package's control file (or in a functional equivalent) as the rest of the informational stuff that's in there. If translated Description: fields in binary packages are not important, then why do we currently have the untranslated Description: in the control file? Steve Langasek postmodern programmer
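The arithmetic in the quoted calculation can be re-checked in a few lines of Python. The package counts (80446 .deb packages, 7643 source packages) are the ones given in the original proposal elsewhere in this thread, and 130 bytes is the proposal's own estimate for one gzipped description. The constant and function names are assumptions made for this sketch, and the "1%" figure is not checked here because the thread does not state the total archive size.

```python
SOURCE_PACKAGES = 7643     # source packages on ftp-master, per the proposal
BINARY_PACKAGES = 80446    # .deb packages on ftp-master, per the proposal
GZIPPED_DESC_BYTES = 130   # estimated gzipped size of one description
LANGUAGES = 10


def archive_mbytes_per_language(po_format: bool) -> float:
    """Extra archive space per language, in MB (10**6 bytes).

    Storing a .po file doubles the size, because the .po file carries
    the original English text next to each translation.
    """
    factor = 2 if po_format else 1
    total_packages = SOURCE_PACKAGES + BINARY_PACKAGES
    return total_packages * GZIPPED_DESC_BYTES * factor / 1_000_000


plain = archive_mbytes_per_language(po_format=False)
as_po = archive_mbytes_per_language(po_format=True)
print(f"per language:   {plain:.1f} MB plain, {as_po:.1f} MB as .po")
print(f"{LANGUAGES} languages:   {plain * LANGUAGES:.0f} MB plain, "
      f"{as_po * LANGUAGES:.0f} MB as .po")
```

The results land slightly above the rounded 11/22 MByte and 110/220 MByte figures in the quoted text, so the estimate holds up as an order of magnitude.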
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote: > > 2.) get the .po/.mo files on the system > [snip] > >If we don't like this process on the client all the time, we can > >produce Descriptions-XX.po files and the clinet must only > >download this file and save this in the right dir. But this file > >will include the orignal description and with this it has the > >double size and download time. > > I don't know enough about gettext - am I assuming correctly that in > the .mo file, the English translation is replaced with a checksum or > similar, so you do not need to store the complete English translation? see in the info page from gettext. from the info page: ... Then, at offset O and offset T in the picture, two tables of string descriptors can be found. In both tables, each string descriptor uses two 32 bits integers, one for the string length, another for the offset of the string in the MO file, counting in bytes from the start of the file. The first table contains descriptors for the original strings, and is sorted so the original strings are in increasing lexicographical order. The second table contains descriptors for the translated strings, and is parallel to the first table: to find the corresponding translation one has to access the array slot in the second array with the same index. ... You see in the mo file is the orignal description. > >Because of this I propose some solutions: > > > >1.) (very fast) > > > > put the translation as normal .po file in the > > /usr/share/desc-trans//desc-trans.d/ dir. finish. > > > > This don't need some extra work on dpkg etc. > > Actually, I think this is completely sufficient. Let the maintainer > include updated translations at his convenience in new uploads, and > use the override mechanism for the Descriptions-XX.mo.gz files until > he has done so. Descriptions-XX.mo.gz? not Descriptions-XX.po.gz? 
> Hm, but note that this means that dpkg will need to look first at
> Descriptions-XX.mo files downloaded by apt, and only then at the .po
> file in the package. Would that be a problem?

If dpkg uses gettext, dpkg shows the translation of every text in the .mo file. After an apt-get update you have the translations of all packages (from the apt sources) in the .mo file, and once a package is installed its translation is in the .mo file too. Only dpkg --info does not show the translated description with this solution.

> >2.) Put the translation in the control.tar.gz of the deb.
> [snip]

This can show the translation with --info!

> >3.) Add the desc-trans.tar.gz in the deb ar as a own new element.
> [snip]

This can show the translation with --info!

From the user's point of view, this is the main difference between the three solutions.

Regards,
Grisu
-- 
Michael Bramer - a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED] -- Linux Sysadmin -- Use Debian Linux
Never trust a computer magazine with beautiful women on the cover. (Besim Karadeniz in de.comm.internet.misc)
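The MO layout quoted from the gettext info page earlier in this message can be illustrated with a short Python sketch. It writes a minimal little-endian MO image (7-word header, then the two descriptor tables, then the strings) and reads it back by walking both tables in parallel. The helper names `build_mo` and `read_mo` and the sample catalog are assumptions for this example; the hash table is left empty, and real files are produced by msgfmt rather than by code like this.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a GNU MO file


def build_mo(catalog):
    """Serialise {original: translation} into a minimal MO image."""
    items = sorted((k.encode(), v.encode()) for k, v in catalog.items())
    count = len(items)
    o_table = 28                     # header is seven 32-bit words
    t_table = o_table + 8 * count    # each descriptor: length + offset
    data_start = t_table + 8 * count
    originals, translated, blob = [], [], b""
    for orig, trans in items:
        originals.append((len(orig), data_start + len(blob)))
        blob += orig + b"\0"
        translated.append((len(trans), data_start + len(blob)))
        blob += trans + b"\0"
    # magic, revision, string count, offset O, offset T, hash size, hash offset
    out = struct.pack("<7I", MO_MAGIC, 0, count, o_table, t_table, 0, 0)
    for length, offset in originals + translated:
        out += struct.pack("<2I", length, offset)
    return out + blob


def read_mo(image):
    """Return {original: translation} by reading both descriptor tables."""
    magic, _revision, count, o_table, t_table = struct.unpack_from("<5I", image, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian MO file")
    result = {}
    for i in range(count):
        olen, ooff = struct.unpack_from("<2I", image, o_table + 8 * i)
        tlen, toff = struct.unpack_from("<2I", image, t_table + 8 * i)
        original = image[ooff:ooff + olen].decode()
        result[original] = image[toff:toff + tlen].decode()
    return result


image = build_mo({"small greeter": "kleiner Gruesser"})
```

Searching `image` for the bytes of the English string finds them, which is the point made above: the original description is stored in full inside the .mo file, not replaced by a checksum.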
Re: new proposal: Translating Debian packages' descriptions
Hi Michael, all in all, I think this sounds nice! On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote: > 1.) use all the time _gettext_! I agree, otherwise we'd just have to keep re-inventing the wheel. > 2.) get the .po/.mo files on the system [snip] >If we don't like this process on the client all the time, we can >produce Descriptions-XX.po files and the clinet must only >download this file and save this in the right dir. But this file >will include the orignal description and with this it has the >double size and download time. I don't know enough about gettext - am I assuming correctly that in the .mo file, the English translation is replaced with a checksum or similar, so you do not need to store the complete English translation? > 4.) How get katie (or the desc-trans-XX.deb) the translation? [snip] > 5.) translated descriptions in the package. [snip] >I see only one problem: the size. > >We have now 80446 .deb packages and 7643 source packages in the >debian archiv on ftp-master. If we include the translation in the >deb, we must store this in the source and in every deb package. > >check this calculation: > If in all sources are only one desription with 130 (geziped) > bytes of description we get 1 MByte per languages. If we use po > files in the source (see below), we get 2 MBytes per languages > And all deb packages have only one description with 130 (geziped) > bytes. This make 10 MByte per languages. If we store the > description as po file, we will use 20 MByte per languges. > 11/22 MByte per languages, with only 10 languages we will get > 110/220 MBytes. Hm, this sounds a lot, but note that relative to the complete archive size, it's only an increase of about 1%. A well-invested way of using this disc space, IMHO. >With more Packages, ports, languages, this will grow. This bytes >must all be downloaded, uploaded and synced with the time. 
> >And on the local system the descriptions and the translations of >all languages from the package will stored on the local harddisk >(without gzip). Count: > With 10 languages, 1000 installed Packages and 380 Bytes per > description and per translation you get additional 4/8MBytes on > the local disk. > >Is this all usefull in a 'normal' deb package from the debian >project? Maybe yes. We must decide this. (I personal don't find the >real pro about this. But we can add it and I don't have a real >problem with this. I see only the size problem, and this is not a >big problem.) I think it's very important to have the translations in the *source* package. For the binary package, I don't know... - Gnome and KDE do include all translations, and I think it's easier to handle. Additionally, disc space is really cheap these days, so maybe it would be better just to include all the descriptions, too. >In all the cases I propose: store the description in the source as > .po file in the /debian/ dir (one per languages). This is the > only real good way to store the translations. (no encodeing > problem, no outdated text, no debconf-mergetemplate hack, ...) Seconded. >But how get the maintainer the translation? We have some cases: > - The maintainer translate the description hisself > - He find some own translator (like now with debconf) > - He use the ddtp > - He can ask the ddts and get all translations of the package > - He can use the override file of katie > - He use the notification mails from the ddts (In future the > server will use the decided format in this mail. With this, > the maintaner must only copy this file in the source.) > >Now the technique part: > >The proposal with the biggest patch, is the 'put the translation in >a own element in the deb ar'. Maybe this is nice and feasible. >But this is not a fast way. > >Because of this I propose some solutions: > >1.) 
(very fast) > > put the translation as normal .po file in the > /usr/share/desc-trans//desc-trans.d/ dir. finish. > > This don't need some extra work on dpkg etc. Actually, I think this is completely sufficient. Let the maintainer include updated translations at his convenience in new uploads, and use the override mechanism for the Descriptions-XX.mo.gz files until he has done so. Hm, but note that this means that dpkg will need to look first at Descriptions-XX.mo files downloaded by apt, and only then at the .po file in the package. Would that be a problem? >2.) Put the translation in the control.tar.gz of the deb. [snip] >3.) Add the desc-trans.tar.gz in the deb ar as a own new element. [snip] Cheers, Richard -- __ _ |_) /| Richard Atterer | CS student at the Technische | GnuPG key: | \/¯| http://atterer.net | Universität München, Germany | 0x888354F7 ¯ ´` ¯
Re: new proposal: Translating Debian packages' descriptions
On Tue, 4 Sep 2001, Martin Quinson wrote: > On Tue, Sep 04, 2001 at 02:52:40PM +0200, Simon Richter wrote: > > On Tue, 4 Sep 2001, Michael Bramer wrote: > > > > > After I read some more mails and write some comments myself, IMHO it > > > is time to write a newer hopefully better proposal. Not all is new. > > > But I add some new thoughs and some parts from some comments. > > > > We can reduce the download size by 50% by letting the ddts database decide > > which translations are still up to date and pack only those into the > > downloadable file. It won't be a .po file, but it will be smaller. > You can also have the ddts building po files with only the uptodate > translations in it. So, it will be smaller, and you still can use the > gettext mecanism. This is a very good point that I think hasn't received enough attention yet. The gettext mechanism of resolving translations is an important one to make use of, but it /doesn't/ have to take place on the end user machine. We have an additional, guaranteed-unique index that we can use on description translations: the package name and version. The end user needs exactly one translated description per language for each package name and version that's present in the Debian archive. If there is more than one translation for the same original text, one is out of date -- drop it. If there are translations that don't match the descriptions on any packages in the current archive, discard them -- they don't belong in any file that we're sending to users. If there are packages installed on a system that are no longer present in the archive, the translated descriptions should be stored on the local system: it should not be expected that the archive will keep track of these outdated translations. With such a structure, it doesn't matter what the precise lookup mechanism is for finding the translations on the end user machine. 
The translations can still be stored in .po files if you like -- gettext is very nice, after all -- and use full text of the original description as the key for lookups, as is done with most gettext stuff. Or you can use package_version-deb.version as the key, which is bound to be a little more efficient. Or you can use any other mechanism that can look up the translation based on the (language,package,version) tuple -- including storing the requested translations in {the,a} Packages.gz file and in /var/lib/dpkg/available. This is not such a complicated thing that "reinventing the wheel" is dangerous. Steve Langasek postmodern programmer
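The rule about installed packages that have left the archive can be sketched as a small merge step on the user's machine. This is an illustration under assumptions: the function name `refresh_local`, the (language, package, version) dict keys and the sample data are hypothetical, and nothing here reflects how dpkg or apt actually store descriptions.

```python
def refresh_local(local, fresh, installed):
    """Merge a newly downloaded translation table into the local one.

    local, fresh: {(lang, package, version): translated description}
    installed:    set of (package, version) currently installed
    Entries from the archive always win. Old local entries are kept only
    for installed packages, because the archive no longer carries
    translations for versions it has dropped.
    """
    kept = {
        key: text
        for key, text in local.items()
        if (key[1], key[2]) in installed
    }
    kept.update(fresh)
    return kept


# Hypothetical example: foo 1.0 is installed but gone from the archive.
local = {("de", "foo", "1.0"): "altes Paket", ("de", "old", "0.1"): "veraltet"}
fresh = {("de", "bar", "2.0"): "neues Paket"}
merged = refresh_local(local, fresh, installed={("foo", "1.0")})
```

Here `merged` keeps the entry for the installed foo 1.0, gains bar 2.0 from the archive, and drops the translation for old 0.1, which is neither installed nor available.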
Re: new proposal: Translating Debian packages' descriptions
#include <hallo.h>
Nick Phillips wrote on Tue Sep 04, 2001 at 03:30:08PM:

> So you probably don't usually want the translations to be part of the
> package sources or binaries. They're logically separate, and should usually
> be physically separate (as physically as we ever get in this sense).
...
> So, apt, for example would be told what to do with a line:
>
> deb-trans http://www.debian.org/debian potato/de main contrib non-free

Just my 0.02€: You are completely right. We should keep the translated material out of the packages. Some people argue that a package should contain everything related to it, but that is too complicated given how frequently translations are updated. Developers should work together with the DDTS if they want control over the material concerning their packages.

IMHO a new script should work with the DDTS database and create the needed .po files. Additionally, the .mo file could be created centrally so that clients do not have to compile it themselves. This needs further decisions.

About the space usage: I think the best way is to create one big .po/.mo file covering all architectures for each release.

Regards,
Eduard.
-- 
For some it is Windows, for others the longest virus in the world
Re: new proposal: Translating Debian packages' descriptions
Sorry to screw up the threading; thought I'd posted this already, then deleted grisu's message before finding that I hadn't sent this :( On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote: > Not all parts are turned into stone. I need some comments and decision > on some parts. Maybe you can help. > > One quote from a mail from Raphael Hertzog: > I find that having translations is far better that having not a > single one and refusing to add them because we can't have the perfect > solution right now. However we do need to make sure that whatever we do right now can be migrated fairly painlessly to the perfect solution, whatever that may be. > 1.) use all the time _gettext_! And don't forget that "the perfect solution" will involve translations of all relevant text in a package, not just descriptions. I'm not saying that anyone has forgotten this, just that a lot of the thought in these threads seems to have been aimed at getting translated descriptions, and that all would do well to remember the final aim at all times... > 2.) get the .po/.mo files on the system > >If we will use gettext, we must get one .mo file on the system. I'm not familiar with gettext, but I would suggest that this is incorrect. I'd have thought that you would eventually need (at least) one .mo per package, and several others (such as for package descriptions). >I propose the dir /usr/share/desc-trans//desc-trans.d/ to >store all .po files. You've forgotten that in the end we'll not just be talking about descriptions, haven't you? >If you make a apt-get update (or a other funktion like this in >deity and co), you have (maybe) new and changed description in the >apt database. And now you need a newer, better .po file. Because of >this, I propose to download the .po like file (see below) with apt >by the update process. Does the user actually ever need the .po? I thought you said that the .mo was generated from the .po, and then the .mo is used. >What is the size of all this? Ok. 
we have now in sid/main/i386 (see >[2]) 7000 Packages and the descriptions of all this packages is >2660993 bytes big. We get a description size per package of 384 bytes. >With gzip we will get (maybe) 130 bytes. Whoa there. I guess this is a good a point as any for me to "go off on one". This is not directed at this comment in particular, but many many of the posts in these threads that I've been reading seem to be overly worried about size. Stop and think about it. If you're going to have translations, they will take up space, somewhere. That's just life. Now, think about the structure of where they should/could go, and the relationships between source, binary, and text data. Think databases. Think normalization. The text data in any one of an *arbitrary* number of languages is related to the package, but you'd normally normalize it out into a separate table in your database - you don't want to have your packages' source and binary records growing to arbitrary sizes as arbitrary numbers of translations are added to them. So you probably don't usually want the translations to be part of the package sources or binaries. They're logically separate, and should usually be physically separate (as physically as we ever get in this sense). Gettext abstracts the *idea* that is being communicated from the text used to communicate it. That leaves the actual text used as an overlay, metadata. So, we need to structure the repositories in such a way that the structure of the data is respected. It also happens that this conveniently allows for separation of areas of maintainer/translator expertise (and also responsibility). Packages as prepared by a maintainer need to contain text (.mo) for at least one language; probably usually english, but once this works there'll be no good reason for that to be the case. 
Translations of a package would logically be in another file (we have .dsc, .deb, .tar.gz, .diff already describing logically different aspects of a package, so there's no problem adding .trans or similar). The exact best method to store these is open to question, but I'd guess that it would be another section in the archive, as sources and binaries are split now. So, apt, for example would be told what to do with a line: deb-trans http://www.debian.org/debian potato/de main contrib non-free Which would be able to provide Packages files created from the various translation packages. Multiple versions of the same package would be dealt with in the same way as currently. It also allows certain mirrors to provide certain sets of translations, which will certainly be a Good Thing. And CD sets could easily include one extra CD which provided the translation section of the archive for whatever languages are required (OK, for the initial install there would need to be a little more jiggery-pokery). Exceptions and trickery needed: 1) to ensure that versions provided within a p
Re: new proposal: Translating Debian packages' descriptions
On Tue, Sep 04, 2001 at 02:52:40PM +0200, Simon Richter wrote:
> On Tue, 4 Sep 2001, Michael Bramer wrote:
>
> > After I read some more mails and write some comments myself, IMHO it
> > is time to write a newer hopefully better proposal. Not all is new.
> > But I add some new thoughts and some parts from some comments.
>
> We can reduce the download size by 50% by letting the ddts database decide
> which translations are still up to date and pack only those into the
> downloadable file. It won't be a .po file, but it will be smaller.

You can also have the ddts build .po files containing only the up-to-date translations. That way the file is smaller, and you can still use the gettext mechanism.

> Also, an important point will be that the downloadable files be sorted, so
> that you can diff them easily (I believe there is some effort going on to
> make diffs from the Packages files).
>
> I'm going to apply for a new job now, after that I'll take a more
> extensive look into this.

Good luck,
Mt.
-- 
One AZERTY keyboard is worth two.
Re: new proposal: Translating Debian packages' descriptions
On Tue, 4 Sep 2001, Michael Bramer wrote:

> After I read some more mails and write some comments myself, IMHO it
> is time to write a newer hopefully better proposal. Not all is new.
> But I add some new thoughts and some parts from some comments.

We can reduce the download size by 50% by letting the ddts database decide which translations are still up to date and pack only those into the downloadable file. It won't be a .po file, but it will be smaller.

Also, an important point will be that the downloadable files be sorted, so that you can diff them easily (I believe there is some effort going on to make diffs from the Packages files).

I'm going to apply for a new job now, after that I'll take a more extensive look into this.

Simon
-- 
GPG public key available from http://phobos.fs.tum.de/pgp/Simon.Richter.asc
Fingerprint: DC26 EB8D 1F35 4F44 2934 7583 DBB6 F98D 9198 3292
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
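The remark about sorting can be illustrated with Python's standard `difflib`. The one-line-per-entry format produced by `serialise` is a simplification invented for this sketch (real descriptions span several lines), and the function names and sample data are assumptions. With entries written in sorted key order, adding one package version changes exactly one line of the diff.

```python
import difflib


def serialise(table):
    """One 'package version<TAB>translation' line per entry, sorted by key."""
    return [
        f"{pkg} {ver}\t{text}\n"
        for (pkg, ver), text in sorted(table.items())
    ]


def changed_lines(old, new):
    """Return only the added/removed entry lines of a unified diff."""
    diff = difflib.unified_diff(
        serialise(old), serialise(new), fromfile="old", tofile="new", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical sample: one new package version appears in the archive.
old = {
    ("bash", "2.05-8"): "Die GNU Bourne Again SHell",
    ("hello", "1.0-1"): "freundlicher Begruesser",
}
new = dict(old)
new[("hello", "1.1-1")] = "freundlicher Begruesser, jetzt in Farbe"

delta = changed_lines(old, new)
```

Without a stable order, the same change could reshuffle the whole file and make the diff as large as the file itself.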
new proposal: Translating Debian packages' descriptions
Hello all,

After reading some more mails and writing some comments myself, IMHO it is time to write a newer, hopefully better proposal. Not everything is new, but I have added some new thoughts and some parts from the comments.

In this proposal I have combined the decentralized translations with the central repository, all without a delay on the path from translator to user.

Not all parts are set in stone. I need comments and decisions on some parts. Maybe you can help.

One quote from a mail from Raphael Hertzog:

   I find that having translations is far better than having not a
   single one and refusing to add them because we can't have the perfect
   solution right now.


Add Translations of the Package Description in the Debian Distribution
(c) Michael Bramer <[EMAIL PROTECTED]>

1.) use _gettext_ all the time!

   Everybody knows gettext and everybody uses it. Why should we use gettext to add the translated descriptions to the Debian distribution? For exactly this reason.

   Gettext is *the* technique for translations. Everybody knows it: you need not teach a maintainer, and you need not teach a user (an important point). A user who already runs a system with a locale environment will simply get translated descriptions in future.

   gettext does all the work, and gettext is tested (it is used in many programs). With it you need only a few small patches. (We have shown a -9/+30 patch for dselect/dpkg, and hopefully an apt patch will not be much bigger.)

   Gettext never shows an outdated translation (a big point) and has other nice features (see below).

   Maybe the release manager will allow this patch into woody, but that is another story.

   If apt and dpkg are patched and the user has a suitable .mo file in /usr/share/desc-trans//, all output of _all_ package management programs is translated. (dpkg and APT use a patch; other programs (like deity, etc.) use APT.)

   gettext already supports fallback languages. See [1] for more information.
   If I understand the gettext source code correctly, the fallback is per message and not per .mo file. So someone can set LANGUAGE=hu:sk:cs and get a Hungarian->Slovak->Czech->->English fallback path. (If a description is translated into Slovak but not into Hungarian, the user will see the Slovak description.)

   This is all nice, and we have only one problem: how does the user get a suitable .mo file?

   First, one comment on this question: you always have this problem with the descriptions. You must download the descriptions and the translations first. Only after that can you use (see) them and install the real programs/packages. For the normal (English) descriptions we use the Packages files (with apt or dselect (the old methods)). We must use something like this for the translations too...

2.) get the .po/.mo files on the system

   If we use gettext, we must get one .mo file onto the system. The .mo file is generated from a .po file and is itself a binary data file.

   If you have several sources (like ftp.debian.de and a local mirror with your own packages), you will have several translations and several .mo/.po files. The best way is to download the .po files, merge them with a tool, build one .mo file from the resulting big .po file, and use that file. (Maybe a 'cat *.po > master.po' is all that is needed; I have not tested this yet, but it is only a technical question.)

   I propose the directory /usr/share/desc-trans//desc-trans.d/ to store all .po files.

   If you run apt-get update (or a similar function in deity and co), you (maybe) have new and changed descriptions in the apt database, and you now need a newer, better .po file. Because of this, I propose that apt download the .po-like file (see below) during the update process.

   What is the size of all this? We now have 7000 packages in sid/main/i386 (see [2]), and the descriptions of all these packages total 2660993 bytes. That gives a description size of 384 bytes per package.
   With gzip we will get (maybe) 130 bytes.

   So the size on the system is comparable to that of apt's Packages files. If you have several sources, you will have some (5-20) megabytes in /usr/share/desc-trans//desc-trans.d/ plus one collected .mo file per language. But the admin of the system must pay this price if he wants to see translated descriptions. (And it does not matter whether we use gettext or another technique; with gettext we only have the extra .mo file.)

   But which file should apt download?

   The first thought is maybe a translated Packages-XX file, but the first thought is not always the best way. We have _now_ 316 Packages* files (see [3]) on ftp-master, with a total size of 141 MByte. If we translate all of this into (only) 10 languages, we need 1.4 GByte, and more and more with more packages and more languages. Ok, harddisk a