Re: new proposal: Translating Debian packages' descriptions

2001-09-13 Thread Michael Bramer
On Thu, Sep 13, 2001 at 01:26:07PM -0500, Steve Langasek wrote:
> On Thu, 13 Sep 2001, Michael Bramer wrote:
> > > The end-user tools would never have to deal with outdated translations if the
> > > ".mo" file is assembled ahead of time in a central location.  Match up the
> > > translations, insert them into the distilled .po file using the package
> > > name/version as the key, and you're done.
> 
> > one point:
> >  I get the package-1.2 from 'commercial german distributor' with german
> >  translation. And I watch (with apt-cache show) the description
> >  package-1.2.1 from security.debian.org?
> 
> >  I don't see the german description...
> 
> > other point:
> >  I have a gnome-base-1.2 from HelixCode and show the description from
> >  gnome-base-1.2 from debian... But the description from both packages
> >  is not the same
> 
> >  (I know our package management don't support this case, but we have
> >  this case in real (maybe)...)
> 
> This is an interesting point, but the solution is simple.  If package+version
> can't be used as a key to uniquely identify packages in the dpkg status
> database, then we key on whatever /is/ used by this database.

IMHO apt uses a 'long' pointer, like:
security.debian.org_dists_potato_updates_main_binary-i386_Package_version

But yes, this key doesn't need as much space as a normal description.

> > next point:
> >  You have in your apt-source some sources. Like a older CD-Rom with
> >  2.1 and 2.2_r0, a uptodate 2.2_r3 from ftp.debian.org, testing and
> >  sid from http.debian.org.
> 
> >  With this you have alle descriptions many times in the .mo file.
> >  (like package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1
> >  from 2.2_r3 and package-1.5 from testing and sid...)
> 
> If the descriptions remain constant for the packages, you're correct that
> there would be duplication.  But should we not optimize for the common case?
> Most users won't keep /old/ sources in the list; few will even have testing,
> unstable, and stable in the list at the same time.

No. Having stable and testing together is more and more common. With
pins you can install everything from stable and some packages from
testing/unstable. I have given some talks about this feature, and users
use it once they know about it. Believe me.

> > > I'm not suggesting replacing the format that translators will work with.  I'm
> > > just disagreeing that standard .mo files are the best solution to be
> > > integrated into dpkg and apt.
> > ...
> > > More direct lookups.  Smaller .po files.  Better integration with existing
> > > tools, instead of grafting a new arm onto our existing /var/lib/dpkg
> > > structure.
> 
> > Yes, .mo files are not the best thing. this is your point and you are
> > right.
> 
> > But this is a other problem and we can solve this problem parallel.
> 
> > I propose this:
> >   - use (a unchanged) gettext now in dpkg and get the thing rolling.
> >   - change the gettext to use a optional 'md5sum-like' thing for a
> > lookups.
> 
> > (save the translation with the md5sum of the orignal text as key)
> 
> For the goal of getting support for translated descriptions into Debian as
> soon as possible, I think use of unmodified gettext is a reasonable choice.

Yes. 

We should reach a conclusion with the dpkg and apt developers. Only
with a conclusion (maybe without gettext) can our users use these
translations.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
 The optimist believes we live in the best of all possible worlds.
 The pessimist fears that this is true.




Re: new proposal: Translating Debian packages' descriptions

2001-09-13 Thread Steve Langasek
Hi Michael,

On Thu, 13 Sep 2001, Michael Bramer wrote:

> > The end-user tools would never have to deal with outdated translations if the
> > ".mo" file is assembled ahead of time in a central location.  Match up the
> > translations, insert them into the distilled .po file using the package
> > name/version as the key, and you're done.

> one point:
>  I get the package-1.2 from 'commercial german distributor' with german
>  translation. And I watch (with apt-cache show) the description
>  package-1.2.1 from security.debian.org?

>  I don't see the german description...

> other point:
>  I have a gnome-base-1.2 from HelixCode and show the description from
>  gnome-base-1.2 from debian... But the description from both packages
>  is not the same

>  (I know our package management don't support this case, but we have
>  this case in real (maybe)...)

This is an interesting point, but the solution is simple.  If package+version
can't be used as a key to uniquely identify packages in the dpkg status
database, then we key on whatever /is/ used by this database.

> next point:
>  You have in your apt-source some sources. Like a older CD-Rom with
>  2.1 and 2.2_r0, a uptodate 2.2_r3 from ftp.debian.org, testing and
>  sid from http.debian.org.

>  With this you have alle descriptions many times in the .mo file.
>  (like package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1
>  from 2.2_r3 and package-1.5 from testing and sid...)

If the descriptions remain constant for the packages, you're correct that
there would be duplication.  But should we not optimize for the common case?
Most users won't keep /old/ sources in the list; few will even have testing,
unstable, and stable in the list at the same time.

And if we use a file format other than .mo files, it's also possible to design
a system of indirect lookups that /still/ edges out standard gettext lookups
for efficiency.
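
A minimal sketch of such an indirect lookup, in Python for illustration
only. The table layout and the names index, translations and lookup are
assumptions, not an existing dpkg or apt format: each (package, version)
pair maps to a small integer, and each integer maps to one stored
translation, so versions that share a description share one copy.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data: three versions of one package share one description,
# so the translated text is stored only once.
translations: Dict[int, str] = {0: "deutsche Beschreibung von foo"}
index: Dict[Tuple[str, str], int] = {
    ("foo", "1.4"): 0,
    ("foo", "1.4.1"): 0,
    ("foo", "1.5"): 0,
}


def lookup(package: str, version: str) -> Optional[str]:
    """Return the translation for (package, version), or None if absent."""
    desc_id = index.get((package, version))
    return None if desc_id is None else translations[desc_id]
```

The duplication then costs one short index entry per extra version, not
one full description.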

> > I'm not suggesting replacing the format that translators will work with.  I'm
> > just disagreeing that standard .mo files are the best solution to be
> > integrated into dpkg and apt.
> ...
> > More direct lookups.  Smaller .po files.  Better integration with existing
> > tools, instead of grafting a new arm onto our existing /var/lib/dpkg
> > structure.

> Yes, .mo files are not the best thing. this is your point and you are
> right.

> But this is a other problem and we can solve this problem parallel.

> I propose this:
>   - use (a unchanged) gettext now in dpkg and get the thing rolling.
>   - change the gettext to use a optional 'md5sum-like' thing for a
> lookups.

> (save the translation with the md5sum of the orignal text as key)

For the goal of getting support for translated descriptions into Debian as
soon as possible, I think use of unmodified gettext is a reasonable choice.

Steve Langasek
postmodern programmer




Re: new proposal: Translating Debian packages' descriptions

2001-09-13 Thread Michael Bramer
On Tue, Sep 11, 2001 at 12:05:25PM -0500, Steve Langasek wrote:
> On Tue, 11 Sep 2001, Martin Quinson wrote:
> > On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> >  - an output mecanism, including the fallback to original if the translation
> >    is outdated. You have either to rewrite msgfmt to do this job at previous
> >    step, or design a new function in dpkg, apt, grep-dctrl, and all programm
> >    using the translated descriptions.
> 
> The end-user tools would never have to deal with outdated translations if the
> ".mo" file is assembled ahead of time in a central location.  Match up the
> translations, insert them into the distilled .po file using the package
> name/version as the key, and you're done.

One point:
 I get package-1.2 from a 'commercial German distributor' with a German
 translation. Then I look (with apt-cache show) at the description of
 package-1.2.1 from security.debian.org.

 I don't see the German description...

Another point:
 I have gnome-base-1.2 from HelixCode and display the description of
 gnome-base-1.2 from Debian... But the descriptions of the two packages
 are not the same.

 (I know our package management doesn't support this case, but we may
 have it in practice...)

Next point:
 You have several sources in your apt sources.list, like an older CD-ROM
 with 2.1 and 2.2_r0, an up-to-date 2.2_r3 from ftp.debian.org, and
 testing and sid from http.debian.org.

 With this you have all descriptions many times in the .mo file
 (like package-0.9 from 2.1, package-1.4 from 2.2_r0, package-1.4.1
 from 2.2_r3 and package-1.5 from testing and sid...).

 And with the pin feature of apt, more and more people have several
 sources in sources.list...

> >  If you change any tool of the gettext mechanism, you lost advantages from
> >  the translator point of view, like compendium, containing standard
> >  translations for reuse, or user-friendly tools like kbabel for translating,
> >  (including ispell possibility, which is implemented in kbabel, and some
> >  others)
> 
> I'm not suggesting replacing the format that translators will work with.  I'm
> just disagreeing that standard .mo files are the best solution to be
> integrated into dpkg and apt.
...
> More direct lookups.  Smaller .po files.  Better integration with existing
> tools, instead of grafting a new arm onto our existing /var/lib/dpkg
> structure.

Yes, .mo files are not the best thing. This is your point and you are
right.

But this is another problem, and we can solve it in parallel.

I propose this:
  - use (an unchanged) gettext now in dpkg and get the thing rolling.
  - change gettext to use an optional 'md5sum-like' thing for
    lookups.

(save the translation with the md5sum of the original text as key)

All programs with a lot of text could use these .mo files and benefit
from this.
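
A minimal sketch of the md5sum-keyed lookup, in Python for illustration
only. This is not how gettext works today; the function names key_for,
build_catalog and translate are hypothetical. The catalog stores the MD5
digest of the original text as the key, so the original text itself does
not have to be kept.

```python
import hashlib
from typing import Dict


def key_for(original: str) -> str:
    """Hex MD5 digest of the untranslated text (used as the lookup key)."""
    return hashlib.md5(original.encode("utf-8")).hexdigest()


def build_catalog(pairs: Dict[str, str]) -> Dict[str, str]:
    """Map md5(original) -> translation, dropping the original text."""
    return {key_for(orig): trans for orig, trans in pairs.items()}


def translate(catalog: Dict[str, str], original: str) -> str:
    """Return the translation, or the original text if no key matches."""
    return catalog.get(key_for(original), original)
```

If the original description changes, its digest changes too, so an
outdated translation is simply not found and the original text is shown.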


Please don't reinvent the wheel, improve the old wheel and make it
chubbier. 


Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
"Quite a lot of companies, none of which use Linux, would be floundering
 at first if the Linux machines were switched off." -- Matthias Peick




Re: new proposal: Translating Debian packages' descriptions

2001-09-13 Thread Steve Langasek
On Tue, 11 Sep 2001, Martin Quinson wrote:

> On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:

> > Only if the implementation is poor.  The accuracy of a translation can be
> > verified in the process of assembling the file that is to be made available to
> > user machines (whether that file is Packages.gz, or debian-descs.mo, or
> > whatever).  Obviously the /inputs/ used to create this file must include
> > mappings of English string -> translated string, but these mappings need not
> > be retained in the output file.  We only need to make sure once that the
> > translation is up-to-date, not every time the user runs dpkg, because each
> > version of each package can have only one untranslated description associated
> > with it -- it's a unique key, by definition.

> > If nothing else, perhaps you would consider that a .mo file containing
> > [untranslated string -> translation] mappings will on average be almost twice
> > as large as a .mo file containing [(package name,version) -> translation]
> > mappings. :)

> The problem is that you wont have to do a little wheel engineering, but a
> lot of. Think, you will have to design:

>  - the extracting tool control -> po file
>    ok, that's true for all solutions ;) I'm working on a patch against
>    gettext so that it can handle text following rfc822.

As you say, this is common to all solutions.


>  - a mechanism to help the translator finding which text have to be
>    translated in the po file.

>    With your solution, the translator will face something like

>    msgid "dpkg-1.9"
>    msgstr ""

Not at all.  I never suggested that anyone, translator or maintainer, would
directly manipulate a .po file that looks like this.  The .po files should
look exactly as you would expect them to.  It's only /after/ these .po files
have been submitted to the archive that they would be automatically processed
and matched up with actual packages in the archive, so that the resulting .mo
file (or file in another format) contains only the relevant translations.


>  - a mechanism to produce the mo file, or what ever. If you stick to the po
>    format, you can reuse msgfmt, through.

So, msgfmt is one option, yes.  Other solutions to parse and merge text would
not be difficult to implement.

>  - an output mecanism, including the fallback to original if the translation
>    is outdated. You have either to rewrite msgfmt to do this job at previous
>    step, or design a new function in dpkg, apt, grep-dctrl, and all programm
>    using the translated descriptions.

The end-user tools would never have to deal with outdated translations if the
".mo" file is assembled ahead of time in a central location.  Match up the
translations, insert them into the distilled .po file using the package
name/version as the key, and you're done.
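
A minimal sketch of that assembly step, in Python for illustration only.
The function name assemble and the two input dictionaries are
assumptions; a real implementation would read the archive's Packages
file and the submitted .po files.

```python
from typing import Dict, Tuple


def assemble(
    packages: Dict[Tuple[str, str], str],
    po_entries: Dict[str, str],
) -> Dict[Tuple[str, str], str]:
    """Key translations by (name, version), keeping only current ones.

    packages:   (name, version) -> English description from the archive
    po_entries: English description -> translation, as in a normal .po file
    """
    catalog = {}
    for key, english in packages.items():
        translation = po_entries.get(english)
        if translation:  # skip missing or empty (untranslated) entries
            catalog[key] = translation
    return catalog
```

A package version whose English description has no exact match in the
.po input gets no entry, so the end-user tool falls back to the
untranslated description without ever checking for staleness itself.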


>  If you change any tool of the gettext mechanism, you lost advantages from
>  the translator point of view, like compendium, containing standard
>  translations for reuse, or user-friendly tools like kbabel for translating,
>  (including ispell possibility, which is implemented in kbabel, and some
>  others)

I'm not suggesting replacing the format that translators will work with.  I'm
just disagreeing that standard .mo files are the best solution to be
integrated into dpkg and apt.

> For what gain ?

> A lookup less ? But gettext is cached, and well optimized. I think the
> change and redesign is too much, regarding to the small speedup you can
> expect...

> Smaller resulting po files ? Come on, the woody+1 release will come on 6 CD
> or more, and you are speaking about saving a few Mb... These data will be
> well compressed, as any natural text, so that a minor problem, in my point
> of view.

More direct lookups.  Smaller .po files.  Better integration with existing
tools, instead of grafting a new arm onto our existing /var/lib/dpkg
structure.

Steve Langasek
postmodern programmer




Re: new proposal: Translating Debian packages' descriptions

2001-09-11 Thread Martin Quinson
On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> On Tue, 4 Sep 2001, Michael Bramer wrote:
> 
> > On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > > > I don't know enough about gettext - am I assuming correctly that in
> > > > the .mo file, the English translation is replaced with a checksum or
> > > > similar, so you do not need to store the complete English translation?
> 
> > > Gettext normally uses the entire untranslated string as the key in the .mo
> > > file.  This has many advantages when dealing with translation of strings in
> > > programs, where the untranslated string is actually present in the program
> > > source, and this is a big reason the GNU project favors gettext over catgets
> > > systems found on other Unices.  It makes less sense in the case of package
> > > descriptions, however, because we're effectively doing two lookups -- first to
> > > find the English description in Packages.gz using the package name and version
> > > as a key, then to find the translated description in the .mo file using the
> > > English description as a key.
> 
> > yes, you must two lookups. First in the package db (normal in the
> > menory) and (if LANG is set) make a second lookup with gettext.
> 
> > But this not a big problem, or is there a problem?
> 
> It casts doubt on the argument that gettext is a good solution here.  Just
> because gettext is the optimal solution for translation of messages within
> programs does not mean it's the best solution for package translations.  I'm
> personally willing to do a little wheel-engineering if it leads to a more
> elegant result.
>

> > If you put the translated text only in the db, and you don't use the
> > english text as key (like gettext) you get maybe outdated translation.
> 
> Only if the implementation is poor.  The accuracy of a translation can be
> verified in the process of assembling the file that is to be made available to
> user machines (whether that file is Packages.gz, or debian-descs.mo, or
> whatever).  Obviously the /inputs/ used to create this file must include
> mappings of English string -> translated string, but these mappings need not
> be retained in the output file.  We only need to make sure once that the
> translation is up-to-date, not every time the user runs dpkg, because each
> version of each package can have only one untranslated description associated
> with it -- it's a unique key, by definition.
> 
> If nothing else, perhaps you would consider that a .mo file containing
> [untranslated string -> translation] mappings will on average be almost twice
> as large as a .mo file containing [(package name,version) -> translation]
> mappings. :)

The problem is that you won't have to do a little wheel engineering, but a
lot of it. Think about it; you will have to design:
 - the extraction tool, control -> po file
   OK, that's true for all solutions ;) I'm working on a patch against
   gettext so that it can handle text following RFC 822.

 - a mechanism to help the translator find which texts have to be
   translated in the po file.

   With your solution, the translator will face something like

   msgid "dpkg-1.9"
   msgstr ""

   and how will they find what text they have to translate? Most of the
   translators I know run the stable version of Debian because they
   are not as geeky as maintainers.

 - a mechanism to produce the mo file, or whatever. If you stick to the po
   format, you can reuse msgfmt, though.

 - an output mechanism, including the fallback to the original if the
   translation is outdated. You have either to rewrite msgfmt to do this job
   at the previous step, or design a new function in dpkg, apt, grep-dctrl,
   and all programs using the translated descriptions.

 If you change any tool of the gettext mechanism, you lose advantages from
 the translator's point of view, like the compendium, containing standard
 translations for reuse, or user-friendly translation tools like kbabel
 (including spell checking with ispell, which is implemented in kbabel, and
 some others).
   
For what gain?

One lookup less? But gettext is cached and well optimized. I think the
change and redesign are too much, given the small speedup you can
expect...

Smaller resulting po files? Come on, the woody+1 release will come on 6 CDs
or more, and you are talking about saving a few MB... This data will
compress well, like any natural text, so that is a minor problem, in my
view.

Bye, Mt.

-- 
Un clavier azerty en vaut deux.




Re: new proposal: Translating Debian packages' descriptions

2001-09-05 Thread Junichi Uekawa
Nick Phillips <[EMAIL PROTECTED]> immo vero scripsit

> >What is the size of all this? Ok. we have now in sid/main/i386 (see
> >[2]) 7000 Packages and the descriptions of all this packages is
> >2660993 bytes big. We get a description size per package of 384 bytes.
> >With gzip we will get (maybe) 130 bytes. 

> This is not directed at this comment in particular, but many many of the
> posts in these threads that I've been reading seem to be overly worried
> about size. Stop and think about it. If you're going to have translations,
> they will take up space, somewhere. That's just life.


Thinking about data compression: rather than mixing text in different
languages in one package's control.tar.gz, it might be better to have one
file per language.

Compression mechanisms have a better chance of compressing data
that contains the same kind of text, and Japanese text compresses better
when it is all Japanese text.

So, the control.tar.gz files in each package will likely be bigger in
total than one language.tar.gz per language.

And longer files have better compression ratios.
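
A small sketch of the last point, in Python for illustration only. The
sample descriptions are made up, and zlib stands in here for the gzip
step of a .tar.gz, which is an assumption of the sketch. It compares
compressing each description on its own with compressing them together.

```python
import zlib

# Hypothetical sample descriptions, all in one language.
descriptions = [
    "foo is a tool for managing package descriptions on Debian systems.",
    "bar is a library for managing translated descriptions of packages.",
    "baz is a tool for translating the descriptions of Debian packages.",
]

# One compressed stream per package (like one control.tar.gz each).
separate = sum(len(zlib.compress(d.encode("utf-8"))) for d in descriptions)

# One compressed stream for the whole language (like one language.tar.gz).
combined = len(zlib.compress("\n".join(descriptions).encode("utf-8")))

print(separate, combined)
```

The combined stream pays the per-stream overhead once and can reuse
phrases that repeat across descriptions, so it comes out smaller than the
separate streams added together.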


regards,
junichi
-- 
[EMAIL PROTECTED]  http://www.netfort.gr.jp/~dancer




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Michael Bramer
On Tue, Sep 04, 2001 at 05:51:38PM -0500, Steve Langasek wrote:
> > > Gettext normally uses the entire untranslated string as the key in the .mo
> > > file.  This has many advantages when dealing with translation of strings in
> > > programs, where the untranslated string is actually present in the program
> > > source, and this is a big reason the GNU project favors gettext over catgets
> > > systems found on other Unices.  It makes less sense in the case of package
> > > descriptions, however, because we're effectively doing two lookups -- first to
> > > find the English description in Packages.gz using the package name and version
> > > as a key, then to find the translated description in the .mo file using the
> > > English description as a key.
> 
> > yes, you must two lookups. First in the package db (normal in the
> > menory) and (if LANG is set) make a second lookup with gettext.
> 
> > But this not a big problem, or is there a problem?
> 
> It casts doubt on the argument that gettext is a good solution here.  Just
> because gettext is the optimal solution for translation of messages within
> programs does not mean it's the best solution for package translations.  I'm
> personally willing to do a little wheel-engineering if it leads to a more
> elegant result.

Yes, maybe other techniques like catgets, or our own implementation,
perform better in this special case. But is such a solution
more elegant?

Other programs also have 'non-static' output and use gettext. I don't
see a real problem with gettext. Gettext works; you don't have to think
about it. It is like a VW Käfer (Beetle): it keeps running, and running,
and running.

IMHO it is elegant if you get translated descriptions in dpkg with a
clean patch of only -9/+30 lines. Reuse the code.

If re-engineering the wheel brings a real advantage, let's re-engineer
it. I have no problem with a special wheel if we need it.

> > If you put the translated text only in the db, and you don't use the
> > english text as key (like gettext) you get maybe outdated translation.
> 
> Only if the implementation is poor.  The accuracy of a translation can be
> verified in the process of assembling the file that is to be made available to
> user machines (whether that file is Packages.gz, or debian-descs.mo, or
> whatever).  Obviously the /inputs/ used to create this file must include
> mappings of English string -> translated string, but these mappings need not
> be retained in the output file.  We only need to make sure once that the
> translation is up-to-date, not every time the user runs dpkg, because each
> version of each package can have only one untranslated description associated
> with it -- it's a unique key, by definition.
> 
> If nothing else, perhaps you would consider that a .mo file containing
> [untranslated string -> translation] mappings will on average be almost twice
> as large as a .mo file containing [(package name,version) -> translation]
> mappings. :)

Yes, you are right on all points.

With this in mind, I propose to use only .po files in the source.

I also use this [(package name,version) -> translation] relation (to
make a .po file from a Description-XX and a Packages file).

This is possible if we don't use gettext.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
"A computer is, after all, a high-speed idiot."
  -- Jens Dittmar in de.comp.os.unix.linux.newusers




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Michael Bramer
On Wed, Sep 05, 2001 at 01:40:47AM +0200, Richard Atterer wrote:
> On Tue, Sep 04, 2001 at 08:55:32PM +0200, Michael Bramer wrote:
> > On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote:
> > 
> > > > 2.) get the .po/.mo files on the system
> > > [snip]
> > > >If we don't like this process on the client all the time, we can
> > > >produce Descriptions-XX.po files and the clinet must only
> > > >download this file and save this in the right dir. But this file
> > > >will include the orignal description and with this it has the
> > > >double size and download time.
> > > 
> > > I don't know enough about gettext - am I assuming correctly that in
> > > the .mo file, the English translation is replaced with a checksum or
> > > similar, so you do not need to store the complete English translation?
> > 
> > see in the info page from gettext.
> [snip]
> > You see in the mo file is the orignal description.
> 
> OK, then why do you say above "But [the .po] will include the orignal
> description and with this it has the double size"? Doubled size
> compared to what? Packages-XX?

Oh, let me explain:

We can do one of two things:
1.) Description-XX file

  This file contains only the package name and the description (and maybe
  the version), like:
   !Package: foo
   !Description: trans of bar of foo
   ! trans of first line
   !
   !Package: foo2
   !Description: trans of headline of foo2
   ! trans of first section of foo2

  As you can see, this file does not contain the original description. Apt
  must combine this file with the normal Packages file and, using the
  original description from the Packages file, build a .po file.

  (The .po file must contain the original description.)

2.) Description-XX.po file

  This is a normal po file. It contains the original description
  together with the translation, like:
   !msgid "bar of foo\n"
   ! "first line"
   !msgstr "trans of bar of foo\n"
   ! "trans of first line"
   !
   !msgid "headline of foo2\n"
   ! "first section of foo2"
   !msgstr "trans of headline of foo2\n"
   ! "trans of first section of foo2"

  As you can see, this file is twice the size. But the client does not
  have to generate the .po file. It only has to copy this file to the
  right location.
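
A minimal sketch of the client-side step that option 1 needs, in Python
for illustration only. The names po_string and make_po are hypothetical,
and parsing of the Packages and Description-XX files is left out; the
inputs are assumed to be already parsed into dictionaries.

```python
from typing import Dict


def po_string(text: str) -> str:
    """Quote text as a single-line PO string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def make_po(originals: Dict[str, str], translated: Dict[str, str]) -> str:
    """Join two package -> description maps into PO entries.

    originals:  package name -> English description (from Packages)
    translated: package name -> translation (from Description-XX)
    """
    entries = []
    for package, translation in translated.items():
        original = originals.get(package)
        if original is None:
            continue  # no such package in the Packages file
        entries.append(
            f"msgid {po_string(original)}\nmsgstr {po_string(translation)}\n"
        )
    return "\n".join(entries)
```

With option 2 this join happens once on the server, and the client
downloads the result, at the cost of also downloading every original
description.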

Is it clear now?

> > if dpkg use gettext, dpkg show the translation of all textes in the
> > mo file. And if you use apt-get update you have the translation of
> > all packages (from the apt source) in the .mo file.
> 
> Right, the Descriptions-XX.po.gz needs to contain all translations.
> Sorry, I mixed things up.
> 
> (BTW, wouldn't it make sense to represent the English translation only
> with a checksum in Descriptions-XX? We'd save a lot of space...)

Yes, I do this in the ddts. But gettext doesn't work like this. If
you use gettext (like other programs), you must have a .po file, build
a .mo from it, and use only that.

You can go your own way (checksums, etc.) and not use gettext.
But then you must reinvent the wheel, with all of gettext's features.

And in Descriptions-XX we don't need a checksum: we have the package
name and maybe the version. With these we can match the translation to
the original description.

> What do you think of my main point: Since we already have an override
> facility with the Descriptions-XX.po.gz, why should we bother
> introducing another override mechanism which modifies the
> control.tar.gz? OK, dpkg --info will not work until the maintainer
> catches up, but most people use dselect or "apt-cache show".

If we use my proposal, the information in control.tar.gz will only be
used by dpkg --info and by katie to produce the Description-XX[.po]
file.

All other output uses gettext and no special files from dpkg, control,
etc.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
"Every use of Linux is a proper use of Linux."
  -- John "Maddog" Hall, Keynote at the Linux Kongress in Cologne




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Richard Atterer
On Tue, Sep 04, 2001 at 08:55:32PM +0200, Michael Bramer wrote:
> On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote:
> 
> > > 2.) get the .po/.mo files on the system
> > [snip]
> > >If we don't like this process on the client all the time, we can
> > >produce Descriptions-XX.po files and the clinet must only
> > >download this file and save this in the right dir. But this file
> > >will include the orignal description and with this it has the
> > >double size and download time.
> > 
> > I don't know enough about gettext - am I assuming correctly that in
> > the .mo file, the English translation is replaced with a checksum or
> > similar, so you do not need to store the complete English translation?
> 
> see in the info page from gettext.
[snip]
> You see in the mo file is the orignal description.

OK, then why do you say above "But [the .po] will include the orignal
description and with this it has the double size"? Doubled size
compared to what? Packages-XX?

> > >  put the translation as normal .po file in the
> > >  /usr/share/desc-trans//desc-trans.d/ dir. finish. 
> > > 
> > >  This don't need some extra work on dpkg etc.
> > 
> > Actually, I think this is completely sufficient. Let the
> > maintainer include updated translations at his convenience in new
> > uploads, and use the override mechanism for the
> > Descriptions-XX.mo.gz files until he has done so.
> 
> Descriptions-XX.mo.gz? not Descriptions-XX.po.gz?

Er, maybe .po. As I said, I'm not really a gettext expert.

> if dpkg use gettext, dpkg show the translation of all textes in the
> mo file. And if you use apt-get update you have the translation of
> all packages (from the apt source) in the .mo file.

Right, the Descriptions-XX.po.gz needs to contain all translations.
Sorry, I mixed things up.

(BTW, wouldn't it make sense to represent the English translation only
with a checksum in Descriptions-XX? We'd save a lot of space...)

What do you think of my main point: Since we already have an override
facility with the Descriptions-XX.po.gz, why should we bother
introducing another override mechanism which modifies the
control.tar.gz? OK, dpkg --info will not work until the maintainer
catches up, but most people use dselect or "apt-cache show".

All the best,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ ´` ¯




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Steve Langasek
On Tue, 4 Sep 2001, Michael Bramer wrote:

> On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > > I don't know enough about gettext - am I assuming correctly that in
> > > the .mo file, the English translation is replaced with a checksum or
> > > similar, so you do not need to store the complete English translation?

> > Gettext normally uses the entire untranslated string as the key in the .mo
> > file.  This has many advantages when dealing with translation of strings in
> > programs, where the untranslated string is actually present in the program
> > source, and this is a big reason the GNU project favors gettext over catgets
> > systems found on other Unices.  It makes less sense in the case of package
> > descriptions, however, because we're effectively doing two lookups -- first to
> > find the English description in Packages.gz using the package name and version
> > as a key, then to find the translated description in the .mo file using the
> > English description as a key.

> yes, you must two lookups. First in the package db (normal in the
> menory) and (if LANG is set) make a second lookup with gettext.

> But this not a big problem, or is there a problem?

It casts doubt on the argument that gettext is a good solution here.  Just
because gettext is the optimal solution for translation of messages within
programs does not mean it's the best solution for package translations.  I'm
personally willing to do a little wheel-engineering if it leads to a more
elegant result.

> If you put only the translated text in the db and don't use the
> English text as the key (as gettext does), you may get outdated translations.

Only if the implementation is poor.  The accuracy of a translation can be
verified in the process of assembling the file that is to be made available to
user machines (whether that file is Packages.gz, or debian-descs.mo, or
whatever).  Obviously the /inputs/ used to create this file must include
mappings of English string -> translated string, but these mappings need not
be retained in the output file.  We only need to make sure once that the
translation is up-to-date, not every time the user runs dpkg, because each
version of each package can have only one untranslated description associated
with it -- it's a unique key, by definition.

If nothing else, perhaps you would consider that a .mo file containing
[untranslated string -> translation] mappings will on average be almost twice
as large as a .mo file containing [(package name,version) -> translation]
mappings. :)
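
The size argument can be checked on a toy data set. The following is a minimal sketch, not a measurement of real .mo files: the version numbers and the German strings are made up for illustration, the helper name payload_bytes is hypothetical, and index overhead is ignored.

```python
# Hypothetical sample data: (package, version) -> (English description, translation).
SAMPLE = {
    ("hello", "1.3-16"): ("The classic greeting, and a good example",
                          "Der klassische Gruss und ein gutes Beispiel"),
    ("bc", "1.06-7"): ("The GNU bc arbitrary precision calculator language",
                       "Die GNU-bc-Rechnersprache mit beliebiger Genauigkeit"),
}


def payload_bytes(mapping):
    """Sum the UTF-8 sizes of all keys and values, ignoring index overhead."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in mapping.items())


# Scheme A: gettext style, the full English text is the key.
by_text = {english: translated for english, translated in SAMPLE.values()}

# Scheme B: the package name and version form the key.
by_version = {f"{name}_{version}": translated
              for (name, version), (_, translated) in SAMPLE.items()}

size_a = payload_bytes(by_text)
size_b = payload_bytes(by_version)
print(size_a, size_b, round(size_a / size_b, 2))
```

How close the ratio gets to two depends on how long the descriptions are compared with the package name and version.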

Steve Langasek
postmodern programmer




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Michael Bramer
On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:
> > I don't know enough about gettext - am I assuming correctly that in
> > the .mo file, the English translation is replaced with a checksum or
> > similar, so you do not need to store the complete English translation?
> 
> Gettext normally uses the entire untranslated string as the key in the .mo
> file.  This has many advantages when dealing with translation of strings in
> programs, where the untranslated string is actually present in the program
> source, and this is a big reason the GNU project favors gettext over catgets
> systems found on other Unices.  It makes less sense in the case of package
> descriptions, however, because we're effectively doing two lookups -- first to
> find the English description in Packages.gz using the package name and version
> as a key, then to find the translated description in the .mo file using the
> English description as a key.

Yes, you need two lookups: first in the package db (normally in
memory) and then, if LANG is set, a second lookup with gettext.

But this is not a big problem, or is it?

If you put only the translated text in the db and don't use the
English text as the key (as gettext does), you may get outdated translations.

And an untranslated text is better than a wrong translation.

> > For the binary package, I don't know... - Gnome and KDE do include all
> > translations, and I think it's easier to handle. Additionally, disc
> > space is really cheap these days, so maybe it would be better just to
> > include all the descriptions, too.
> 
> I think it does belong in the binary package; if not, I'm not sure why we
> would want it in the source package at all.  I believe translated descriptions
> have just as much reason for inclusion in the binary package's control file
> (or in a functional equivalent) as the rest of the informational stuff that's
> in there.
> 
> If translated Description: fields in binary packages are not important, then
> why do we currently have the untranslated Description: in the control file?

Yes, if we add the translation to the source, we should also add it to
the normal deb.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
"Deeds not done often entail an astonishing lack of consequences."
 --   S.J. Lec




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Nick Phillips
On Tue, Sep 04, 2001 at 02:24:41PM -0500, Steve Langasek wrote:

> > I think it's very important to have the translations in the *source*
> > package.
> 
> Also agreed.

Why? Each system will usually only require one language per package.
The rest, as far as any particular system is concerned, is just bloat.

It would be more powerful and flexible to keep translations physically (as
they are logically) separate.

Granted, there may be cases where it is useful to *be able* to put
translations into source/binary packages, but I believe that in most cases,
it will be more convenient and more useful to keep them separate.

Keeping them separate reduces the space required in any particular archive
or on any particular system. It reduces maintainer workload, reduces the
translators' reliance on maintainers, increases accountability (by allowing
each uploaded item to be signed by the relevant maintainer/translator),
reduces unnecessary uploads/downloads, and increases flexibility (for example
by allowing multiple different sources for any particular language, or by
making it easy to provide unofficial translation archives). It also makes
more sense given a potentially arbitrary number of translated languages...

As I said elsewhere, think of the archive as a big database (which it is).
Then think about how you would/should normalize the data.

When you ship bits around outside the database, it may be useful to be able
to encapsulate several related records from the database into one object.
Fine, but that doesn't mean that you should screw up your database and do
it that way all the time.

> > For the binary package, I don't know... - Gnome and KDE do include all
> > translations, and I think it's easier to handle. Additionally, disc
> > space is really cheap these days, so maybe it would be better just to
> > include all the descriptions, too.

Gnome and KDE include the translations because they know that that's the only
way they can ensure that everyone who distributes their packages distributes
the translations. They are designing their systems in the absence of any other
good working way of doing it *right now*. The fact that they do that in no
way implies that Debian should also do that.

> I think it does belong in the binary package; if not, I'm not sure why we
> would want it in the source package at all.

No reason at all. It doesn't make sense to have it in either, in most cases.

> I believe translated descriptions
> have just as much reason for inclusion in the binary package's control file
> (or in a functional equivalent) as the rest of the informational stuff that's
> in there.

No they don't. Translations are not providing extra information. They are
providing the same information in multiple different ways.

> If translated Description: fields in binary packages are not important, then
> why do we currently have the untranslated Description: in the control file?

Because you need *a* description, and in the past there has only been one
description, so there was no reason to normalise it out into a different
object. You don't, however, need 15, and when there are 15 or however many,
it makes sense to normalise them out.



Cheers,


Nick

-- 
Nick Phillips -- [EMAIL PROTECTED]
If you think last Tuesday was a drag, wait till you see what happens tomorrow!




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Adam Heath
On Tue, 4 Sep 2001, Steve Langasek wrote:

> Hello Richard,
>
> On Tue, 4 Sep 2001, Richard Atterer wrote:
>
> > On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote:
> > > 1.) use all the time _gettext_!
>
> > I agree, otherwise we'd just have to keep re-inventing the wheel.
>
> > > 2.) get the .po/.mo files on the system
> > [snip]
> > >If we don't like this process on the client all the time, we can
> > >produce Descriptions-XX.po files and the client must only
> > >download this file and save this in the right dir. But this file
> > >will include the original description and with this it has the
> > >double size and download time.
>
> > I don't know enough about gettext - am I assuming correctly that in
> > the .mo file, the English translation is replaced with a checksum or
> > similar, so you do not need to store the complete English translation?
>
> Gettext normally uses the entire untranslated string as the key in the .mo
> file.  This has many advantages when dealing with translation of strings in
> programs, where the untranslated string is actually present in the program
> source, and this is a big reason the GNU project favors gettext over catgets
> systems found on other Unices.  It makes less sense in the case of package
> descriptions, however, because we're effectively doing two lookups -- first to
> find the English description in Packages.gz using the package name and version
> as a key, then to find the translated description in the .mo file using the
> English description as a key.

-







Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Steve Langasek
Hello Richard,

On Tue, 4 Sep 2001, Richard Atterer wrote:

> On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote:
> > 1.) use all the time _gettext_!

> I agree, otherwise we'd just have to keep re-inventing the wheel.

> > 2.) get the .po/.mo files on the system
> [snip]
> >If we don't like this process on the client all the time, we can
> >produce Descriptions-XX.po files and the client must only
> >download this file and save this in the right dir. But this file
> >will include the original description and with this it has the
> >double size and download time.

> I don't know enough about gettext - am I assuming correctly that in
> the .mo file, the English translation is replaced with a checksum or
> similar, so you do not need to store the complete English translation?

Gettext normally uses the entire untranslated string as the key in the .mo
file.  This has many advantages when dealing with translation of strings in
programs, where the untranslated string is actually present in the program
source, and this is a big reason the GNU project favors gettext over catgets
systems found on other Unices.  It makes less sense in the case of package
descriptions, however, because we're effectively doing two lookups -- first to
find the English description in Packages.gz using the package name and version
as a key, then to find the translated description in the .mo file using the
English description as a key.

> >check this calculation:
> >  If every source has only one description with 130 (gzipped)
> >  bytes of description we get 1 MByte per language. If we use po
> >  files in the source (see below), we get 2 MBytes per language.
> >  And all deb packages have only one description with 130 (gzipped)
> >  bytes. This makes 10 MByte per language. If we store the
> >  description as po file, we will use 20 MByte per language.
> >  11/22 MByte per language, with only 10 languages we will get
> >  110/220 MBytes.

> Hm, this sounds a lot, but note that relative to the complete archive
> size, it's only an increase of about 1%. A well-invested way of using
> this disc space, IMHO.

Agreed.  Spending 1% archive space for the benefit of the 80%+ of our userbase
who doesn't speak English natively is not prohibitive.

> I think it's very important to have the translations in the *source*
> package.

Also agreed.

> For the binary package, I don't know... - Gnome and KDE do include all
> translations, and I think it's easier to handle. Additionally, disc
> space is really cheap these days, so maybe it would be better just to
> include all the descriptions, too.

I think it does belong in the binary package; if not, I'm not sure why we
would want it in the source package at all.  I believe translated descriptions
have just as much reason for inclusion in the binary package's control file
(or in a functional equivalent) as the rest of the informational stuff that's
in there.

If translated Description: fields in binary packages are not important, then
why do we currently have the untranslated Description: in the control file?

Steve Langasek
postmodern programmer




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Michael Bramer
On Tue, Sep 04, 2001 at 07:59:47PM +0200, Richard Atterer wrote:

> > 2.) get the .po/.mo files on the system
> [snip]
> >If we don't like this process on the client all the time, we can
> >produce Descriptions-XX.po files and the client must only
> >download this file and save this in the right dir. But this file
> >will include the original description and with this it has the
> >double size and download time.
> 
> I don't know enough about gettext - am I assuming correctly that in
> the .mo file, the English translation is replaced with a checksum or
> similar, so you do not need to store the complete English translation?

See the gettext info page.

From the info page:
...
   Then, at offset O and offset T in the picture, two tables of string
descriptors can be found.  In both tables, each string descriptor uses
two 32 bits integers, one for the string length, another for the offset
of the string in the MO file, counting in bytes from the start of the
file.  The first table contains descriptors for the original strings,
and is sorted so the original strings are in increasing lexicographical
order.  The second table contains descriptors for the translated
strings, and is parallel to the first table: to find the corresponding
translation one has to access the array slot in the second array with
the same index.
...

As you can see, the .mo file contains the original description.
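
The layout quoted from the info page can be tried out directly. The following is a minimal sketch, assuming a little-endian MO file without a hash table; the function names build_mo and read_mo are hypothetical, and the "[de]" string is a placeholder rather than a real translation. It builds a tiny MO image in memory and walks the two descriptor tables.

```python
import struct

MAGIC = 0x950412DE  # magic number of an MO file


def build_mo(pairs):
    """Build a minimal MO image (no hash table) from {original: translation}."""
    # The original strings must be sorted, as the info page describes.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in pairs.items())
    n = len(items)
    o_table = 28                    # the header is seven 32-bit words
    t_table = o_table + 8 * n       # each descriptor is (length, offset)
    data_start = t_table + 8 * n
    blob = b""
    o_desc, t_desc = [], []
    for original, _ in items:
        o_desc.append((len(original), data_start + len(blob)))
        blob += original + b"\0"
    for _, translated in items:
        t_desc.append((len(translated), data_start + len(blob)))
        blob += translated + b"\0"
    # Hash table size is 0, so the hash table offset is never used.
    header = struct.pack("<7I", MAGIC, 0, n, o_table, t_table,
                         0, data_start + len(blob))
    tables = b"".join(struct.pack("<2I", *d) for d in o_desc + t_desc)
    return header + tables + blob


def read_mo(image):
    """Return [(original, translation)] by reading both tables in parallel."""
    magic, _revision, n, o_table, t_table = struct.unpack_from("<5I", image, 0)
    if magic != MAGIC:
        raise ValueError("not a little-endian MO file")
    result = []
    for i in range(n):
        o_len, o_off = struct.unpack_from("<2I", image, o_table + 8 * i)
        t_len, t_off = struct.unpack_from("<2I", image, t_table + 8 * i)
        result.append((image[o_off:o_off + o_len].decode("utf-8"),
                       image[t_off:t_off + t_len].decode("utf-8")))
    return result


image = build_mo({"small calculator": "[de] small calculator"})
print(read_mo(image))
print(b"small calculator\0" in image)  # the original string is stored verbatim
```

The second print is the point of the exercise: the untranslated text sits in the file next to its translation, so the file carries both.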

 
> >Because of this I propose some solutions:
> > 
> >1.) (very fast)
> > 
> >  put the translation as normal .po file in the
> >  /usr/share/desc-trans//desc-trans.d/ dir. finish. 
> > 
> >  This don't need some extra work on dpkg etc.
> 
> Actually, I think this is completely sufficient. Let the maintainer
> include updated translations at his convenience in new uploads, and
> use the override mechanism for the Descriptions-XX.mo.gz files until
> he has done so.

Descriptions-XX.mo.gz? not Descriptions-XX.po.gz?

> Hm, but note that this means that dpkg will need to look first at
> Descriptions-XX.mo files downloaded by apt, and only then at the .po
> file in the package. Would that be a problem?

If dpkg uses gettext, dpkg shows the translations of all texts in the .mo file.
And after an apt-get update you have the translations of all packages
(from the apt sources) in the .mo file. If you have installed the
package, you have its translation in the .mo file too.

Only dpkg --info does not show the translated description with this
solution.

> >2.) Put the translation in the control.tar.gz of the deb.
> [snip]

This can show the translation with --info!

> >3.) Add the desc-trans.tar.gz in the deb ar as a own new element.
> [snip]

This can show the translation with --info!

From the user's point of view, this is the main difference between
these three solutions.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Never trust a computer magazine with beautiful women on the cover.
(Besim Karadeniz in de.comm.internet.misc)




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Richard Atterer
Hi Michael,

all in all, I think this sounds nice!

On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote:
> 1.) use all the time _gettext_!

I agree, otherwise we'd just have to keep re-inventing the wheel.

> 2.) get the .po/.mo files on the system
[snip]
>If we don't like this process on the client all the time, we can
>produce Descriptions-XX.po files and the client must only
>download this file and save this in the right dir. But this file
>will include the original description and with this it has the
>double size and download time.

I don't know enough about gettext - am I assuming correctly that in
the .mo file, the English translation is replaced with a checksum or
similar, so you do not need to store the complete English translation?

> 4.) How get katie (or the desc-trans-XX.deb) the translation?
[snip]

> 5.) translated descriptions in the package. 
[snip]
>I see only one problem: the size. 
> 
>We now have 80446 .deb packages and 7643 source packages in the
>Debian archive on ftp-master. If we include the translation in the
>deb, we must store this in the source and in every deb package. 
> 
>check this calculation:
>  If every source has only one description with 130 (gzipped)
>  bytes of description we get 1 MByte per language. If we use po
>  files in the source (see below), we get 2 MBytes per language.
>  And all deb packages have only one description with 130 (gzipped)
>  bytes. This makes 10 MByte per language. If we store the
>  description as po file, we will use 20 MByte per language. 
>  11/22 MByte per language, with only 10 languages we will get
>  110/220 MBytes. 

Hm, this sounds a lot, but note that relative to the complete archive
size, it's only an increase of about 1%. A well-invested way of using
this disc space, IMHO.
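
The quoted calculation can be reproduced in a few lines. This is only a sketch of the arithmetic: the package counts and the 130 bytes per gzipped description come from the proposal, while the function name archive_cost and the po_factor parameter (2 when the original text is stored alongside the translation) are introduced here for illustration.

```python
SOURCE_PACKAGES = 7643    # source packages on ftp-master, from the proposal
BINARY_PACKAGES = 80446   # .deb packages on ftp-master, from the proposal
GZIPPED_DESC = 130        # assumed gzipped bytes per description, from the proposal
LANGUAGES = 10


def archive_cost(languages, po_factor=1):
    """Extra archive bytes if every source and binary package carries translations."""
    per_language = (SOURCE_PACKAGES + BINARY_PACKAGES) * GZIPPED_DESC * po_factor
    return per_language * languages


plain = archive_cost(LANGUAGES)
as_po = archive_cost(LANGUAGES, po_factor=2)
print(f"{plain / 1e6:.1f} MB plain, {as_po / 1e6:.1f} MB as .po")
# prints "114.5 MB plain, 229.0 MB as .po"
```

The result is slightly above the rounded 110/220 MBytes in the proposal because the proposal rounds to 1 and 10 MByte per language before adding.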

>With more packages, ports and languages, this will grow. These bytes
>must all be downloaded, uploaded and synced over time.
> 
>And on the local system the descriptions and the translations of
>all languages from the package will be stored on the local hard disk
>(without gzip). Count:
>  With 10 languages, 1000 installed packages and 380 bytes per
>  description and per translation you get an additional 4/8 MBytes on
>  the local disk.
> 
>Is this all useful in a 'normal' deb package from the Debian
>project? Maybe yes. We must decide this. (I personally don't see the
>real benefit of this. But we can add it and I don't have a real
>problem with this. I see only the size problem, and this is not a
>big problem.)

I think it's very important to have the translations in the *source*
package.

For the binary package, I don't know... - Gnome and KDE do include all
translations, and I think it's easier to handle. Additionally, disc
space is really cheap these days, so maybe it would be better just to
include all the descriptions, too.

>In all cases I propose: store the description in the source as a
>  .po file in the /debian/ dir (one per language). This is the
>  only really good way to store the translations. (no encoding
>  problem, no outdated text, no debconf-mergetemplate hack, ...)

Seconded.

>But how does the maintainer get the translation? We have some cases:
> - The maintainer translates the description himself
> - He finds a translator of his own (like now with debconf)
> - He uses the ddtp
>   - He can ask the ddts and get all translations of the package
>   - He can use the override file of katie
>   - He uses the notification mails from the ddts (In future the
>   server will use the agreed format in this mail. With this,
>   the maintainer must only copy this file into the source.)
> 
>Now the technique part:
> 
>The proposal with the biggest patch, is the 'put the translation in
>a own element in the deb ar'. Maybe this is nice and feasible. 
>But this is not a fast way. 
> 
>Because of this I propose some solutions:
> 
>1.) (very fast)
> 
>  put the translation as normal .po file in the
>  /usr/share/desc-trans//desc-trans.d/ dir. finish. 
> 
>  This don't need some extra work on dpkg etc.

Actually, I think this is completely sufficient. Let the maintainer
include updated translations at his convenience in new uploads, and
use the override mechanism for the Descriptions-XX.mo.gz files until
he has done so.

Hm, but note that this means that dpkg will need to look first at
Descriptions-XX.mo files downloaded by apt, and only then at the .po
file in the package. Would that be a problem?

>2.) Put the translation in the control.tar.gz of the deb.
[snip]
>3.) Add the desc-trans.tar.gz in the deb ar as a own new element.
[snip]

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ ´` ¯




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Steve Langasek

On Tue, 4 Sep 2001, Martin Quinson wrote:

> On Tue, Sep 04, 2001 at 02:52:40PM +0200, Simon Richter wrote:
> > On Tue, 4 Sep 2001, Michael Bramer wrote:
> >
> > > After I read some more mails and write some comments myself, IMHO it
> > > is time to write a newer hopefully better proposal. Not all is new.
> > > But I add some new thoughts and some parts from some comments.
> >
> > We can reduce the download size by 50% by letting the ddts database decide
> > which translations are still up to date and pack only those into the
> > downloadable file. It won't be a .po file, but it will be smaller.

> You can also have the ddts build po files with only the up-to-date
> translations in them. So they will be smaller, and you can still use the
> gettext mechanism.

This is a very good point that I think hasn't received enough attention yet.
The gettext mechanism of resolving translations is an important one to make
use of, but it /doesn't/ have to take place on the end user machine.  We have
an additional, guaranteed-unique index that we can use on description
translations: the package name and version.

The end user needs exactly one translated description per language for each
package name and version that's present in the Debian archive.  If there is
more than one translation for the same original text, one is out of date --
drop it.  If there are translations that don't match the descriptions on any
packages in the current archive, discard them -- they don't belong in any file
that we're sending to users.  If there are packages installed on a system that
are no longer present in the archive, the translated descriptions should be
stored on the local system: it should not be expected that the archive will
keep track of these outdated translations.

With such a structure, it doesn't matter what the precise lookup mechanism is
for finding the translations on the end user machine.  The translations can
still be stored in .po files if you like -- gettext is very nice, after all --
and use full text of the original description as the key for lookups, as is
done with most gettext stuff.  Or you can use package_version-deb.version as
the key, which is bound to be a little more efficient.  Or you can use any
other mechanism that can look up the translation based on the
(language,package,version) tuple -- including storing the requested
translations in {the,a} Packages.gz file and in /var/lib/dpkg/available.
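
The assembly step described above can be sketched as follows. This is a minimal illustration under assumptions, not an existing tool: the function name assemble, the shape of the input data, and the "[de]" placeholder strings are all hypothetical. The English text is used as the matching key on the server only; the output is keyed by package name and version.

```python
def assemble(archive, translations, language):
    """Return {(package, version): translated text} for one language.

    archive:      {(package, version): English description} for the current archive
    translations: list of (language, English description, translated text)
    """
    # Index the translator database by the English text each entry was written for.
    by_text = {eng: text for lang, eng, text in translations if lang == language}
    output = {}
    for key, english in archive.items():
        if english in by_text:      # translation matches the current description
            output[key] = by_text[english]
        # otherwise the package stays untranslated rather than outdated
    return output


archive = {
    ("bc", "1.06-7"): "small calculator",
    ("hello", "1.3-16"): "friendly greeter",
}
translations = [
    ("de", "small calculator", "[de] small calculator"),
    ("de", "tiny calculator", "[de] tiny calculator"),    # matches no package: dropped
    ("fr", "friendly greeter", "[fr] friendly greeter"),  # other language: ignored
]
print(assemble(archive, translations, "de"))
```

Translations whose English text matches no description in the archive never reach the output, and a package whose description changed simply falls back to English until a new translation arrives.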

This is not such a complicated thing that "reinventing the wheel" is
dangerous.

Steve Langasek
postmodern programmer




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Eduard Bloch
#include 
Nick Phillips wrote on Tue Sep 04, 2001 at 03:30:08PM:
> So you probably don't usually want the translations to be part of the
> package sources or binaries. They're logically separate, and should usually
> be physically separate (as physically as we ever get in this sense).
...
> So, apt, for example would be told what to do with a line:
> 
> deb-trans http://www.debian.org/debian potato/de main contrib non-free

Just my 0.02€:

You are completely right. We should keep the translated stuff out of the
packages. Some people argue that a package should contain everything related
to it, but that is too complicated considering how frequently translations
are updated. Developers should work together with the DDTS if they
want to have control over the material concerning their packages.

IMHO a new script should work with the ddts database and create the
needed .po files. Additionally, the .mo file may be created so the
clients won't have to compile them on their side. This needs further
decisions.

About the space usage: I think the best way is to create one big .po/.mo
file for all arches and each release.

Gruss/Regards,
Eduard.
-- 
For some it is Windows, for others the longest virus in the world




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Nick Phillips
Sorry to screw up the threading; thought I'd posted this already, then
deleted grisu's message before finding that I hadn't sent this :(

On Tue, Sep 04, 2001 at 01:22:16PM +0200, Michael Bramer wrote:

> Not all parts are turned into stone. I need some comments and decision 
> on some parts. Maybe you can help.
> 
> One quote from a mail from Raphael Hertzog:
>  I find that having translations is far better that having not a
>  single one and refusing to add them because we can't have the perfect
>  solution right now.

However we do need to make sure that whatever we do right now can be
migrated fairly painlessly to the perfect solution, whatever that may be.

> 1.) use all the time _gettext_!

And don't forget that "the perfect solution" will involve translations
of all relevant text in a package, not just descriptions. I'm not saying
that anyone has forgotten this, just that a lot of the thought in these
threads seems to have been aimed at getting translated descriptions, and
that all would do well to remember the final aim at all times...


> 2.) get the .po/.mo files on the system
> 
>If we will use gettext, we must get one .mo file on the system.

I'm not familiar with gettext, but I would suggest that this is incorrect.
I'd have thought that you would eventually need (at least) one .mo per
package, and several others (such as for package descriptions).

>I propose the dir /usr/share/desc-trans//desc-trans.d/ to
>store all .po files. 

You've forgotten that in the end we'll not just be talking about
descriptions, haven't you?

>If you run apt-get update (or another function like this in
>deity and co), you have (maybe) new and changed descriptions in the
>apt database. And now you need a newer, better .po file. Because of
>this, I propose to download the .po-like file (see below) with apt
>during the update process. 

Does the user actually ever need the .po? I thought you said that the
.mo was generated from the .po, and then the .mo is used.

>What is the size of all this? Ok. we have now in sid/main/i386 (see
>[2]) 7000 Packages and the descriptions of all this packages is
>2660993 bytes big. We get a description size per package of 384 bytes.
>With gzip we will get (maybe) 130 bytes. 

Whoa there. I guess this is as good a point as any for me to "go off on one".

This is not directed at this comment in particular, but many many of the
posts in these threads that I've been reading seem to be overly worried
about size. Stop and think about it. If you're going to have translations,
they will take up space, somewhere. That's just life.

Now, think about the structure of where they should/could go, and the
relationships between source, binary, and text data. Think databases.
Think normalization.

The text data in any one of an *arbitrary* number of languages is related
to the package, but you'd normally normalize it out into a separate
table in your database - you don't want to have your packages' source and
binary records growing to arbitrary sizes as arbitrary numbers of
translations are added to them.
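
To make the database analogy concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names, the helper describe, and the sample rows are assumptions made for illustration; the "[de]" string is a placeholder, not a real translation.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE package (
    name TEXT, version TEXT, description TEXT,
    PRIMARY KEY (name, version)
);
-- Translations live in their own table, one row per language.
CREATE TABLE translation (
    name TEXT, version TEXT, language TEXT, description TEXT,
    PRIMARY KEY (name, version, language),
    FOREIGN KEY (name, version) REFERENCES package (name, version)
);
""")
db.execute("INSERT INTO package VALUES (?, ?, ?)",
           ("bc", "1.06-7", "small calculator"))
db.execute("INSERT INTO translation VALUES (?, ?, ?, ?)",
           ("bc", "1.06-7", "de", "[de] small calculator"))


def describe(name, version, language):
    """Return the translated description, or the original if none exists."""
    row = db.execute(
        "SELECT COALESCE(t.description, p.description) "
        "FROM package p LEFT JOIN translation t "
        "ON t.name = p.name AND t.version = p.version AND t.language = ? "
        "WHERE p.name = ? AND p.version = ?",
        (language, name, version)).fetchone()
    return row[0] if row else None


print(describe("bc", "1.06-7", "de"))
print(describe("bc", "1.06-7", "fr"))
```

Adding a fifteenth language adds rows to the translation table and leaves the package record untouched, which is the property argued for here.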

So you probably don't usually want the translations to be part of the
package sources or binaries. They're logically separate, and should usually
be physically separate (as physically as we ever get in this sense).

Gettext abstracts the *idea* that is being communicated from the text used
to communicate it. That leaves the actual text used as an overlay, metadata.

So, we need to structure the repositories in such a way that the structure
of the data is respected. It also happens that this conveniently allows
for separation of areas of maintainer/translator expertise (and also
responsibility).

Packages as prepared by a maintainer need to contain text (.mo) for at
least one language; probably usually english, but once this works there'll
be no good reason for that to be the case. Translations of a package would
logically be in another file (we have .dsc, .deb, .tar.gz, .diff already
describing logically different aspects of a package, so there's no problem
adding .trans or similar). The exact best method to store these is open to
question, but I'd guess that it would be another section in the archive,
as sources and binaries are split now.

So, apt, for example would be told what to do with a line:

deb-trans http://www.debian.org/debian potato/de main contrib non-free


Which would be able to provide Packages files created from the various
translation packages. Multiple versions of the same package would be
dealt with in the same way as currently.

It also allows certain mirrors to provide certain sets of translations,
which will certainly be a Good Thing. And CD sets could easily include
one extra CD which provided the translation section of the archive for
whatever languages are required (OK, for the initial install there would
need to be a little more jiggery-pokery).


Exceptions and trickery needed:

  1) to ensure that versions provided within a p

Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Martin Quinson
On Tue, Sep 04, 2001 at 02:52:40PM +0200, Simon Richter wrote:
> On Tue, 4 Sep 2001, Michael Bramer wrote:
> 
> > After I read some more mails and write some comments myself, IMHO it
> > is time to write a newer hopefully better proposal. Not all is new.
> > But I add some new thoughts and some parts from some comments.
> 
> We can reduce the download size by 50% by letting the ddts database decide
> which translations are still up to date and pack only those into the
> downloadable file. It won't be a .po file, but it will be smaller.

You can also have the ddts build po files with only the up-to-date
translations in them. So they will be smaller, and you can still use the
gettext mechanism.

> Also, an important point will be that the downloadable files be sorted, so
> that you can diff them easily (I believe there is some effort going on to
> make diffs from the Packages files).
>
> I'm going to apply for a new job now, after that I'll take a more
> extensive look into this.

Good luck, Mt.

-- 
An AZERTY keyboard is worth two.




Re: new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Simon Richter
On Tue, 4 Sep 2001, Michael Bramer wrote:

> After I read some more mails and write some comments myself, IMHO it
> is time to write a newer hopefully better proposal. Not all is new.
> But I add some new thoughts and some parts from some comments.

We can reduce the download size by 50% by letting the ddts database decide
which translations are still up to date and pack only those into the
downloadable file. It won't be a .po file, but it will be smaller.

Also, an important point will be that the downloadable files be sorted, so
that you can diff them easily (I believe there is some effort going on to
make diffs from the Packages files).
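
Why sorting matters for diffs can be shown with a small sketch. The one-line-per-entry format, the function names render and delta, and the sample entries are assumptions for illustration; a real file would hold multi-line descriptions.

```python
import difflib


def render(entries):
    """Serialise {(package, version): text} as sorted lines, one entry per line."""
    return [f"{name}_{version}: {text}\n"
            for (name, version), text in sorted(entries.items())]


def delta(old, new):
    """Unified diff between two renderings, without context lines."""
    return list(difflib.unified_diff(render(old), render(new), "old", "new", n=0))


old = {("bc", "1.06-7"): "[de] small calculator",
       ("hello", "1.3-16"): "[de] friendly greeter"}
# Same entries in a different insertion order, with one translation updated.
new = {("hello", "1.3-16"): "[de] friendly greeter",
       ("bc", "1.06-7"): "[de] small, fast calculator"}

print("".join(delta(old, new)), end="")
```

Because both sides are sorted before comparison, only the changed entry shows up; without a stable order, a reshuffled file would produce a diff as large as the file itself.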

I'm going to apply for a new job now, after that I'll take a more
extensive look into this.

   Simon

-- 
GPG public key available from http://phobos.fs.tum.de/pgp/Simon.Richter.asc
 Fingerprint: DC26 EB8D 1F35 4F44 2934  7583 DBB6 F98D 9198 3292
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!




new proposal: Translating Debian packages' descriptions

2001-09-04 Thread Michael Bramer
Hello all


After reading some more mails and writing some comments myself, IMHO it
is time to write a newer, hopefully better proposal. Not everything is new,
but I have added some new thoughts and some parts from the comments.

In this proposal I have combined decentralized translations with a
central repository, and all this without a delay on the translator-to-user
path.

Not all parts are set in stone. I need some comments and decisions
on some parts. Maybe you can help.

One quote from a mail from Raphael Hertzog:
 I find that having translations is far better than not having a
 single one and refusing to add them because we can't have the perfect
 solution right now.



 Add Translations of the Package Description 
  in the Debian Distribution

(c) Michael Bramer <[EMAIL PROTECTED]>


1.) always use _gettext_!

   Everyone knows gettext and everyone uses it. Why should we use
   gettext to add the translated descriptions to the Debian
   distribution? For exactly that reason: gettext is *the* technique
   for translations.

   You need not teach it to maintainers, and you need not teach it to
   users (an important point). If a user already runs a system with a
   locale environment set, he will simply get translated descriptions
   in the future.

   gettext does all the work, and it is well tested (it is used in
   many programs). With it we need only a few small patches. (We have
   shown a -9/+30 patch for dselect/dpkg, and hopefully an apt patch
   will not be much bigger.) gettext never shows an outdated
   translation (a big point), because the lookup key is the original
   English text, and it has other nice features (see below).
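
A minimal sketch of why a lookup keyed on the original text cannot show
an outdated translation. A plain dict stands in for the .mo file here;
the function name show_description and the sample strings are
hypothetical.

```python
def show_description(current_description, catalog):
    """Return the translation for exactly this text, else the text itself."""
    return catalog.get(current_description, current_description)

# The catalog was built from an older version of the description.
catalog = {"A small text editor": "[de] A small text editor"}

unchanged = show_description("A small text editor", catalog)
changed = show_description("A small, fast text editor", catalog)
```

Once the English description changes, the old key no longer matches,
so the user sees the new English text instead of a stale translation.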

   Maybe the release manager will allow this patch in woody, but
   that is another story.

   If apt and dpkg are patched and the user has a suitable .mo file in
   /usr/share/desc-trans//, the output of _all_ package management
   programs is translated. (dpkg and APT are patched; other programs,
   such as deity, use APT.)

   gettext already supports fallback languages. See [1] for more
   information. If I understand the gettext source code correctly,
   the fallback is per message and not per .mo file. So someone can
   set LANGUAGE=hu:sk:cs and get a
   hungarian->slovak->czech->->english fallback path. (If a
   description is translated into Slovak but not into Hungarian, the
   user will see the Slovak description.)
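
A minimal sketch of a per-message fallback chain, using the
add_fallback method of Python's gettext module rather than the C
library the proposal targets. The DictCatalog class is a hypothetical
stand-in for one language's .mo file, the bracketed strings are
placeholders and not real translations, and the language codes are the
ISO 639-1 codes hu (Hungarian), sk (Slovak) and cs (Czech).

```python
import gettext

class DictCatalog(gettext.NullTranslations):
    """A stand-in for one language's .mo file, backed by a plain dict."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Not translated here: defer to the next catalog in the chain,
        # or return the original message if there is none.
        return super().gettext(message)

hu = DictCatalog({"text editor": "[hu] text editor"})
sk = DictCatalog({"text editor": "[sk] text editor",
                  "mail reader": "[sk] mail reader"})
cs = DictCatalog({"web browser": "[cs] web browser"})

hu.add_fallback(sk)
hu.add_fallback(cs)  # appended behind sk, giving hu -> sk -> cs

first = hu.gettext("text editor")
second = hu.gettext("mail reader")
third = hu.gettext("web browser")
untranslated = hu.gettext("news reader")
```

Each message is resolved on its own: one description can come from the
Hungarian catalog while the next comes from the Slovak one, and a
message that no catalog knows stays in English.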


   This is all nice, but one problem remains: how does the user get
   a suitable .mo file?

   First, one comment on this question: the same problem already
   exists for the descriptions themselves. You must download the
   descriptions and the translations first. Only after that can you
   use (see) them and install the real programs/packages. For the
   normal (English) descriptions we use the Packages files (with apt
   or the old dselect methods). We must use something like this for
   the translations too...


2.) get the .po/.mo files on the system

   If we use gettext, we must get one .mo file onto the system.
   The .mo file is a binary data file generated from a .po file. If
   you have several sources (like ftp.debian.de and a local mirror
   with its own packages), you will have several sets of translations
   and several .mo/.po files.

   The best way is to download the .po files, merge them with a tool,
   compile the one big .po file into a .mo file, and use that file.
   (Maybe a 'cat *.po > master.po' is all that is needed; I have not
   tested this yet, but it is only a technical detail.)
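
A minimal sketch of the merge step, assuming each downloaded file has
already been parsed into a dict from msgid to msgstr. Plain
concatenation can repeat the same msgid when two sources carry the
same description, so this sketch deduplicates instead. The function
names and the "first non-empty translation wins" policy are
assumptions for illustration; GNU gettext ships msgcat for merging
real .po files.

```python
def po_quote(text):
    """Quote a string for a .po file (backslash, double quote, newline)."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'

def merge_catalogs(catalogs):
    """Merge {msgid: msgstr} dicts; the first non-empty translation wins."""
    merged = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            if msgstr and msgid not in merged:
                merged[msgid] = msgstr
    return merged

def to_po(catalog):
    """Render a merged catalog as sorted msgid/msgstr entries."""
    entries = [
        "msgid %s\nmsgstr %s\n" % (po_quote(msgid), po_quote(catalog[msgid]))
        for msgid in sorted(catalog)
    ]
    return "\n".join(entries)

debian = {"A text editor": "[de] A text editor", "A mail reader": ""}
local = {"A text editor": "[de] other wording",
         "A mail reader": "[de] A mail reader"}

master = merge_catalogs([debian, local])
po_text = to_po(master)
```

Sorting the entries also keeps the generated file stable between runs,
which makes it easier to diff.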

   I propose the directory /usr/share/desc-trans//desc-trans.d/ to
   store all .po files.
   
   If you run apt-get update (or an equivalent function in deity and
   co.), you may have new and changed descriptions in the apt
   database, and then you need a newer .po file. Because of this, I
   propose that apt download the .po-like file (see below) as part of
   the update process.

   What is the size of all this? We now have about 7000 packages in
   sid/main/i386 (see [2]), and the descriptions of all these packages
   total 2660993 bytes. That gives a description size of about 384
   bytes per package. With gzip we will get (maybe) 130 bytes.
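
The size estimate can be checked with a few lines of arithmetic. This
sketch uses only the figures quoted above; the 130 bytes per package
after gzip is the guess from the text, not a measurement.

```python
total_bytes = 2660993    # total size of all descriptions, as quoted
package_count = 7000     # rounded package count, as quoted
gzip_per_package = 130   # guessed compressed size per description

per_package = total_bytes / package_count
compressed_total = package_count * gzip_per_package

print(round(per_package))   # average bytes per description
print(compressed_total)     # estimated compressed bytes per language
```

With the rounded count of 7000 the average comes out near 380 bytes,
close to the 384 quoted (which was presumably computed from the exact
package count). The compressed estimate is a little under one megabyte
per language and source.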
 
   So the size on the system is comparable to the Packages files from
   apt. If you have several sources you will have some (5-20)
   megabytes in /usr/share/desc-trans//desc-trans.d/ plus one
   collected .mo file per language.

   The admin of the system must pay this price if he wants to see
   translated descriptions. (It does not matter whether we use gettext
   or another technique; with gettext we only have the extra .mo
   file.)

   But what file should apt download? The first thought might be a
   translated Packages-XX file. But the first thought is not always
   the best one.

   We have _now_ 316 Packages* files (see [3]) on ftp-master, 141
   MByte in total. If we translate all of this into (only) 10
   languages, we need 1.4 GByte, and with more packages and more
   languages ever more. Ok, harddisk a