Re: Reasons why package central approach to handling translations may be suboptimal

2001-09-07 Thread Michael Bramer
On Fri, Sep 07, 2001 at 11:47:36AM +0200, Richard Atterer wrote:
> On Thu, Sep 06, 2001 at 08:10:10PM +0200, Michael Bramer wrote:
> > No, don't reinvent the wheel. gettext does all of this. We don't need
> > to patch apt, dpkg or other tools this way.
> > 
> > We only need to use an old, nice and tested tool: gettext.
> 
> Nice, I wasn't aware it solves the encoding problem as well!

I haven't checked it, but the info page asserts this:
...
   `gettext' not only looks up a translation in a message catalog.  It
also converts the translation on the fly to the desired output character
set.  This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.
...
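To make the quoted behaviour concrete, here is a minimal Python sketch, assuming a hypothetical one-entry German catalog stored in ISO-8859-1. It builds a tiny .mo file in memory, loads it with the standard `gettext.GNUTranslations` class (which decodes entries using the charset declared in the catalog header) and then encodes the result for a user working in UTF-8. In C, libintl does this conversion itself (to the locale's charset, or to whatever `bind_textdomain_codeset` selects); Python's module only decodes to Unicode, so the final encoding step is explicit here.

```python
import gettext
import io
import struct


def build_mo(entries: dict[str, str], charset: str) -> bytes:
    """Serialize msgid -> msgstr pairs into a minimal little-endian .mo file.

    The header entry (msgid "") declares the catalog charset; gettext uses it
    to decode every other entry. No hash table is written (its size is 0).
    """
    items = {"": f"Content-Type: text/plain; charset={charset}\n", **entries}
    # Entries are stored sorted by msgid, so the header ("") comes first.
    keys = sorted(items, key=lambda k: k.encode(charset))
    ids = [k.encode(charset) for k in keys]
    strs = [items[k].encode(charset) for k in keys]

    n = len(keys)
    orig_table = 7 * 4                # file header: 7 unsigned 32-bit words
    trans_table = orig_table + 8 * n  # each table slot: (length, offset)
    data_start = trans_table + 8 * n

    tables, data = b"", b""
    for blobs in (ids, strs):
        for blob in blobs:
            tables += struct.pack("<II", len(blob), data_start + len(data))
            data += blob + b"\0"      # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table,
                         0, data_start)
    return header + tables + data


# Hypothetical German catalog, written by a translator who works in Latin-1.
mo_bytes = build_mo(
    {"Advanced front-end for dpkg": "Fortgeschrittenes Frontend für dpkg"},
    charset="ISO-8859-1",
)
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))

# gettext decodes the entry using the charset declared in the catalog header.
translated = catalog.gettext("Advanced front-end for dpkg")
# The Unicode result can then be encoded for whatever charset the user uses.
for_utf8_terminal = translated.encode("utf-8")
```

Only one catalog has to be distributed; the charset conversion happens at lookup time, which is exactly the point of the info page excerpt.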
 
> The only place where it isn't 100% suited for our purposes is the
> Descriptions-XX.po, because the English text is duplicated. Would it

See the last proposal: we can bypass this problem. With a
Descriptions-XX file (without the .po and without the English
description) we can reduce the download size. But then the client
has to do extra work, and it pays for that in disk space all the time.

> make sense to hack gettext to make it allow checksums instead of the
> English descriptions?

Maybe. But we can use gettext now, and later make a general
improvement to gettext so that it can optionally use md5sum-keyed .mo
files. gettext also supports a version number in the .mo file.

With this improvement all programs can use this feature.
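The checksum idea can be sketched as follows. This is not an existing gettext feature: it assumes a hypothetical Descriptions-XX format in which each entry maps the MD5 hex digest of an English description to its translation, so the English text itself is never downloaded. The client hashes the English description it already has (from the Packages file) and looks that digest up. All function names and sample texts are illustrative.

```python
import hashlib


def description_key(english: str) -> str:
    """Hypothetical lookup key: MD5 hex digest of the UTF-8 English description."""
    return hashlib.md5(english.encode("utf-8")).hexdigest()


def build_index(pairs: dict[str, str]) -> dict[str, str]:
    """Turn {English description: translation} into {digest: translation}.

    A server could publish this instead of a .po file: the 32-character
    digest replaces the full English msgid.
    """
    return {description_key(en): tr for en, tr in pairs.items()}


def translate(english: str, index: dict[str, str]) -> str:
    """Look up the English text the client already has; fall back to it."""
    return index.get(description_key(english), english)


# Illustrative texts only.
english = ("Advanced front-end for dpkg\n"
           " This is Debian's next generation front-end for the dpkg"
           " package manager.")
german = ("Fortgeschrittenes Frontend für dpkg\n"
          " Dies ist Debians Frontend der nächsten Generation für die"
          " Paketverwaltung dpkg.")
index = build_index({english: german})
```

The saving grows with the length of the extended description; for a very short text, a 32-character hex key can be larger than the English original. The price is the one mentioned above: the client has to hash every description it wants to display.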


Regards
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
"A computer is, after all, just a high-speed idiot."
  -- Jens Dittmar in de.comp.os.unix.linux.newusers




Re: Reasons why package central approach to handling translations may be suboptimal

2001-09-07 Thread Richard Atterer
On Thu, Sep 06, 2001 at 08:10:10PM +0200, Michael Bramer wrote:
> No, don't reinvent the wheel. gettext does all of this. We don't need
> to patch apt, dpkg or other tools this way.
> 
> We only need to use an old, nice and tested tool: gettext.

Nice, I wasn't aware it solves the encoding problem as well!

The only place where it isn't 100% suited for our purposes is the
Descriptions-XX.po, because the English text is duplicated. Would it
make sense to hack gettext to make it allow checksums instead of the
English descriptions?

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ ´` ¯




Re: Reasons why package central approach to handling translations may be suboptimal

2001-09-06 Thread Michael Bramer
On Thu, Sep 06, 2001 at 03:31:49PM +0200, Richard Atterer wrote:
> On Thu, Sep 06, 2001 at 03:42:00PM +0900, Junichi Uekawa wrote:
> > I have been reading the DDTS thread, and seeing that it was
> > resolving into a "each package should maintain their translation". I
> > would like to present what I think may be problematic in that
> > approach :
> > 1. This results in filing random bugs in BTS in random manner. [snip]
> 
> I think there's an answer to this problem: When the maintainer updates
> the translations of his package, this should be fully automated.
> E.g. just one command
> 
>   update-translations --from-mbox 
> 
> which fishes out the DDTS messages (which are sent in a standard
> layout), or, for people lucky enough to be permanently online, an
> "update-translations --from-web" command in debian/rules which gets
> the translation updates directly from the DDTS server.
> 
> So far, *everything* related to a Debian package can be found in the
> corresponding source package. I don't think it's a good idea to change
> this.

Fully agree.

> > 2. A package is re-uploaded with translation. There is a package 
> > uploaded with one-line changelog saying something like "Merged
> > spanish templates". It is a load to autobuilder/ftpmirror/etc. and
> > of course manual intervention to rebuild a package means that an
> > error occurs, and it does.
> > 
> > The main problem here is that translation starts after the initial
> > upload of the package to Debian happens, which means there will be a "-2"
> > Debian package which will include the translation, and a "-1" Debian
> > package will have no translation.
> 
> Yes, that's one of the basic problems. IMHO with the proposed
> "override" mechanism via Descriptions-XX.po (or whichever form it is
> going to have), this is mostly solved - anyone getting the "-1"
> package from testing will already see the translation. People tracking
> unstable may or may not see them, depending on how often they update
> and on the speed of the translators.
> 
> Some people are concerned that their daily update from unstable will
> need too much bandwidth because of all those extra uploads. If the
> override mechanism works, I see no reason not to have a policy "don't
> re-upload if only the translation changed".

Fully agree.

> > 3. No choice of "I want this locale and not others".
> 
> I assume in particular you mean "I prefer this _encoding_"? This is a
> point that hasn't been discussed at all so far.

see below

> Will there be one description .po file per language in the source
> package, or one for all translations? The alternatives here are:

Each .po file contains only one language.

So we have n Descriptions .po files (n = number of supported
languages), and a user only needs to download the Descriptions files
they actually need.

In the source package we have control-de.po, control-fr.po,
control-es.po, ..., maybe all in one subdirectory.
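With one file per language, the client only has to work out which files to fetch. A minimal sketch, assuming files named Descriptions-XX as in this thread and a GNU-style colon-separated list of preferred locales (the format of the `LANGUAGE` environment variable). The function names and the fallback rule (try `pt_BR`, then `pt`) are illustrative, not taken from apt.

```python
def locale_candidates(locale: str) -> list[str]:
    """Expand e.g. 'pt_BR.UTF-8' or 'de_DE@euro' into ['pt_BR', 'pt'] / ['de_DE', 'de']."""
    base = locale.split(".")[0].split("@")[0]  # drop codeset and modifier
    candidates = [base]
    if "_" in base:
        candidates.append(base.split("_")[0])  # language without territory
    return candidates


def needed_description_files(preferences: str, available: set[str]) -> list[str]:
    """Map a colon-separated locale list to the Descriptions-XX files to download.

    Only languages the archive actually provides are returned, in preference
    order and without duplicates; 'C' and 'POSIX' mean "English only".
    """
    wanted: list[str] = []
    for locale in preferences.split(":"):
        if locale in ("", "C", "POSIX"):
            continue
        for lang in locale_candidates(locale):
            if lang in available and lang not in wanted:
                wanted.append(lang)
    return [f"Descriptions-{lang}" for lang in wanted]
```

For example, `needed_description_files("de_DE.UTF-8", {"de", "fr", "pt_BR"})` selects only `Descriptions-de`, so a German user never downloads the French or Brazilian Portuguese files.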

> - Supply descriptions in UTF-8, and recode them for the user's current
>   encoding on the user's machine. Nice and clean, but requires support
>   (possibly quite extensive changes) in dpkg/apt. NB, we _do_not_ need
>   full Unicode support in all of Debian for this, just in the tools
>   that deal directly with the description data.

No, don't reinvent the wheel. gettext does all of this. We don't need
to patch apt, dpkg or other tools this way.

We only need to use an old, nice and tested tool: gettext.

> - Supply descriptions in UTF-8 and recode them to a default encoding
>   that root can configure on each machine. Do the recoding immediately
>   after an "apt-get update" or "dpkg -i", so the UTF-8 version is
>   never stored on the machine. Might be the way to go for the moment,
>   even if it's not ideal. Most importantly, it is upwards-compatible
>   with the first solution above.

We don't need any of this.

> - Pick one default encoding per language and just assume the user uses
>   that encoding. Problematic: Should we ever want to change the
>   default encoding, there'll be lots of packages using the old
>   encoding, and we'd be stuck with it.

Yes, in the DDTP we use one default encoding per language.
 
> I'm all for at least _supplying_ the translations in UTF-8, even if
> they're not stored on the user's machine like that for now. Note that
> this does not even mean that the translators need to produce
> translations in UTF-8 - the DDTS can recode their work into UTF-8.

In future the DDTS will do this and send UTF-8 encoded .po files. I
got a request from Wichert to use UTF-8 only. We can recode Latin-X
etc. to UTF-8, so none of this is a problem.
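Recoding a Latin-X .po file to UTF-8 is lossless, since every Latin-X character has a Unicode equivalent. A minimal Python sketch of that step, assuming the translator's file declares its charset in the usual .po header; the function name is illustrative. For real .po files, GNU gettext's `msgconv --to-code=UTF-8` performs this conversion, header included.

```python
import re

CHARSET = r"charset=[A-Za-z0-9_.:-]+"


def po_to_utf8(raw: bytes) -> bytes:
    """Recode a .po file to UTF-8, using the charset declared in its header."""
    # The header's Content-Type line is plain ASCII in every Latin-X charset,
    # so it can be found before the rest of the file is decoded.
    match = re.search(rb"charset=([A-Za-z0-9_.:-]+)", raw)
    source = match.group(1).decode("ascii") if match else "utf-8"
    text = raw.decode(source)  # strict: fail loudly on bytes that don't fit
    text = re.sub(CHARSET, "charset=UTF-8", text, count=1)
    return text.encode("utf-8")


# Illustrative Polish translation, as a translator using Latin-2 might send it.
po = ('msgid ""\n'
      'msgstr ""\n'
      '"Content-Type: text/plain; charset=ISO-8859-2\\n"\n'
      '\n'
      'msgid "Source"\n'
      'msgstr "Źródło"\n').encode("iso-8859-2")
utf8_po = po_to_utf8(po)
```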


Regards
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer http://www.debian.org
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
 Win0.98 supports real multitasking - it can boot and crash simultaneously.




Re: Reasons why package central approach to handling translations may be suboptimal

2001-09-06 Thread Richard Atterer
On Thu, Sep 06, 2001 at 03:42:00PM +0900, Junichi Uekawa wrote:
> I have been reading the DDTS thread, and seeing that it was
> resolving into a "each package should maintain their translation". I
> would like to present what I think may be problematic in that
> approach :
> 1. This results in filing random bugs in BTS in random manner. [snip]

I think there's an answer to this problem: When the maintainer updates
the translations of his package, this should be fully automated.
E.g. just one command

  update-translations --from-mbox 

which fishes out the DDTS messages (which are sent in a standard
layout), or, for people lucky enough to be permanently online, an
"update-translations --from-web" command in debian/rules which gets
the translation updates directly from the DDTS server.
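`update-translations` is only proposed here, not an existing tool. A minimal Python sketch of its `--from-mbox` half, assuming a hypothetical DDTS mail layout (subject `[DDTS] <package> <lang>`, translation attached as a UTF-8 `.po` file) and writing files named like `control-de.po`. The subject format, function names and file names are all illustrative.

```python
import mailbox
import re
from pathlib import Path

# Hypothetical subject layout for DDTS mails: "[DDTS] <package> <lang>".
SUBJECT = re.compile(r"^\[DDTS\]\s+(?P<package>\S+)\s+(?P<lang>[A-Za-z_]+)\s*$")


def extract_translations(mbox_path: str, package: str) -> dict[str, str]:
    """Collect {language: .po text} for one package; later mails win."""
    found: dict[str, str] = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            match = SUBJECT.match(msg.get("Subject", ""))
            if not match or match["package"] != package:
                continue
            for part in msg.walk():
                name = part.get_filename() or ""
                if name.endswith(".po"):
                    payload = part.get_payload(decode=True)
                    found[match["lang"]] = payload.decode("utf-8")
    finally:
        box.close()
    return found


def write_translations(found: dict[str, str], po_dir: Path) -> list[Path]:
    """Write each translation to po_dir/control-<lang>.po and return the paths."""
    po_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for lang, text in sorted(found.items()):
        path = po_dir / f"control-{lang}.po"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Because the mails have a fixed layout, the maintainer never edits translations by hand, which is what makes the "fully automated" update realistic.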

So far, *everything* related to a Debian package can be found in the
corresponding source package. I don't think it's a good idea to change
this.

> 2. A package is re-uploaded with translation. There is a package 
> uploaded with one-line changelog saying something like "Merged
> spanish templates". It is a load to autobuilder/ftpmirror/etc. and
> of course manual intervention to rebuild a package means that an
> error occurs, and it does.
> 
> The main problem here is that translation starts after the initial
> upload of the package to Debian happens, which means there will be a "-2"
> Debian package which will include the translation, and a "-1" Debian
> package will have no translation.

Yes, that's one of the basic problems. IMHO with the proposed
"override" mechanism via Descriptions-XX.po (or whichever form it is
going to have), this is mostly solved - anyone getting the "-1"
package from testing will already see the translation. People tracking
unstable may or may not see them, depending on how often they update
and on the speed of the translators.
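The reason an override file helps with the "-1"/"-2" problem: if translations are looked up by the English description rather than shipped inside a particular package version, a translation made after the "-1" upload applies to that upload at once, and a later upload with an unchanged description needs no translation-only re-upload. A minimal sketch under that assumption; a plain dictionary stands in for a Descriptions-XX.po catalog, and the package name and texts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    description: str  # English description as shipped in the .deb


def shown_description(pkg: Package, overrides: dict[str, str]) -> str:
    """Prefer the override translation; fall back to the package's own English text.

    Lookup is keyed by the English text (like a .po msgid), not by version,
    so every upload with an unchanged description picks up the translation.
    """
    return overrides.get(pkg.description, pkg.description)


english = "Utility to compress files"
descriptions_es = {english: "Utilidad para comprimir archivos"}

first = Package("frobz", "1.0-1", english)
second = Package("frobz", "1.0-2", english)
changed = Package("frobz", "1.1-1", "Utility to compress and encrypt files")
```

A package whose description changed (`changed` above) shows English until the translators catch up, which matches the remark that unstable users may or may not see translations.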

Some people are concerned that their daily update from unstable will
need too much bandwidth because of all those extra uploads. If the
override mechanism works, I see no reason not to have a policy "don't
re-upload if only the translation changed".

> 3. No choice of "I want this locale and not others".

I assume in particular you mean "I prefer this _encoding_"? This is a
point that hasn't been discussed at all so far.

Will there be one description .po file per language in the source
package, or one for all translations? The alternatives here are:

- Supply descriptions in UTF-8, and recode them for the user's current
  encoding on the user's machine. Nice and clean, but requires support
  (possibly quite extensive changes) in dpkg/apt. NB, we _do_not_ need
  full Unicode support in all of Debian for this, just in the tools
  that deal directly with the description data.

- Supply descriptions in UTF-8 and recode them to a default encoding
  that root can configure on each machine. Do the recoding immediately
  after an "apt-get update" or "dpkg -i", so the UTF-8 version is
  never stored on the machine. Might be the way to go for the moment,
  even if it's not ideal. Most importantly, it is upwards-compatible
  with the first solution above.

- Pick one default encoding per language and just assume the user uses
  that encoding. Problematic: Should we ever want to change the
  default encoding, there'll be lots of packages using the old
  encoding, and we'd be stuck with it.
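The second option above could look roughly like this: the archive ships UTF-8, and a step run after the download recodes it once into the encoding root has configured. A minimal sketch; the function name and the choice to replace unrepresentable characters with "?" are assumptions, not part of the proposal.

```python
def recode_for_host(utf8_data: bytes, host_encoding: str) -> bytes:
    """Recode UTF-8 description data into the encoding root configured.

    Characters the target encoding cannot represent are replaced by '?',
    so a mismatched setting degrades the text instead of aborting an update.
    """
    return utf8_data.decode("utf-8").encode(host_encoding, errors="replace")


data = "Übersetzung für €".encode("utf-8")
latin9 = recode_for_host(data, "iso-8859-15")  # has the euro sign
latin1 = recode_for_host(data, "iso-8859-1")   # has no euro sign
```

Since UTF-8 is kept as the source, changing the configured encoding later only means re-running the recoding step, which is what makes this option upwards-compatible with the first one; text stored in a fixed legacy encoding (the third option) offers no such escape.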

I'm all for at least _supplying_ the translations in UTF-8, even if
they're not stored on the user's machine like that for now. Note that
this does not even mean that the translators need to produce
translations in UTF-8 - the DDTS can recode their work into UTF-8.

Comments?

Cheers,

  Richard





Reasons why package central approach to handling translations may be suboptimal

2001-09-06 Thread Junichi Uekawa

Hello, 

I have been reading the DDTS thread and see that it is converging on
"each package should maintain its own translation". I would like to
point out what I think may be problematic in that approach:

1. This results in random bugs being filed in the BTS in a random
manner. Telling the submitter to re-submit in a more useful format and
then not getting a response is bad. Many of the debconf translation
"bugs" have the translated text inserted in the mail body. To be
handled sanely it should really be an attachment, so that the intended
encoding is preserved (for example, Japanese mail is in ISO-2022-JP,
while debconf data should be in EUC-JP). The BTS doesn't seem to like
attachments either.

2. A package is re-uploaded just to add a translation, with a
one-line changelog saying something like "Merged spanish templates".
This puts load on the autobuilders, FTP mirrors, etc., and of course
any manual intervention to rebuild a package means that errors can
occur, and they do.

The main problem here is that translation starts only after the
initial upload of the package to Debian, which means there will be a
"-2" Debian package that includes the translation and a "-1" Debian
package without it.


3. No choice of "I want this locale and not others".
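The encoding problem from point 1 in concrete terms: the same Japanese text has different byte representations in ISO-2022-JP (the usual encoding for Japanese mail) and EUC-JP (what the debconf data should use), so a translation pasted into a mail body has to be recoded before it can be merged. A minimal Python sketch using the standard codecs; the function name and sample string are illustrative.

```python
def mail_body_to_debconf(body: bytes) -> bytes:
    """Recode a Japanese translation received as ISO-2022-JP into EUC-JP."""
    return body.decode("iso2022_jp").encode("euc_jp")


text = "日本語の説明"  # arbitrary sample: "Japanese description"
as_mail = text.encode("iso2022_jp")        # 7-bit, uses escape sequences
as_debconf = mail_body_to_debconf(as_mail)  # 8-bit EUC-JP bytes
```

ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences, while EUC-JP uses 8-bit bytes, so the two are never byte-identical for Japanese text; a translation only survives intact if its encoding is known, which an attachment with an explicit charset makes much more reliable than a pasted mail body.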




This is why I think the translation data would be better maintained
outside the usual packages.


regards,
junichi

-- 
[EMAIL PROTECTED]  http://www.netfort.gr.jp/~dancer