Bug#448216: Support 3-letter codes for package description translations in APT (was: Re: Bug#448216: Waiting 2 years ago...)

2010-02-06 Thread David Kalnischkies
2010/2/5 Marcos marcoscosta...@gmail.com:
 We've been waiting for more than 2 years.
Don't (busy-)wait for it: invest the time instead in providing a good
patch for it. :) It sometimes also helps (as you see now)
to poke someone ~ but it is far better to do this with some
more information: for example, i didn't know that this minor bug
is blocking someone's work! (the original bugreport only talks about
private use) The priority has increased considerably with this information…


But enough storytelling:
In this specific bug case a patch from me [0] for this and a few other
Translation-file-specific things is pending for the next ABI-break
which will happen sometime in the future (before squeeze).

The relevant lines from the patch for this bug should be:
 // get the environment language code
 // we extract both, a long and a short code and then we will
 // check if we actually need both (rare) or if the short is enough
 string const envMsg = string(Locale == 0 ? std::setlocale(LC_MESSAGES, NULL) : Locale);
 size_t const lenShort = (envMsg.find('_') != string::npos) ? envMsg.find('_') : 2;
 size_t const lenLong = (envMsg.find('.') != string::npos) ? envMsg.find('.') : (lenShort + 3);
 string envLong = envMsg.substr(0,lenLong);
 string const envShort = envLong.substr(0,lenShort);

I don't know if this really handles all cases as i don't know much about
locales, but it should handle at least e.g. de, de_DE, ast_DE (bogus) and
de_DE.UTF-8 and given that nobody else cared to provide a patch for it
i guess this will be better than nothing. This is btw also true for
the rest of the patch [0] as all my questions in the LongDesc thread [1]
remained unanswered…
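
For illustration only, here is a minimal sketch of the same splitting idea as a standalone function. SplitLocale is a hypothetical name and is not part of APT or of the patch [0]. As far as the quoted lines go, a value without an underscore falls back to a two-character short code, so a bare ast would come out as as; the sketch instead treats everything before the '_' (or the whole name, if there is none) as the language code. It also assumes that a '@modifier' suffix should be cut off like the codeset.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not APT code): split a locale name such as
// "de_DE.UTF-8" into a long code ("de_DE") and a short code ("de").
std::pair<std::string, std::string> SplitLocale(std::string const &locale)
{
   // Drop the codeset (".UTF-8") and the modifier ("@euro"), whichever
   // comes first; find_first_of returns npos if neither is present and
   // substr(0, npos) then keeps the whole string.
   std::string const longCode = locale.substr(0, locale.find_first_of(".@"));
   // The short code is everything before the territory separator. Without
   // a '_' the whole remaining string is the language, whatever its length.
   std::string const shortCode = longCode.substr(0, longCode.find('_'));
   return std::make_pair(longCode, shortCode);
}
```

With this, de, de_DE, ast, ast_ES and de_DE.UTF-8 all yield a short code equal to the full language part, without assuming that it is two letters long.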


Best regards / Mit freundlichen Grüßen,


David Kalnischkies

[0] http://bazaar.launchpad.net/~donkult/apt/sid/revision/1920 and ff.
[1] http://lists.debian.org/deity/2009/08/msg00112.html



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org




2010-02-06 Thread Marcos
IMO if an 'xx_YY' file is downloaded APT should *always* also download the
corresponding 'xx' file if it exists.

Hi again :P
Remember that some languages only have an ISO 639-2 code, which has 3
letters, such as Asturian (ast_ES or ast) :)
Thanks.







2010-02-06 Thread Marcos
Hi! First of all: Thanks very much for the answers!

My language (ast) and English (en_US) are good examples of the bug :)

You can see the English version of apt-get update:
http://launchpadlibrarian.net/34947585/us_lang_example.png

And you can see the Asturian version of apt-get update:
http://launchpadlibrarian.net/34947533/ast_lang_example.png
In the case of ast (the Asturian code) it is getting as instead (which as a
language code is Assamese; AS is the country code for American Samoa).

Best regards and thanks in advance!




On Sat, Feb 6, 2010 at 3:20 PM, Frans Pop elen...@planet.nl wrote:
 (CCing as I'm not sure if you are subscribed to d-i18n)

 This is btw also true for the rest of the patch [0] as all my questions
 in the LongDesc thread [1] remained unanswered.

 I can answer one part of it.

 ! And last but not least a question more or less only for the l10n-team:
 ! APT currently includes a (short) list of languages for which it doesn't
 ! download the Translation file with the short but with the long code.
 ! e.g. for a pt_BR locale it would download the Translation-pt_BR file
 ! instead of even trying to download Translation-pt. On the other
 ! hand Translation-files like cs_CZ are never touched - apt only tries to
 ! download the cs file. So what do you think: Should apt try downloading
 ! long and short always OR short only if long is not available? The
 ! problem here would be (while currently looking at the i18n for
 ! unstable main [0]) that e.g. the very small cs_CZ file would hide the
 ! larger cs file... (btw: Also a suggestion which whitelist should be used
 ! would be good, e.g. i think it is unlikely that we get a de_?? in the
 ! future...)

 A correct implementation of l10n support automatically falls back to the
 next best translation. The LANGUAGE environment variable can contain a
 list of languages to fall back to.

 For example, Debian Installer sets this as follows by default:
 - for Portuguese: LANGUAGE=pt:pt_BR:en
 - for Brazilian: LANGUAGE=pt_BR:pt:en
 Note that they are defined as fallbacks for each other!

 And for some Scandinavian languages it's even more fun:
 - for Northern Sami: LANGUAGE=se_NO:nb_NO:nb:no_NO:no:nn_NO:nn:da:sv:en

 I think, but I'm not sure, that if LANGUAGE is *not* set, an automatic
 fallback from e.g. cs_CZ to cs will happen (and maybe even if it is set).

 It looks as if the APT download implementation has wanted to simplify this,
 or has maybe just wanted to limit downloads.

 IMO if an 'xx_YY' file is downloaded APT should *always* also download the
 corresponding 'xx' file if it exists.

 And I think that it would probably also be good if APT downloaded at least
 the first two or three languages listed in LANGUAGE (again both 'xx_YY'
 and 'xx' files for each). This would ensure that e.g. for Portuguese,
 Brazilian is available as fallback.

 English should of course always be downloaded.

 Cheers,
 FJP


 --
 To UNSUBSCRIBE, email to debian-i18n-requ...@lists.debian.org
 with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org









2010-02-06 Thread Frans Pop
On Saturday 06 February 2010, Marcos wrote:
 Remember that some languages only have an ISO 639-2 code, which has 3
 letters, such as Asturian (ast_ES or ast) :)

Sure. My xx, xx_YY examples should be read to also cover xxx and xxx_YY.







2010-02-06 Thread Frans Pop
David Kalnischkies wrote:
 2010/2/6 Frans Pop elen...@planet.nl:
 A correct implementation of l10n support automatically falls back to the
 next best translation. The LANGUAGE environment variable can contain a
 list of languages to fall back to.
 Never saw this colon-syntax for fallbacks before, but a quick test
 suggests that the LC_MESSAGES variable which APT uses currently to get
 the language doesn't support this syntax?

Correct. But if you set LANGUAGE and then run 'locale' you will see that it
is listed alongside all the LC_* variables, which shows that it is part of
the official i18n system.

 APT tries to detect which language to download by inspecting LC_MESSAGES
 and extracting the long (de_DE) and short (de) language codes. The current
 APT code would download de in this case, or de_DE if it is listed in the
 whitelist. My patch currently downloads (with LANG=de_DE) en and de
 unconditionally, and de_DE if it is included in the whitelist (plus
 whatever Acquire::Languages lists as well).

OK. That's a reasonable start.

But a whitelist, besides potentially ignoring the user's preferences, is a 
rather unmaintainable solution in the long run: it will always lag behind 
translation efforts and translators will probably not even be aware that 
they might need to request an addition to the whitelist.

 So what is additionally needed is that APT switches to LANGUAGE and
 supports the colon syntax?

No, it cannot switch. It should consider LANGUAGE *in addition to* 
LC_MESSAGES. Remember LANGUAGE isn't always set.
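
As a rough sketch of "in addition to" (not APT code; LanguageCandidates is a hypothetical helper and both values are assumed to have been read from the environment already), the colon-separated LANGUAGE list can be taken first, in order, with the code derived from LC_MESSAGES appended if it is not already present. An unset LANGUAGE is passed as an empty string here. The sketch does not special-case the "C" or "POSIX" locales, which a real implementation would probably want to skip.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not APT code): merge a colon-separated LANGUAGE
// value with the code derived from LC_MESSAGES, keeping the order and
// dropping duplicates.
std::vector<std::string> LanguageCandidates(std::string const &language,
                                            std::string const &lcMessages)
{
   std::vector<std::string> codes;
   std::istringstream in(language);
   std::string item;
   // Entries from LANGUAGE come first, in the order the user listed them.
   while (std::getline(in, item, ':'))
      if (!item.empty() && std::find(codes.begin(), codes.end(), item) == codes.end())
         codes.push_back(item);
   // LANGUAGE is not always set, so LC_MESSAGES is always considered too;
   // its codeset/modifier suffix (".UTF-8", "@euro") is cut off first.
   std::string const fromLocale = lcMessages.substr(0, lcMessages.find_first_of(".@"));
   if (!fromLocale.empty() && std::find(codes.begin(), codes.end(), fromLocale) == codes.end())
      codes.push_back(fromLocale);
   return codes;
}
```

For example, LANGUAGE=pt_BR:pt:en with LC_MESSAGES=pt_BR.UTF-8 gives pt_BR, pt, en, while an empty LANGUAGE with LC_MESSAGES=ast_ES.UTF-8 gives just ast_ES.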
 
 The only problem i can see with this is, that the acquire method is
 currently thick as a brick: It doesn't check if the file is listed in
 the corresponding Index file (this index isn't even downloaded) and

That's bad. IMHO it really should download the index and parse it for 
supported translations. IIUC the current system means 404s for any 
language that doesn't have translated descriptions. I would guess that's 
most of the languages supported in Debian Installer...
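
To make the idea concrete, a minimal sketch under stated assumptions: the index format is not described in this thread, so the sketch assumes the index has already been parsed into a set of available language codes, and FilterAvailable is a hypothetical helper rather than anything in APT. Only codes present in that set would then be requested, so nothing is fetched just to get a 404.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not APT code): keep only those wanted language
// codes that the (already parsed) index lists as available, preserving
// the user's preference order.
std::vector<std::string> FilterAvailable(std::vector<std::string> const &wanted,
                                         std::set<std::string> const &available)
{
   std::vector<std::string> result;
   for (std::vector<std::string>::const_iterator w = wanted.begin(); w != wanted.end(); ++w)
      if (available.count(*w) != 0)
         result.push_back(*w);
   return result;
}
```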

 will try to download all translations, so
 - for Northern Sami: LANGUAGE=se_NO:nb_NO:nb:no_NO:no:nn_NO:nn:da:sv:en
 would generate 10 requests (for every component) resulting
 in 6 with a 404 response (if we assume LongDesc becomes
 real and therefore en exists).

That's why I suggested taking only the first 2 or 3 from LANGUAGE, not the
whole list. It's reasonable IMO to compromise between the downloads needed
and using the whole list, but it's not reasonable to completely ignore
fallbacks the user has defined.
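
A minimal sketch of that compromise, assuming the preference list has already been split into individual codes. ExpandFallbacks and AppendUnique are hypothetical names, not APT functions, and the limit is left to the caller. It takes the first few entries, adds the short 'xx' code for every 'xx_YY' entry, and always ends with en.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append a code unless it is already in the list.
static void AppendUnique(std::vector<std::string> &list, std::string const &code)
{
   if (std::find(list.begin(), list.end(), code) == list.end())
      list.push_back(code);
}

// Hypothetical helper (not APT code): take at most `limit` entries from
// the user's preference list, add the short code for every long code,
// and always include English as the last fallback.
std::vector<std::string> ExpandFallbacks(std::vector<std::string> const &preferred,
                                         std::size_t limit)
{
   std::vector<std::string> files;
   for (std::size_t i = 0; i < preferred.size() && i < limit; ++i)
   {
      AppendUnique(files, preferred[i]);
      // 'xx_YY' (or 'xxx_YY') also pulls in the plain language code.
      std::size_t const underscore = preferred[i].find('_');
      if (underscore != std::string::npos)
         AppendUnique(files, preferred[i].substr(0, underscore));
   }
   AppendUnique(files, "en");
   return files;
}
```

With the Northern Sami list above and a limit of 3, this asks for se_NO, se, nb_NO, nb and en: five requests per component instead of ten.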
 
 I don't know if this has visible side effects beside being silly,
 but i believe this will be unfixable for APT without a (more or less)
 rewrite of the acquire system (as Translations seems to be
 implemented as a hack in the current version already) and
 that this will not happen for squeeze… (not even started).

Understandable. But it would be good to have a description of a proper 
implementation on the ToDo list.
 
 So as ugly as this whitelist is i guess we need it to save the
 mirrors from a lot of silly requests ~ luckily in a stable release
 the list shouldn't vary too much… ?

I would expect not.

Cheers,
FJP







2010-02-05 Thread Christian PERRIER
Quoting Marcos (marcoscosta...@gmail.com):
 Hi!
 Is this bug more complicated?
 We've been waiting for more than 2 years.
 Best regards!


APT has tons of bugs and very few people taking care of it.

It is quite likely that this bug is easy to fix for someone who
knows about C code.

I agree that it may be frustrating for the Asturian translator to be
able to do package description translations but not use them.

Would anyone in the i18n crowd volunteer to look at APT source code
and try providing a patch? I'm sure that APT maintainers would quickly
include it.



