On Sun, 2007-06-10 at 14:29 +0930, Clytie Siddall wrote:
> Hi everyone :)
Hi Clytie
>
> Thanks for your reply, Leonardo.
>
> On 09/06/2007, at 7:32 AM, Leonardo Fontenelle wrote:
...
> I've added quite a few items to Youssef's files, but also removed
> some. Some of the removed items were so technical that they belong
> more in FOLDOC or Wiktionary than in a vocab file. Others can simply
> be composed from base translations ("activity window" = "activity" +
> "window", in the order appropriate for the language), while yet
> others were simply other parts of speech for the same meaning.
We have not yet widely advertised the terminology project, so if you can
help us provide a POT file that is more meaningful for the first 1000
terms that people should work on, we should really consider that along
with all the input from Youssef and whoever else gets involved. In
future, we would like the terminology to be available either as a
separate download that can be integrated with Pootle, or as part of the
official Pootle package.
We're not quite there yet.
>
> I'm hoping Pootle will also search the comment fields for the
> terminology files. Can it do that, Pootle admins?
>
> This would save space, since it means we can create items like:
> ___
>
> # verb; adjective ; noun is \"completion\"; past tense is \"completed\"
> msgid "complete"
> msgstr "hoàn thành; hoàn tất"
>
> (comment field in the translators' language, e.g.
>
> # động từ ; tính từ ; danh từ \"completion\"; thì quá khứ \"completed\"
> msgid "complete"
> msgstr "hoàn thành, gõ xong; hoàn tất"
>
> )
>
> In this string, the verb and adjective are the same word in English,
> so we translate the same item in two different ways, separated by a
> semi-colon.
>
> Multiple translations for the same part of speech are separated by a
> comma. Here, there are two ways of translating the verb "complete".
>
> In the comments field, the two types of translation are identified as
> "verb; adjective"; then the noun is quoted, and the past tense.
>
> This means only having to create _one_ string, to handle several
> different ways of expressing or translating the root word.
>
> That will reduce the size of the terminology files, and their
> combined size.
>
> Can the terminology feature in Pootle also search the comment field?
In many regards I agree with what you say, but I don't know if searching
the comments is the ideal way, since the effectiveness then depends on
the people leaving the comments (it can be done in the POT, but still).
Currently, we try to do some _really_ simple morphological manipulation
of the source string and source glossary term to find more matches. For
example, the following all work (forgive the code, but I'm too lazy to
reformat). The first argument is the translation string, the second is
the entry from the glossary:

    def test_brackets(self):
        """Tests that brackets at the end of a term are ignored"""
        assert termmatcher.similarity("Open file", "file (noun)") > 75
        assert termmatcher.similarity(
            "Contact your ISP", "ISP (Internet Service Provider)") > 75

    def test_past_tences(self):
        """Tests matching of some past tenses"""
        assert termmatcher.similarity("The bug was submitted", "submit") > 75
        assert termmatcher.similarity("The site is certified", "certify") > 75

    def test_space_mismatch(self):
        """Tests that we can match with some spacing mismatch"""
        assert termmatcher.similarity("%d minutes downtime", "down time") > 75

    def test_hyphen_mismatch(self):
        """Tests that we can match with some spacing mismatch"""
        assert termmatcher.similarity("You can preorder", "pre-order") > 75
        assert termmatcher.similarity("You can pre-order", "pre order") > 75
        assert termmatcher.similarity("You can preorder", "pre order") > 75
        assert termmatcher.similarity("You can pre order", "pre-order") > 75

Also, things like "category" will be found in "%d categories" and so on.
I think it might be more useful to work on improving this code and
giving people good guidance on how to optimally choose the glossary
words for maximum effect.
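
To give an idea of the kind of manipulation I mean, here is a rough,
self-contained sketch. It is not the actual termmatcher code: the names
squash(), variants() and loosely_matches() are made up for illustration,
it returns a plain yes/no instead of a similarity score, and the crude
substring test will over-match in places.

```python
import re


def squash(text):
    """Lowercase the text and remove all spaces and hyphens."""
    return re.sub(r"[\s-]+", "", text.lower())


def variants(term):
    """Yield crude search keys for a glossary term (hypothetical helper)."""
    # Ignore a trailing bracketed note: "file (noun)" -> "file"
    term = re.sub(r"\s*\([^)]*\)\s*$", "", term)
    base = squash(term)
    if not base:
        return
    # The bare term also covers suffixes such as "-s", "-ed", "-ted"
    yield base
    # "certify" -> "certifie", which covers "certified" and "certifies"
    if base.endswith("y"):
        yield base[:-1] + "ie"


def loosely_matches(source, term):
    """Return True if any variant of the term occurs in the source string."""
    haystack = squash(source)
    return any(key in haystack for key in variants(term))


print(loosely_matches("The site is certified", "certify"))
print(loosely_matches("You can pre order", "pre-order"))
print(loosely_matches("Close window", "file"))
```

Because everything is squashed before comparing, a term chosen in its
simplest base form ("certify", "category", "pre-order") gets the most
matches, which is the sort of guidance I have in mind.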
Pootle displays the comments in the tooltip of the suggested glossary
entry, so the information is not entirely lost. The target field is also
freeform, so you could even add the information in the target
translation if you really needed to (like the example "file (noun)"
above). As Leonardo also suggested, msgctxt is probably the proper way to
do it, although it won't be clear then from the tooltip, perhaps.
The one reason why we might want to keep the target field as a single
translated term is to limit the consumed space on screen and to simplify
the use of it. I fixed bug 187, which will allow users of Pootle 1.0.1 to
copy a suggested term to the translation field. In such a case, multiple
senses of a word and/or disambiguating information will only make such a
feature less useful.
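
If someone already has files in the "comma / semi-colon" style, one
possible way to get back to single-term targets is to split each entry
into one entry per part of speech, using the msgctxt keyword of the PO
format. The sketch below is only an illustration under my own
assumptions: split_senses() and to_po_entries() are hypothetical
helpers, the parts of speech are passed in by hand, the first
alternative is kept as the target, and no escaping of quotes is done.

```python
def split_senses(msgstr):
    """Split 'a, b; c' into [['a', 'b'], ['c']], skipping empty pieces."""
    senses = []
    for group in msgstr.split(";"):
        alternatives = [alt.strip() for alt in group.split(",") if alt.strip()]
        if alternatives:
            senses.append(alternatives)
    return senses


def to_po_entries(msgid, parts_of_speech, msgstr):
    """Build one PO entry per part of speech, each with a single target."""
    entries = []
    for pos, alternatives in zip(parts_of_speech, split_senses(msgstr)):
        lines = []
        if len(alternatives) > 1:
            # Keep the other translations visible as a translator comment
            lines.append("# alternatives: " + ", ".join(alternatives[1:]))
        lines.append('msgctxt "%s"' % pos)
        lines.append('msgid "%s"' % msgid)
        lines.append('msgstr "%s"' % alternatives[0])
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


print(to_po_entries("complete", ["verb", "adjective"],
                    "hoàn thành, gõ xong; hoàn tất"))
```

This gives two entries for "complete", each with exactly one term in the
target, so copying a suggestion into the translation field stays simple.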
> I've completed (sic) the first three files (A, B, C), but thought I
> should check how Pootle will handle the files, before I go on.
>
> It would also be good to have a central site/repository where we can
> store these terminology files, so they can be updated readily, and so
> if the person maintaining them for that language becomes unavailable,
> others can still access the files.
We can probably do this on pootle.wordforge.org if we can manage the
added workload of people asking for access rights, etc. I know you have
the admin rights for the terminology project on that server for
Vietnamese, so you could say who can translate / suggest / review, etc.
You can also upload new files. If we have good POT files to start with
and more people want to administer larger projects around that, we can
look at assigning more fine-grained rights for those languages as well.