Re: [native-lang] Status update season!
Christian Lohmaier wrote:
> > Swedish dictionary (which is from 2003, but almost unchanged since 1997) has 24,489 basic forms and expands to 118,270 variations. This is clearly inferior.
>
> So here you see another problem. If a language has lots of variations of a single word, how can you tell that 12,000 of the expanded words are not based on useless words (not in widespread use, hiding typos, ...)? Or the other way round: you cannot tell whether the important ones are present.

What I can tell you is that it is impossible to cover the important words in Swedish if your expanded list only contains 118,270 words, unless they were hand-picked, and I know they were not.

You seem to be of the opinion that this task is impossible, but it is not. I don't have to arrive at complete knowledge; I only have to outsmart the competition. So far, Microsoft and other vendors can say that their product is robust, well researched and based on science, while the spell checker in OpenOffice is based on wild guessing and general cluelessness. The buyer or procurer has nothing to counter such statements with. All I need is a sufficiently strong argument to counter whatever Microsoft is saying.

It's like the two people walking on the African savanna when a lion comes up behind them. They start to run, but one of them says: "It's no use, we can never outrun the lion." The other answers: "I don't have to outrun the lion, I only have to outrun you."

Now, in fact it isn't Microsoft that's giving OpenOffice a bad name. It's the Swedish branch of Sun Microsystems, which claims on its StarOffice web pages that this product contains a professional-grade Swedish spell checker, while OpenOffice has one created in web forums that cannot really be trusted. I do take offense, and I think this strategy is unwise of Sun, because it makes them enemies in places where they ought to be looking for friends. But I think I can outrun Sun here. It's they who are pushing me into this fight. And we all know the lion.
-- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [native-lang] Status update season!
On Thu, Dec 28, 2006 at 01:16:37AM +0100, Lars Aronsson wrote:
> [...]
> You seem to be of the opinion that this task is impossible, but it is not.

The topic was: measuring and comparing the quality of dictionaries, e.g. having an "x% completed" tag or something similar. My point is: you cannot judge quality by the number of words alone (apart from telling that a dictionary is really bad). But unfortunately the topic seems to be drifting away because of misunderstandings and/or misinterpretations :-(

ciao
Christian
-- NP: Pantera - Walk
[native-lang] Re: [lingu-dev] Re: [native-lang] Status update season!
Marcin Miłkowski wrote:
> If we're into statistics, then the Polish dictionary has something like 3.5 million expanded forms, and about 300,000 base forms. The quality of the dictionary is excellent. [...] I recommend this kind of technique to all l10n teams and dictionary developers. Look at www.kurnik.pl to see how the site is managed, and www.kurnik.pl/dictionary has some info on the dictionary.

This is excellent, but we'd have to learn Polish before we could understand the full details of your method. Is there any text in English (or French or German) that describes how this initiative was started, what problems it met, and how these problems were overcome? That's the kind of guideline that would be useful from Peru to Kazakhstan, and from Greenland to Malawi.

The Swedish Scrabble community has a policy of using the dictionary of the Swedish Academy (SAOL), which unfortunately is copyrighted and not available for free download.
-- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se
Re: [native-lang] Re: [lingu-dev] Spell checking metrics; was:[native-lang] Status update season!
On 23/12/2006, at 3:56 AM, Lars Aronsson wrote (in conclusion):
> Another idea is to make OpenOffice.org report all corrections made by users worldwide to some centralized database. I guess this would conflict with users' interest in their own privacy.

Well, it might. Unless anybody wanted to win a "Most typos of the year" award. ;)

Actually, I could win that. :D Nowadays, all my fingers are thumbs, and my thumbs don't know how to spell.

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
Hi,

> Thanks for the information. I suppose my overall question is: can we use this dictionary with OpenOffice.org?

Yes, but you can't (yet) distribute it together with OpenOffice.org. This is the purpose of this issue. Ask on [EMAIL PROTECTED] how to add your dictionary to DicOOo...
-- Pavel Janík [EMAIL PROTECTED]
Re: [native-lang] Status update season!
Andrea,

Andrea Pescetti wrote:
> On 12/12/2006 Charles-H.Schulz wrote:
> > ... a status update on your project would of course be very nice!
>
> Here you are. This is a status update for the Italian Native-Lang project (PLIO) for the period September-December 2006.

Thank you very much for this detailed update. My congratulations and my personal Christmas and New Year wishes go to the Italian Native-Language community. Congratulations to both of you, Davide and Andrea!

Regards and thank you,
Charles
Re: [native-lang] Status update season!
> Thanks, Pavel. I have now subscribed to my 15th OpenOffice.org mailing list.

Great - it will help you get faster answers from the people actually responsible for each part of the work.

/me sometimes thinks that this list is a bit more like [EMAIL PROTECTED], or at least that some people want it to be ;-)
-- Pavel Janík [EMAIL PROTECTED]
[native-lang] Re: [lingu-dev] Spell checking metrics; was:[native-lang] Status update season!
eleonora46 wrote:
> If both the above are true, then the spell checker did really good work.

Did you try to compute these numbers for your own German dictionary, and compare them to the other German dictionaries from Björn Jacke or Franz Michael Baumann? German is one of the few languages where more than one free dictionary is available, so it could be a good test case. Since you continue to work in parallel, I guess each of you is convinced that you do a better job than the others? How do you measure or compare this? German is a good test case for another reason too: many people in Europe (such as me) know it as their third language, after their native language and English.

> The recognition of obscure words is more the area of grammar checkers; they should flag obscure words that are similar to frequently used words, as likely misspellings.

This note on obscure words connects to what Kevin wrote: cases like the obscure word "yor" in English should clearly not be included, since they are most likely a misspelling of a common word. It seems we would need statistics on how common "yor" (or should that be "yore"?) is in its correct use, and how common it is as a misspelling of "your" (or "you're"). It is easy enough to find statistics on word frequencies, but how or where can we find statistics on errors?

A simple Google search finds 2.59 billion hits for "your" and 4.17 million for "yore", but I cannot tell which of the "yore" occurrences are errors. There are also 4.37 million (!) hits for "yor", but they seem to be a film title, a surname, various company names and the ISO language code for Yoruba. The first obvious error usage I find is "all yor base r blong 2 us", which is apparently stylistic and not a mistake.

One idea for finding statistics on errors is to compare changes made to Wikipedia articles. The complete text revision history is available from download.wikimedia.org. All you need is to step through the changes and make statistics for all the small changes, such as "yor" being changed to "your". Has anybody done this?
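The Wikipedia idea above can be sketched in a few lines of Python. This is a minimal sketch, assuming the consecutive revision texts of one article have already been extracted from the dump (parsing the dump itself is not shown); the function name `count_word_substitutions` is hypothetical. It aligns each pair of consecutive revisions word by word with the standard library's `difflib.SequenceMatcher` and counts only one-word-for-one-word replacements, the "small changes" Lars describes.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_word_substitutions(revisions):
    """Count single-word replacements between consecutive revisions.

    `revisions` is a list of article texts in chronological order
    (assumed to be already extracted from the Wikipedia dump).
    Returns a Counter mapping (old_word, new_word) pairs to frequencies.
    """
    pairs = Counter()
    for old_text, new_text in zip(revisions, revisions[1:]):
        # Naive whitespace tokenization; punctuation stays attached to words.
        old_words, new_words = old_text.split(), new_text.split()
        matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only "small changes": exactly one word replaced by one word.
            if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                pairs[(old_words[i1], new_words[j1])] += 1
    return pairs


history = [
    "Thanks for yor help with the article",
    "Thanks for your help with the article",
    "Thanks for your help with this article",
]
print(count_word_substitutions(history).most_common())
```

The second revision pair ("the" to "this") shows that not every one-word change is a spelling fix; in practice a further filter, such as a small edit distance between the two words, would be needed before treating the counts as error statistics.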
Another idea is to make OpenOffice.org report all corrections made by users worldwide to some centralized database. I guess this would conflict with users' interest in their own privacy. -- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se
Re: [native-lang] Status update season!
Christian Lohmaier wrote:
> This could also mean that these are just dumb wordlists that don't make use of affix transformations. Not really suitable for comparing quality then, even when the languages are closely related.

This is not the case, though. Swedish, Danish and Norwegian are closely related and have the same language structure. An expanded wordlist is 5 to 6 times longer than a well-compressed one using the right ispell flags. That factor is smaller for English and German, and a lot larger for Finnish and Hungarian.

The current German dictionary maintained by Björn Jacke has 80,000 basic forms which expand to 300,000 variations, a factor of 3.75. Swedish, Danish and Norwegian form basic words (including compounds) the same way as German. Basic words can often be translated syllable by syllable, so the number of basic forms should be about the same. But the Scandinavian languages use endings instead of a separate definite article (the/der/die/das), resulting in a larger number of expanded variations.

The current da_DK.dic has 108,400 basic forms and expands to 380,199 variations. The two versions of Norwegian have 133,242 (nb_NO) and 102,578 (nn_NO) basic forms, respectively, and expand to 556,600 and 295,306 variations. However, the currently used Swedish dictionary (which is from 2003, but almost unchanged since 1997) has 24,489 basic forms and expands to 118,270 variations. This is clearly inferior.

Of course, if the Swedish dictionary contained 24,000 relevant words and the other languages had many highly specialized words that are only rarely used, we'd still stand a chance. However, this is not the case either.

Fortunately, my friend who maintains the Swedish dictionary has recently published a new version (DSSO 1.22) that expands to 242,611 variations, so he's making great progress. I hope this will be included in future versions of OpenOffice.org. We're catching up with the Danes and Norwegians, but they are still ahead.
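To make the comparison concrete, the figures quoted above can be tabulated with their expansion factor (expanded variations divided by basic forms). This Python sketch uses only the numbers from the message; the function name `expansion_factor` is ours. It shows that every dictionary has a factor well above 1, so none is a plain unexpanded wordlist, and that what sets the 2003 Swedish dictionary apart is its small number of basic forms rather than its factor.

```python
# Figures quoted in the message: (basic forms, expanded variations).
DICTIONARIES = {
    "de (Björn Jacke)": (80_000, 300_000),
    "da_DK": (108_400, 380_199),
    "nb_NO": (133_242, 556_600),
    "nn_NO": (102_578, 295_306),
    "sv (2003)": (24_489, 118_270),
}


def expansion_factor(basic, expanded):
    """Average number of expanded variations generated per basic form."""
    return expanded / basic


for name, (basic, expanded) in DICTIONARIES.items():
    factor = expansion_factor(basic, expanded)
    print(f"{name:18} {basic:>8,} {expanded:>8,}  x{factor:.2f}")
```

Rounded to two decimals, the factors come out as 3.75 (de), 3.51 (da_DK), 4.18 (nb_NO), 2.88 (nn_NO) and 4.83 (sv).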
Yesterday I found this paper by two Hungarian authors, who discuss Zipf's law and the minimum number of words a dictionary needs to cover a given percentage of a text corpus: http://www.nslij-genetics.org/wli/zipf/nemeth02.pdf

Their most important observation is that a decent spelling dictionary needs to contain 20,000 words (variations) for English and 80,000 for German, but 400,000 for Hungarian. The right number for the Scandinavian languages should thus be somewhere between 80,000 and 400,000.

However, that only counts the most frequent words of a language. When I add "home" to an ispell/hunspell dictionary, I also add "homes" and "homely" because of how the flags work, even though "homely" isn't necessarily among the common and relevant words. So I add a lot of less relevant words, which don't contribute much to the dictionary's usefulness. When I add one basic word, and thus 5 to 6 variations (for Swedish), perhaps I only add 2 to 3 useful variations. It is hard to know just how much the numbers are inflated.

> I don't think there is a way to measure this at all. You feel that it is good or bad, but you cannot really measure it. You can give examples, but that's about it. (IMHO)

In the case of OpenOffice.org, what really matters is what people feel about Microsoft Word's spell checker. If that were really useless, we wouldn't have to bother. But as it is, we have to bother.
-- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se
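The coverage measure discussed in the Zipf paper can be illustrated with a short Python sketch. It shows the general computation, not the authors' exact procedure: rank the word forms of a corpus by frequency and count how many of the top-ranked forms are needed before their occurrences make up a target share of all tokens. The function name `words_needed_for_coverage` and the tiny corpus are hypothetical. It also reflects Lars's caveat: forms generated by affix flags that never reach the high ranks contribute nothing to this coverage.

```python
from collections import Counter


def words_needed_for_coverage(tokens, target):
    """Smallest number of distinct word forms, taken in order of decreasing
    frequency, whose occurrences make up at least `target` (0..1) of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank
    return len(counts)


corpus = "the cat saw the dog and the dog saw a cat in the house".split()
print(words_needed_for_coverage(corpus, 0.5))  # 3
```

On a real corpus, the same function run with targets such as 0.9 or 0.95 gives the kind of per-language figures the paper compares.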
[native-lang] Re: [lingu-dev] Re: [native-lang] Status update season!
Hi Lars, and all,

> The current German dictionary maintained by Björn Jacke has 80,000 basic forms which expand to 300,000 variations, for a factor of 3.75. Swedish/Danish/Norwegian have the same way to form basic words (with compounds) as German. Basic words can often be translated syllable by syllable, so the number of basic forms should be about the same. But the Scandinavian languages use endings instead of the definite article (the/der/die/das), resulting in a larger number of expanded variations.

If we're into statistics, then the Polish dictionary has something like 3.5 million expanded forms, and about 300,000 base forms. The quality of the dictionary is excellent. How was that achieved? Simple: set up a local Scrabble-like community and develop a Scrabble dictionary using the Scrabble players' linguistic competence. It's incredibly efficient. Then you simply tweak the Scrabble dictionary to your needs (like removing rare and confusing forms).

I recommend this kind of technique to all l10n teams and dictionary developers. Look at www.kurnik.pl to see how the site is managed, and www.kurnik.pl/dictionary has some info on the dictionary.

Best regards, and happy holidays,
Marcin
Re: [native-lang] Status update season!
Hi,

On 12/21/06, Meelad Zakaria [EMAIL PROTECTED] wrote:
> > have prefix be and use it with nouns, like benom, bekitob,
>
> It works for your examples and many others, but surely not for every noun. Consider the nouns sarmaa (meaning cold) or shanbeh (meaning Saturday): bi-sarma and bi-shanbeh are simply wrong.

You won't use it on every word. That's what the flags are for. You define a rule, call it "a", and mark all the words in the wordlist that it applies to with /a. That's how it works. The exceptions are not marked with /a; maybe you want to apply rule /b to them instead.

Maybe you should have a look at an English .dic and .aff file to get the basic concept. Or at the Kurdish ones ;)

Regards,
Erdal
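Erdal's explanation of flags can be illustrated with a toy expander in Python. This is a sketch under stated assumptions, not Hunspell's own code. It reads only a tiny subset of the Hunspell affix syntax: `PFX`/`SFX` rule lines with single-character flags, where `0` means "strip nothing" and `.` means "any word". It ignores cross products, multi-character flags and every other directive. The rule name "a" follows Erdal's message, and the transliterated words come from the thread. The function names `parse_affix_rules` and `expand` are hypothetical.

```python
import re

# Toy .aff fragment in Hunspell's PFX/SFX syntax. Header line:
# "PFX flag cross_product count"; rule lines: "PFX flag strip affix condition".
AFF = """\
PFX a Y 1
PFX a 0 be .
"""

# Toy .dic fragment: approximate entry count, then word[/flags].
# The exceptions (sarmaa, shanbeh) simply carry no /a flag.
DIC = """\
4
nom/a
kitob/a
sarmaa
shanbeh
"""


def parse_affix_rules(aff_text):
    """Map each flag to its (kind, strip, affix, condition) rule tuples."""
    rules = {}
    for line in aff_text.splitlines():
        fields = line.split()
        # Rule lines have at least 5 fields; 4-field header lines are skipped.
        if len(fields) >= 5 and fields[0] in ("PFX", "SFX"):
            kind, flag, strip, affix, condition = fields[:5]
            strip = "" if strip == "0" else strip
            affix = "" if affix == "0" else affix
            rules.setdefault(flag, []).append((kind, strip, affix, condition))
    return rules


def expand(dic_text, rules):
    """List every word form generated from the .dic entries (no cross products)."""
    forms = []
    for entry in dic_text.splitlines()[1:]:  # skip the count line
        if not entry.strip():
            continue
        word, _, flags = entry.partition("/")
        forms.append(word)
        for flag in flags:  # single-character flags only
            for kind, strip, affix, condition in rules.get(flag, []):
                if kind == "PFX" and word.startswith(strip) and re.match(condition, word):
                    forms.append(affix + word[len(strip):])
                elif kind == "SFX" and word.endswith(strip) and re.search(condition + "$", word):
                    forms.append(word[:len(word) - len(strip)] + affix)
    return forms


print(expand(DIC, parse_affix_rules(AFF)))
# ['nom', 'benom', 'kitob', 'bekitob', 'sarmaa', 'shanbeh']
```

The unflagged entries stay unprefixed. This answers Meelad's concern: the rule only applies where the dictionary author attached the flag.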
RE: [native-lang] Status update season!
On Wed, 2006-12-20 at 10:11 +0500, Murod Latifov wrote:
> Hello Milad,

Regards :)

> Persian and Tajik have many similarities, e.g. grammar, except the alphabet. Persian uses Arabic letters; Tajik uses Cyrillic. I divided words into groups by the types of prefixes and suffixes they accept (say, nouns, and within nouns, special sub-rules for particular types of nouns). I made some progress with Tajik grammar and it works fine; we need to add dictionary entries. I can read and write Tajik in Persian. I also created a small OOo Writer script that converts Tajik words into Persian, which means the letters are converted to Arabic script and the word can be read by a Persian speaker.

As you know, in Persian some sounds are associated with more than one letter; for example, the "s" sound can be written with 3 different letters, and replacing one of these letters with another sometimes produces a different word with a different meaning. So, unless the converter you wrote uses a database with a one-to-one mapping between Cyrillic and Persian words, the result may be readable for Persian readers but is not necessarily correct.

> This is a very simple conversion with no proper logic for how words are spelled in either language. I think our collaboration will give more results than doing it separately. Also, we lack computer terminology; it would be great if we could have some discussions in this regard.

There are some efforts in this regard by the Academy of Persian Language and other sources, and some books have been published on the subject, though none of them is considered final yet.

> You can call me on Skype [EMAIL PROTECTED] and we can speak Persian. I wrote to you once before but there was no reply.

I'm really sorry about that; I must have messed something up. I'm looking forward to it...

Luck,
-mee
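Meelad's objection to letter-by-letter conversion can be shown with a short Python sketch. The mapping below is hypothetical and deliberately tiny: it covers only the letters of one example word. Real Tajik-to-Persian conversion has many more one-to-many cases, including vowels. The `persian_lexicon` set stands in for the word database Meelad mentions. The function names are hypothetical too. Letters are written as Unicode escapes so the code points are unambiguous.

```python
from itertools import product

# Hypothetical, deliberately tiny mapping from Tajik Cyrillic letters to the
# Persian (Arabic-script) letters that can spell the same sound.
CYRILLIC_TO_PERSIAN = {
    "\u0441": ["\u0633", "\u0635", "\u062b"],  # с -> س, ص, ث (three letters for "s")
    "\u0430": [""],                            # а -> short vowel, usually not written
    "\u043b": ["\u0644"],                      # л -> ل
    "\u043e": ["\u0627"],                      # о -> ا (long a)
    "\u043c": ["\u0645"],                      # м -> م
}


def candidate_spellings(tajik_word):
    """All Persian spellings a letter-by-letter converter could produce."""
    options = [CYRILLIC_TO_PERSIAN[ch] for ch in tajik_word]
    return ["".join(letters) for letters in product(*options)]


def convert(tajik_word, persian_lexicon):
    """Keep only the candidates attested in a Persian wordlist."""
    return [w for w in candidate_spellings(tajik_word) if w in persian_lexicon]


salom = "\u0441\u0430\u043b\u043e\u043c"  # Tajik "салом" (greeting)
lexicon = {"\u0633\u0644\u0627\u0645"}      # Persian "سلام"
print(len(candidate_spellings(salom)))     # 3
print(convert(salom, lexicon) == ["\u0633\u0644\u0627\u0645"])  # True
```

Without the lexicon lookup, a converter has to pick one of the three candidates blindly. That is why its output can be readable but still misspelled.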
Re: [native-lang] Status update season!
On Wed, 2006-12-20 at 01:48 +0100, Lars Aronsson wrote:
> OpenOffice.org currently uses Hunspell. Its dictionary format is compatible with the MySpell used earlier, and very similar to the older Aspell and Ispell. There is a fa-demo to download from http://wiki.services.openoffice.org/wiki/Dictionary that contains 332554 words in Persian. According to the included README file, this is the same dictionary as is listed on www.aspell.net for GNU Aspell 0.60, maintained by Babak Mahmoudi and Edwin Hakopian.

That dictionary is pretty outdated, and in various cases contains wrong spellings, and even wrong words.

> The included fa.aff file is empty. Perhaps affixes are not applicable to Persian? I have no idea.

Affixes do apply in Persian, but some combinations, although correct by grammatical logic, are considered wrong because they have never been used in Persian texts. So auto-generated lists don't work, and we should use a flat wordlist with all non-eligible words excluded.

> For most languages, there is only one dictionary, maintained by a single person or a small team, who produces packages adapted for each spell-checking program.

In Persian, there have been different approaches to writing, and the written form of some words has changed in quite different ways over the last 10 years; people mostly follow their own taste when choosing a written form. The Academy of Persian Language is considered the governing body for such questions, and it has been working on a unified wordlist of acceptable words; there are also some new laws obliging official documents to use this version. So the thing to do is to use this list for the spell checker. My dear friend Roozbeh Pournader is working on a spell checker for Aspell based on the Academy list and checked manually, and I hope to use it for OOo as well.

Thanks,
-mee
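Meelad's suggestion (generate forms, but keep only those actually used in Persian texts, and ship a flat wordlist) can be sketched as a filter in Python. Two things here are assumptions: corpus attestation as the eligibility test, and the threshold `min_count`. The function name `flat_dic` and the transliterated toy data are hypothetical. The output follows the plain `.dic` layout: a count line, then one word per line, with no `/flags` because no affix rules are used.

```python
from collections import Counter


def flat_dic(generated_forms, corpus_tokens, min_count=1):
    """Build a flag-free .dic text from generated forms attested in a corpus.

    Forms that are grammatically possible but never (or rarely) seen in real
    text are dropped; `min_count` is a hypothetical threshold to tune.
    """
    seen = Counter(corpus_tokens)
    kept = sorted({form for form in generated_forms if seen[form] >= min_count})
    # .dic layout: first line is the (approximate) entry count,
    # then one entry per line; without affixes, no /flags are needed.
    return "\n".join([str(len(kept)), *kept]) + "\n"


generated = ["nom", "benom", "sarmaa", "besarmaa"]  # toy transliterations
corpus = "nom benom sarmaa nom".split()
print(flat_dic(generated, corpus), end="")
```

Attestation alone also admits misspellings that happen to occur in the corpus, so a manually checked list such as the Academy list Meelad mentions would still be needed as the final authority.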
Re: [native-lang] Status update season!
On 12/19/2006 10:15 AM, Charles-H.Schulz wrote:
> I also know that you met some people from the DGME (France) recently. Keep up the great work, and congratulations to you and all the PT team!

Yes. We organized the first Public Administration OpenSource event, where Patrice Posez explained their project. The numbers are quite impressive. Here's his presentation: http://paradigma.pt/~vd/wlog/images/eventosoftwarelivre/patrice_posez.odp

Also present were Pascal Drabik from the EC (http://paradigma.pt/~vd/wlog/images/eventosoftwarelivre/pascal_drabik.odp), Karl Sarnow from the EU Schoolnet (http://paradigma.pt/~vd/wlog/images/eventosoftwarelivre/karl_sarnow.odp) and Antonella Fresa from the MICHAEL project (http://paradigma.pt/~vd/wlog/images/eventosoftwarelivre/antonella_fresa.odp).
-- Vitor Domingos Paradigma.pt
Re: [native-lang] Status update season!
On Mon, 2006-12-18 at 17:53 +0100, Charles-H.Schulz wrote:
> Meelad, thank you for this update. If you have trouble with the RTL aspect of things, perhaps you could get in touch with the Arabic and the Hebrew projects?

I've already done that and will keep doing so, but of course there are some bugs common to all of these projects.

Furthermore, I'd like to start a spell-checker dictionary for Persian. Any idea where I can find a spec for such dictionaries in OOo?

Many thanks,
-mee
Re: [native-lang] Status update season!
Meelad Zakaria wrote:
> Furthermore, I'd like to start some spellchecker dictionary for Persian. Any idea where i can find a spec for such dictionaries in oo?

OpenOffice.org currently uses Hunspell. Its dictionary format is compatible with the MySpell used earlier, and very similar to the older Aspell and Ispell. There is a fa-demo to download from http://wiki.services.openoffice.org/wiki/Dictionary that contains 332554 words in Persian. According to the included README file, this is the same dictionary as is listed on www.aspell.net for GNU Aspell 0.60, maintained by Babak Mahmoudi and Edwin Hakopian. The included fa.aff file is empty. Perhaps affixes are not applicable to Persian? I have no idea.

For most languages, there is only one dictionary, maintained by a single person or a small team, who produces packages adapted for each spell-checking program. I'm currently trying to improve the Swedish dictionary, which is maintained by a friend of mine, so I'm looking for ways to compare the quality of different dictionaries and the various methods used for maintaining them.

The naive approach would be to complain "the dictionary doesn't contain words X, Y and Z", to which the reply would be "so, add them". However, this is a never-ending task: the more words I add, the more I discover to be missing. Adding words to a dictionary is not as important as the spell checker's ability to help its user avoid mistakes. But how is this measured?

When I compare Swedish to some other closely related languages, I find their dictionaries are much larger than the Swedish one, and this is one useful measure for me. But that doesn't help me compare the quality of the Swedish dictionary to the one for Persian.

On the page http://wiki.services.openoffice.org/wiki/Translation_Statistics there is an indication that 57% of the GUI for OpenOffice.org is translated to Arabic and 62% to Persian. We could add a column to that table giving the quality (from 0 to 100%) of the spelling dictionary for each language, but how would we measure this?
-- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se
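One possible answer to the 0-100% question can be sketched in Python. This is not an agreed standard, and the function name `dictionary_scores` is hypothetical. The sketch reports two percentages instead of one: how much correctly spelled running text the wordlist accepts (coverage), and how many known misspellings it rejects (detection). It assumes both test sets exist. The misspellings could, for example, come from the Wikipedia revision statistics idea raised elsewhere in this thread.

```python
def dictionary_scores(wordlist, correct_tokens, misspellings):
    """Return (coverage, detection) as percentages.

    coverage:  share of tokens from correctly spelled text that the wordlist accepts.
    detection: share of known misspellings that the wordlist rejects (flags).
    """
    words = set(wordlist)
    accepted = sum(1 for token in correct_tokens if token in words)
    flagged = sum(1 for typo in misspellings if typo not in words)
    coverage = 100.0 * accepted / len(correct_tokens)
    detection = 100.0 * flagged / len(misspellings)
    return coverage, detection


wordlist = ["the", "a", "dog", "cat", "saw", "your", "yore"]
text = "the dog saw a cat".split()
typos = ["teh", "yor", "yore"]  # "yore" is a real word, so it slips through
print(dictionary_scores(wordlist, text, typos))
```

Keeping two numbers reflects the point made in the thread that size alone says little. Adding obscure words such as "yore" can only raise coverage, but it can lower detection.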
Re: [native-lang] Status update season!
On 12/19/06, Meelad Zakaria [EMAIL PROTECTED] wrote:
> [...]
> Furthermore, I'd like to start some spellchecker dictionary for Persian. Any idea where i can find a spec for such dictionaries in oo?

Hi Meelad,

I have developed the Kurdish (Kurmanji) MySpell dictionary and have been planning for a long time to work on a Sorani version. That will have a lot in common with a Persian dictionary, including the encoding problems with UTF-8. Please stay in touch with me on the issue. If you need help with the affixes, I will be glad to help you (although my knowledge of Persian is very poor).

Regards,
Erdal
RE: [native-lang] Status update season!
Hello Milad,

Persian and Tajik have many similarities, e.g. grammar, except the alphabet. Persian uses Arabic letters; Tajik uses Cyrillic. I divided words into groups by the types of prefixes and suffixes they accept (say, nouns, and within nouns, special sub-rules for particular types of nouns). I made some progress with Tajik grammar and it works fine; we need to add dictionary entries. I can read and write Tajik in Persian. I also created a small OOo Writer script that converts Tajik words into Persian, which means the letters are converted to Arabic script and the word can be read by a Persian speaker. This is a very simple conversion with no proper logic for how words are spelled in either language. I think our collaboration will give more results than doing it separately. Also, we lack computer terminology; it would be great if we could have some discussions in this regard. You can call me on Skype [EMAIL PROTECTED] and we can speak Persian. I wrote to you once before but there was no reply.

Regards,
Murod Latifov, Tajik NL Project Lead.

-----Original Message-----
From: Lars Aronsson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 20, 2006 5:48 AM
To: dev@native-lang.openoffice.org
Subject: Re: [native-lang] Status update season!
[...]
Re: [native-lang] Status update season!
On 20/12/2006, at 11:18 AM, Lars Aronsson wrote:
> > Furthermore, I'd like to start some spellchecker dictionary for Persian. Any idea where i can find a spec for such dictionaries in oo?
>
> OpenOffice.org currently uses Hunspell. Its dictionary format is compatible with the MySpell used earlier, and very similar to the older Aspell and Ispell.
> [...]
> [snip: details about the Swedish dictionary]

Thank you for bringing this up, Meelad and Lars. :)

I have noticed that our Vietnamese builds of OpenOffice.org do not include a spell-checking dictionary, so we can't use that facility. There is an existing Aspell dictionary for Vietnamese, which I use myself. Can it be used with OpenOffice.org?
http://ftp.gnu.org/gnu/aspell/dict/

Spell checking in Vietnamese is currently very limited by the word-boundary issue: our language is composed of monosyllabic words, so there aren't many single words that the spell checker can spot as _not_ being words. Correct usage depends on combinations of words, as many terms are composed of two words. Defining these boundaries is a complex task.

Nguyễn Thái Ngọc Duy (call name Duy), my team leader in our GNOME project, is currently developing spell-checking software specific to Vietnamese, which we hope will be compatible with existing major software, but it's only in the early stages as yet.

So, meanwhile, can we start using our Aspell dictionary? At least it spots the worst typos, and as a champion typo-producer, I appreciate that. ;)

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
On 15/12/2006, at 6:27 PM, Muguntharaj Subramanian wrote:
> Hi Charles,
> This update is from the Tamil team. We haven't done much, but we have started becoming active.
> 1. We have been submitting language files (even though they are not complete) regularly and on time since OOo 2.0.3, so we do have an OOo 2.1 build for Tamil.
> [snip]

Congratulations to the Tamil team on their progress! :)

I have found these update reports really fascinating, a look behind the scenes of the different language projects. Could we perhaps maintain a wiki page that summarizes the current status of each project? Or perhaps link from Rafaella's new stats page:
http://wiki.services.openoffice.org/wiki/Translation_Statistics
with each language name being a link to a separate page, in English, briefly summarizing the current status of that project?

Quite apart from general interest, I think there would be people who don't speak the local language but want to know the status of OOo localization for that language: business networks, educational and other development organizations, and our l10n and NLP co-ordinators. ;)

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
On 15/12/2006, at 8:17 PM, Charles-H.Schulz wrote: I strongly suggest we have a basic Howto for l10n projects. It doesn't need to be long and complex, but it can include items like: This page that I wrote not such a long time ago seems to answer some of your needs, but more needs to be added to it: http://wiki.services.openoffice.org/wiki/NLC What do you think? It's a good NLP introduction, Charles. I think I sort of fell through the cracks, taking over an established NLP level 2 project, but having nobody left to tell me what to do. If it happened to me, it might happen to others. So I suggest making a general l10n page similar to yours, based at l10n.openoffice.org but also linked from your page, which summarizes the steps needed to participate in a language project. The reason I stress l10n is that translators entering the OOo project, if they have worked in other projects, are familiar with l10n and will be looking for i18n or l10n. NLP at that stage won't mean anything to them. The l10n page can introduce the OOo terminology. We do need an intro page which provides the type of information other translators are used to, if we want to attract them. If we make it clear that _every_ translator has to sign the JCA, for example, and join certain key lists and read key information, that will help. Then we need to extend your page, with one that shows the steps for achieving a released translation. The QA page, Release Action List for QA [1], is good, but how is a translator supposed to find it? If we link to it from l10n and NLP, we're making more effective use of our existing information. See the GNOME Translation Project wiki page [2], a very basic page which yet includes most things a new translator or language-team lead needs to know. We send every new translator there: it saves a lot of repeated explanation. 
Because the OpenOffice.org project is so large and diffuse, we need to have even more effective overall information processes than single projects, to pull the whole together, and give each part the access and abilities it needs. :) from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do) http://groups-beta.google.com/group/vi-VN [1] http://wiki.services.openoffice.org/wiki/Release_Action_List_for_QA [2] http://live.gnome.org/TranslationProject
Re: [native-lang] Status update season!
Hello Clytie, Clytie Siddall wrote: What do you think? It's a good NLP introduction, Charles. I think I sort of fell through the cracks, taking over an established NLP level 2 project, but having nobody left to tell me what to do. Right. And besides, it was not geared towards you but towards new NLP leads. If it happened to me, it might happen to others. So I suggest making a general l10n page similar to yours, based at l10n.openoffice.org but also linked from your page, which summarizes the steps needed to participate in a language project. Indeed. But it is not up to me to do that. Perhaps Rafaella would be interested in drafting a page similar to http://wiki.services.openoffice.org/wiki/NLC but for L10N? The reason I stress l10n is that translators entering the OOo project, if they have worked in other projects, are familiar with l10n and will be looking for i18n or l10n. NLP at that stage won't mean anything to them. The l10n page can introduce the OOo terminology. We do need an intro page which provides the type of information other translators are used to, if we want to attract them. If we make it clear that _every_ translator has to sign the JCA, for example, and join certain key lists and read key information, that will help. Then we need to extend your page, with one that shows the steps for achieving a released translation. The QA page, Release Action List for QA [1], is good, but how is a translator supposed to find it? If we link to it from l10n and NLP, we're making more effective use of our existing information. Yes, now it's all about connection :-). I'll link some of these pages to the NLC wiki. Perhaps some additions to the QA and L10N pages would be necessary as well. See the GNOME Translation Project wiki page [2], a very basic page which yet includes most things a new translator or language-team lead needs to know. We send every new translator there: it saves a lot of repeated explanation. Ah, I beg to differ here. I browsed their site.
Not only is it fairly slow to load, but I just don't find it clearer than OOo's. Of course it looks different and is structured differently, but it really is not more understandable, at least to me. Because the OpenOffice.org project is so large and diffuse, we need to have even more effective overall information processes than single projects, to pull the whole together, and give each part the access and abilities it needs. :) +1... Charles. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [native-lang] Status update season!
I would look into providing a sort of check list: all major steps you need to perform to have a localized OOo product which can be distributed. What do you think? I strongly agree with that. Maybe we can work that out on a Wiki page together. Erdal
Re: [native-lang] Status update season!
Rafaella, Rafaella Braconi wrote: Hi Charles, Clytie and All, I'll be more than happy to work on a better overview for the localization process. Instead of documentation - which I think we do have - I would look into providing a sort of check list: all major steps you need to perform to have a localized OOo product which can be distributed. This is exactly what we need. All this made me realize that just because the info is somewhere on the site doesn't mean it is accessible to everybody, especially newcomers, so clarity and pedagogy should be our concern... Best, Charles.
Re: [native-lang] Status update season!
On 17/12/2006, at 2:07 AM, Rafaella Braconi wrote: Hi Charles, Clytie and All, I'll be more than happy to work on a better overview for the localization process. Instead of documentation - which I think we do have - I would look into providing a sort of check list: all major steps you need to perform to have a localized OOo product which can be distributed. GREAT idea, Rafaella! :) And the checklist could link different steps to the more detailed docs. I don't think there's anything wrong with our docs, just that we don't necessarily know they're there or how to find them. We need overview statements, linked to the main pages, which then link us to the various support docs. from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do) http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
Clytie, I strongly suggest we have a basic Howto for l10n projects. It doesn't need to be long and complex, but it can include items like: This page that I wrote not such a long time ago seems to answer some of your needs, but more needs to be added to it: http://wiki.services.openoffice.org/wiki/NLC What do you think? Best, Charles.
Re: [native-lang] Status update season!
Robert, A very big thank you goes to Pavel, again, for all his help and patience. Indeed. My thanks also go to Pavel for his support of the localization and his work on the community builds. Thank you for this update, Robert, and keep up the good work! Best, Charles.
Re: [native-lang] Status update season!
Hi Charles and all, On 12/12/06, Charles-H.Schulz [EMAIL PROTECTED] wrote: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! We have achieved more than 90% translation of both the GUI and Help. Our homepage http://ne.openoffice.org/ has recently been updated with the latest download links. But we have yet to see our spellchecker and thesaurus dictionaries integrated into the OOo source. We are hoping they will be there in 2.2 or 2.1.x, once the following issues are resolved: #i70512#, #i70513#, #i72094#, along with the inclusion of Nepali fonts in helpcontent2 auxiliary. Also, yet to fully learn about QA process ;) Cheers, Subir Nepali NL project Lead
Re: [native-lang] Status update season!
Hello Subir, thanks for your work. This makes me wonder what your Tibetan neighbours are up to. Could they also provide us with a status update? Also, yet to fully learn about QA process ;) OK. I think I should write a simple document together with the QA project. Until you and Clytie raised it, I hadn't realized how difficult this QA process was... Thank you again for your contributions, Charles.
Re: [native-lang] Status update season!
Murod, thank you for your update. Good luck with the next releases! Charles. Murod Latifov wrote: Dear all, Finally we managed to make an official version of OOo 2.1 available in Tajik. The web site was modified accordingly to direct users to the downloads. In this release the GUI is 100% translated and the help files 30%. Unfortunately, for some reason, our help files are not available for the OOo 2.1 release. Translation of the help files is a work in progress; we are working on it and hope to finalize it by the release of OOo 2.2. This week I gave a presentation at an ICT conference in Tajikistan and announced the official release of localized OOo. That's it for now. Thanks, Murod Latifov Tajik NL project Lead -Original Message- From: Charles-H.Schulz [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 12, 2006 8:57 PM To: dev@native-lang.openoffice.org Subject: [native-lang] Status update season! And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! Cheers, Charles.
Re: [native-lang] Status update season!
On Tuesday 12 December 2006 09:56, Charles-H.Schulz wrote: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! The Irish team is in the middle of QAing the latest builds with the hope of releasing a version 2.1 very soon. This will be the first release of OOo in Irish and we'll be contacting most of the news media outlets in Ireland, government ministers, schools, etc. The UI is about 97% translated but we haven't attempted the help yet. -Kevin
Re: [native-lang] Status update season!
Charles-H.Schulz wrote: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! Hi Charles, The Slovenian team made it: we now have a 100% localized OOo, meaning GUI, Help, AutoCorrect, AutoText and templates (maybe I forgot something). We also succeeded in getting the spell check and hyphenation dictionaries under the GNU LGPL so they can be shipped with the OOo distribution (from 2.2 on). One more very important thing is support for the Slovenian Tolar (our currency) in the Euro Converter wizard: we'll switch from Tolars to Euros on 1.1.2007, so I guess this will be a very useful feature :-) With 2.1 we started (testing) QA for Linux, Mac and Windows. Plans for the future: - strengthen QA - create an sl thesaurus - translate documentation A very big thank you goes to Pavel, again, for all his help and patience. Regards Robert Ludvik
Re: [native-lang] Status update season!
Hello Charles and everybody else :) On 14/12/2006, at 9:12 PM, Charles-H.Schulz wrote: VI: we would have released yesterday but... First, let me congratulate you and your team for the job you did for Vi.OOo. You may feel frustrated, but for a first shot, it's really not bad at all. You'll know better next time. I certainly hope so! OOo may suffer from a lack of information, but I think that is also because it is complex software. I'm also partly responsible if you did not know there was a separate QA project here on OOo. Some information should be clarified, I guess. I strongly suggest we have a basic Howto for l10n projects. It doesn't need to be long and complex, but it can include items like:
• the need to sign and send in the JCA (which I found out by asking questions, only two days before I had to submit our translation)
• where QA fits in the release process (which I found out too late, by asking questions, a day before release)
• the mailing lists you need to join to keep track of l10n issues, e.g. [EMAIL PROTECTED] for announcements, especially about needed translations; [EMAIL PROTECTED] for overall i18n issues; [EMAIL PROTECTED] for l10n issues; releases to see announcements of builds etc.; announce to see pan-project announcements; [EMAIL PROTECTED] to discuss translating the user survey
and I'm probably missing quite a few. :( If someone who actually understood the whole i18n overview sat down, imagined they knew absolutely nothing about OOo, and summarized what a new person _needs_ to know about OOo to achieve a release, that would be a huge help to all of us, especially those starting out. I've only got from one point to another in OOo by feeling my way around and asking a lot of questions. It's been very inefficient, and has taken up a lot of time I could have used to get the tasks done. And in the end, it meant a blocked release, with major arrangements throughout Vietnam hanging on it. Because I didn't know what I was doing.
God knows I must have asked enough questions to find out. But I didn't find out. And really, we should have some basic information to avoid that sort of blundering around. As a comment, however, I don't think that projects like KDE or Mozilla are less complex in their processes; at least it does not look like it on their web sites. I have managed to avoid Mozilla so far, although the nets are closing in: the same friend who decoyed me into OOo has referred the Mozilla people to me. ;) But I can speak from personal experience of KDE, GNOME, Debian, the TP and nearly a score of other free-software projects. No matter how complex the project may be, the i18n list concentrates all the information needed by translators. The minute you join the list, you are welcomed and referred to the Howtos. The list gets all the announcements you need to see, including reminders about deadlines, checking and submission procedures. You have all the information you need. I myself write that type of welcome email several times a week. If you create a template (which I should do, and haven't got around to doing), you can do it fairly quickly, and not miss anything. Then you know the new entrant knows the basics. Thank you for your contribution and this first not-so-ready release, Clytie. Thank you for your help and encouragement, Charles. :) I just wish I could have done a better job of it. I am trying to get over the shock of having no release where one was planned to occur, and to do both damage control and the missing QA at the same time. However, I still need people to do the QA. sigh from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do) http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
On 15/12/2006, at 4:23 AM, Kevin Scannell wrote: On Tuesday 12 December 2006 09:56, Charles-H.Schulz wrote: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! The Irish team is in the middle of QAing the latest builds with the hope of releasing a version 2.1 very very soon. This will be the first release of OOo in Irish and we'll be contacting most of the news media outlets in Ireland, government ministers, schools, etc. The UI is about 97% translated but we haven't attempted the help yet. Great work, Kevin and team! :) from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do) http://groups-beta.google.com/group/vi-VN
Re: [native-lang] Status update season!
Hi Charles, This is an update from the Tamil team. We haven't done much, but we have started becoming active. 1. We have started submitting language files (even though they are not complete) regularly and on time, from OOo 2.0.3 onwards. So we do have an OOo 2.1 build for Tamil. 2. We have created a web interface to help non-technical volunteers contribute to localization easily. It's hosted at: http://translation.thamizha.com/main.php The files here are updated with the files from the latest milestone source, and the Tamil GSI files are periodically derived from here for submission to OpenOffice.org. This web interface is getting popular in the Tamil community. We hope to get the community's help to complete the Tamil translations. Plans for the future: 1. To complete the translations 100% 2. To create a complete spell checker for Tamil. Hunspell already supports Tamil; we will improve on this and create a better spell checker for Tamil. Regards, Mugunth -- http://mugunth.blogspot.com http://thamizha.com http://tamilblogs.com -- http://webspace2host.com Your friendly hosting provider
Re: [native-lang] Status update season!
Soren, thank you for your update. I actually thought that you had considered resigning but had put the idea aside. You did a very good job, and it's a pity to see you go. But at least you'll remain among us until someone shows up to take the lead :-) ... I'll keep you updated on the Faroese and Greenlandic. Best, Charles. Søren Thing Pedersen wrote: Dear all Charles-H.Schulz wrote in dev@native-lang.openoffice.org: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! Good idea. Back in August, after 3 years, I signed off as NL lead on the Danish list and to Louis and Charles. However, I did not report it here on [EMAIL PROTECTED] because I hoped a successor would soon be found. The status from the Danish team is that the demanding part of translating the GUI and help into Danish is complete and easily maintained. Thus the effort is spreading to other areas. - The translation is updated frequently by the team. I keep my website for translation updated and I still subscribe to dev@l10n.openoffice.org and this list to make sure the Danish localization is perfect, including the nitty-gritty locale settings. - QA is managed by Thomas Roswall, who along with his team tests RCs for bugs - The Danish Newsletter is now edited by Leif Lodahl - Translation of documentation is also coordinated by Leif - An ISO image is maintained by Jørgen Kristensen. Jørgen sent all MPs a CD, which caused a good public debate. Currently the main obstacle for the Danish end user is the lack of a good, included dictionary for spell checking. Various solutions are being discussed. The Danish team is looking for an active NL lead, MarCon and website maintainer. I stepped down for many reasons, mostly positive ones: the time was right (OpenOffice is now a completely valid alternative for Danish users), and there was personal stuff. It has been fun and demanding, but mostly fun.
It is good to leave when the project is doing great. Now I freely roam the project as a less active contributor and issuezillian. Please don't hesitate to contact me. Thank you and see you around. Best regards Søren PS. Ping me if any Greenlandic or Faroese group wants help with the translation.
Re: [native-lang] Status update season!
On 13/12/2006, at 2:26 AM, Charles-H.Schulz wrote: And for everybody here on this list, I know you are all focused on the 2.1 release, but a status update on your project would be of course very nice! VI: we would have released yesterday but... we didn't know that the entire QA process had to be completed for all builds in order to release. In fact, had I not stumbled onto the QA list to ask questions about an issue, some weeks back, we wouldn't have even known there _was_ a separate QA project. QA is, of course, an integral part of any i18n project. It is essential to make sure the translation is checked, reviewed, builds, installs and runs in production. So our release schedule allowed for all that, and we completed it ourselves. We translated, reviewed, checked, submitted, then tested our builds in production. We were all set to release our language yesterday. Since I've volunteered for the QA project, I've been trying to understand the QA tools. I haven't got anywhere with testtool yet, but I've been using TCM, and have got part-way through TCM for Mac Intel. However, nobody else in my team has enough time and English combined to work out how to use these tools. My original plan was to work them out myself, between this release and the next, then write a Howto in our language. I've also talked with the other QA people about translating the TCM interface into Vietnamese. We (VI) had absolutely no idea that all these formalized QA procedures had to be completed in order to release our translation. I had actually begun worrying about release procedures, since on any other project, I would have seen posts leading up to release, reminding i18n volunteers of the deadlines and other tasks to be completed in that time. Nothing was said. I kept re-reading the release schedule, instructions for committing translations etc. but couldn't find anything else we had to do. I even asked on the QA list exactly where QA fit in with release. 
I finally got an answer the day before release. The formalized QA procedures are a prerequisite for release. I am deeply concerned about the lack of overall information provided to people entering the OOo project. This is the only project with which I'm involved, where someone entering the project, posting for the first time on a mailing list and saying Here I am, and this is what I hope to do, or Here I am, what should I do? _doesn't_ get a response of this kind: Welcome X! Our aims are Y [link to details]. Z [link] is our homepage. Please read the Howtos and other background information here. [link(s)] In order to participate, you need to do A, B, C ... I write this sort of mail myself several times a week on different projects' lists. But in OpenOffice.org, you are just left blundering around in the dark. There are occasional pointers, like Rafaella's blessed post about the release schedule (one group of facts to which I have clung throughout these confused months), but in general, it seems that people _assume_ you know all the things you need to know. Each time I trip over yet another thing I was supposed to know, and didn't, because I was never told or pointed to it, I feel even more confused. And if I'm blundering around in the dark, how does this affect my team? They have trusted me to get them to release, after years of failure and public disgrace. We worked out a release schedule, took into account everything we knew and everything I could find out by asking a lot of questions, but in the end, I didn't ask the right questions. I've tripped over one too many things in the dark, and I've taken my team with me. This looks really bad from our community POV. The release which was scheduled for yesterday, and for which we were completely ready, hasn't occurred. All I can say to my team and other stakeholders is, I didn't know about this major prerequisite everyone else knows about. I've let them down so badly. This was supposed to be our first release. 
We had a whole promotion campaign waiting in Vietnam, focussed on releasing yesterday. The Department of Science and Technology in Vietnam, several different universities in Vietnam, Intel and their business and training network throughout Asia, NGOs like Oxfam and AUF, the user network (VietLUG, Hanoi LUG etc.) are all set up and waiting to promote the use of OpenOffice.org throughout Vietnam. We have grants to stamp CDs. And it all falls apart, because I didn't know what I was supposed to do. So our status is just derailed and discouraged. We don't know what to do from here. All those organizations and people expecting the release, and we can't release. I'm the only one who understands any of the QA tools, and I understand very little as yet. I use a Mac. Most of our users are on Windows and Linux. We've done a lot of production testing, but we haven't done the