Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Nathan Wells
Thanks for your input Richard, Firstly, you are right, I was mistaken about ICU and the breakiterator working for sentences (I just tried it right now and it does work, but just not with the normal khan or period of Khmer rather it works with Latin sentence markers which is not enough). I had

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Martin Hosken
Dear Nathan, Here are some new ideas, ordered by desirability, with number one being the most desired, to number three being the least. 1) When a zero-width space is detected (U+200B), shut off ICU breakiterator for Khmer spell checking for characters following the zero-width space until

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 11:52:26 +0700 Nathan Wells sungk...@gmail.com wrote: 1. If you are shutting off the ICU breakiterator for text following, we should probably also do it for text preceding. Thus if there is a ZWSP or ZWNBSP (U+2060 WJ) anywhere in a text then ICU break iteration is

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 21:08:13 +0700 Nathan Wells sungk...@gmail.com wrote: Firstly, you are right, I was mistaken about ICU and the breakiterator working for sentences (I just tried it right now and it does work, but just not with the normal khan or period of Khmer rather it works with Latin

Re: Adding Extension for Experimental Thai Spelling

2012-09-26 Thread Nathan Wells
Hello Again, Thank you all for your input! This is a deeper problem than I first thought...sorry for the delayed response, but I hope a solution can be found, even though the current ICU breakiterator is not at 100% for Khmer. Here are some new ideas, ordered by desirability, with number one

Re: Adding Extension for Experimental Thai Spelling

2012-09-26 Thread Nathan Wells
Thanks Martin, 1. If you are shutting off the ICU breakiterator for text following, we should probably also do it for text preceding. Thus if there is a ZWSP or ZWNBSP (U+2060 WJ) anywhere in a text then ICU break iteration is disabled for the whole sentence. Yes, I think you are right. If
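
A rough sketch of the rule discussed above (manual marks anywhere in a sentence switch off automatic breaking for that whole sentence), not the LibreOffice or ICU implementation. The function name `split_words` and the `auto_segment` callable are hypothetical; `auto_segment` stands in for whatever dictionary-based breaker would otherwise run. Treating U+200B as the only manual break point and simply dropping U+2060 are assumptions made for illustration.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: manual word boundary
WJ = "\u2060"    # WORD JOINER: manual "no break here" mark


def split_words(sentence, auto_segment):
    """Return the words of one sentence.

    If the sentence contains any manual mark (ZWSP or WJ), automatic
    segmentation is skipped for the whole sentence and only the ZWSP
    positions are used as boundaries. Otherwise the (hypothetical)
    auto_segment callable decides the boundaries.
    """
    if ZWSP in sentence or WJ in sentence:
        # Manual marks present: trust the author, never call auto_segment.
        pieces = (piece.replace(WJ, "") for piece in sentence.split(ZWSP))
        return [piece for piece in pieces if piece]
    return auto_segment(sentence)
```

The segmenter is passed in as a parameter so the decision logic can be tried out without binding to any particular break iterator.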

Re: Adding Extension for Experimental Thai Spelling

2012-07-27 Thread Richard Wordingham
On Thu, 26 Jul 2012 16:33:00 +0700 Martin Hosken martin_hos...@sil.org wrote: 1. use of U+2060 makes string searching and spell checking harder (unless WJ chars are stripped for searching and spell checking). They are not part of the spelling of a word, so their introduction in the underlying
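
To illustrate the point about stripping WJ characters before searching or spell checking, here is a minimal sketch. The helper names `normalize_for_lookup` and `contains_word` are made up for this example and do not correspond to any LibreOffice or Hunspell API.

```python
WJ = "\u2060"  # WORD JOINER


def normalize_for_lookup(text):
    """Drop WORD JOINER characters, which are not part of a word's spelling."""
    return text.replace(WJ, "")


def contains_word(text, word):
    """Substring search that ignores WORD JOINER characters on both sides."""
    return normalize_for_lookup(word) in normalize_for_lookup(text)
```

Without the normalization step, a plain substring search for a word fails whenever the stored text has a WJ inside that word.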

Re: Adding Extension for Experimental Thai Spelling

2012-07-26 Thread Martin Hosken
Dear All, An automatic word and line breaker is very necessary for Khmer and Thai because traditionally they have no spaces between words, and so line-breaking and spell checking require the use of a zero-width space between words which is counterintuitive for most native speakers, and

Re: Adding Extension for Experimental Thai Spelling

2012-07-25 Thread Caolán McNamara
I'll cc this to the list if you don't mind, in order to archive it. I have no immediate great ideas. But I wonder if a view-word boundaries mode would be helpful, i.e. something that indicates the boundaries of the words that the software thinks exist. On Sun, 2012-07-15 at 21:40 +0700, Nathan

Re: Adding Extension for Experimental Thai Spelling

2012-07-25 Thread Nathan Wells
Thanks for your reply. Yes, a view-word boundaries mode would be very helpful (or even incorporating the current view-field shading to include viewing 'gray marks' at the automatic ICU breaking so that users can see what is being done). Would this be hard to implement? Also, we are making some

Re: Adding Extension for Experimental Thai Spelling

2012-07-12 Thread Caolán McNamara
On Sun, 2012-07-08 at 08:08 -0700, sungkhum wrote: I have two questions: is there a way to have the LibreOffice spelling checker (Hunspell) also recognize word-breaks using the ICU break iterator for Khmer so that Cambodians no longer have to add zero-width spaces manually (as it seems to work

Re: Adding Extension for Experimental Thai Spelling

2012-07-12 Thread sungkhum

Re: Adding Extension for Experimental Thai Spelling

2012-07-08 Thread sungkhum

Re: Adding Extension for Experimental Thai Spelling

2012-02-17 Thread Németh László
Hi, 2012/2/17 Richard Wordingham richard.wording...@ntlworld.com: It's a vast improvement - it gives LibreOffice a real Thai spell-checker.  Thank you.  I have one worry for Siamese - Németh László suggested that there might be a licensing issue back in

Re: Adding Extension for Experimental Thai Spelling

2012-02-17 Thread Caolán McNamara
On Thu, 2012-02-16 at 23:24 +, Richard Wordingham wrote: I wouldn't expect a dictionary-based line breaker to handle words from other languages. (There's a whole slew of Mon-Khmer languages in Thailand, and they mostly use the Thai script when they happen to get written.) Indeed, yeah, I

Re: Adding Extension for Experimental Thai Spelling

2012-02-17 Thread Richard Wordingham
On Fri, 17 Feb 2012 14:10:21 + Caolán McNamara caol...@redhat.com wrote: On Thu, 2012-02-16 at 23:24 +, Richard Wordingham wrote: Indeed, yeah, I suppose, assuming it's as complicated as Thai, that the right direction would be for someone to write for icu new dictionary-based

Re: Adding Extension for Experimental Thai Spelling

2012-02-16 Thread Richard Wordingham
On Tue, 14 Feb 2012 16:19:17 + Caolán McNamara caol...@redhat.com wrote: I think this change: http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12 should improve matters a lot. It's a vast improvement - it gives LibreOffice a real Thai

Re: Adding Extension for Experimental Thai Spelling

2012-02-14 Thread Caolán McNamara
On Mon, 2012-02-13 at 22:39 +, Richard Wordingham wrote: The spell-checker seems to break up a phrase consisting of just กุหลาบ into 3 or 4 words. Hmm, so I played around with this and here's what I think is the problem... We have some customized break iterator rules in LibreOffice, so

Re: Adding Extension for Experimental Thai Spelling

2012-02-14 Thread Eike Rathke
Hi, On Tuesday, 2012-02-14 16:19:17 +, Caolán McNamara wrote: We have some customized break iterator rules in LibreOffice, so we're using those ones and *not* the built-in icu ones. But we lack a customized Thai one, so we're using some ultra-generic word breaking stuff for Thai and not

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Michael Stahl
On 11/02/12 17:23, Richard Wordingham wrote: As I understand it, the lack of a usable Thai spell-checker for LibreOffice (unlike, say, a Khmer spell-checker) is due to the Thai break iterator. (I had expected Thai and Khmer to face similar problems, for neither has a visible word separator

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Michael Meeks
On Sat, 2012-02-11 at 16:23 +, Richard Wordingham wrote: As I understand it, the lack of a usable Thai spell-checker for LibreOffice (unlike, say, a Khmer spell-checker) is due to the Thai break iterator. In common with many, I know nothing about Thai ;-) but my friend Tim does -

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Caolán McNamara
On Sat, 2012-02-11 at 16:23 +, Richard Wordingham wrote: Is it possible to create an experimental alternative to the Thai break iterator that can be shared with other people as a LibreOffice extension? I don't think we have any way to override our breakiterators from extensions. FWIW,

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Richard Wordingham
Thank you to everyone who's offered me advice. On Mon, 13 Feb 2012 15:08:20 + Caolán McNamara caol...@redhat.com wrote: I don't think we have any way to override our breakiterators from extensions. Ah well, I'll just have to try to get Thai spell-checking working for myself and then

Adding Extension for Experimental Thai Spelling

2012-02-11 Thread Richard Wordingham
As I understand it, the lack of a usable Thai spell-checker for LibreOffice (unlike, say, a Khmer spell-checker) is due to the Thai break iterator. (I had expected Thai and Khmer to face similar problems, for neither has a visible word separator and syllable boundaries are often unclear in both.)