Re: Tagging text as being in arbitrary complex-script languages

2019-04-23 Thread Richard Wordingham
On Tue, 23 Apr 2019 17:35:10 +0200 Eike Rathke wrote: > Hi Richard, > > On Thursday, 2019-04-18 20:40:01 +0100, Richard Wordingham wrote: > > It sounds as though one has to specify the script where there is > > doubt as to what type of script will dominate. Is it an issue

Re: Tagging text as being in arbitrary complex-script languages

2019-04-23 Thread Richard Wordingham
On Tue, 23 Apr 2019 18:00:22 +0200 Eike Rathke wrote: > On Friday, 2019-04-19 03:32:34 +0100, Richard Wordingham wrote: > > In answer to what was intended to be a rhetorical question, I > > suppose und-Latn-t-sa-m0-iast and und-Latn-t-sa-m0-iso would work > > fo
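For reference, a tag such as und-Latn-t-sa-m0-iast can be pulled apart into its base tag, the source language of the transform, and the transform fields. The following is only a minimal sketch of that split, not a full BCP 47 parser: the function name `split_transform_tag` is made up for illustration, and it assumes the tag carries no private-use part and no other extension after the `t` extension.

```python
import re

# A transform field key is one letter followed by one digit, e.g. "m0".
TKEY = re.compile(r"^[a-z][0-9]$")


def split_transform_tag(tag):
    """Split a tag into (base tag, transform source, transform fields).

    Hypothetical helper for illustration; it does not validate subtags.
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "t" not in lowered:
        # No transform extension: the whole tag is the base tag.
        return tag, "", {}
    i = lowered.index("t")
    base = "-".join(parts[:i])
    source, fields, key = [], {}, None
    for sub in lowered[i + 1:]:
        if TKEY.match(sub):
            # Start of a new field such as "m0".
            key = sub
            fields[key] = []
        elif key is None:
            # Subtags before the first field key name the source language.
            source.append(sub)
        else:
            fields[key].append(sub)
    return base, "-".join(source), {k: "-".join(v) for k, v in fields.items()}


print(split_transform_tag("und-Latn-t-sa-m0-iast"))
print(split_transform_tag("und-Latn-t-sa-m0-iso"))
```

Read this way, both tags describe Latin-script text of undetermined language that was transformed from Sanskrit, and they differ only in the value of the m0 field.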

Re: Tagging text as being in arbitrary complex-script languages

2019-04-18 Thread Richard Wordingham
On Thu, 18 Apr 2019 20:40:01 +0100 Richard Wordingham wrote: > On Thu, 18 Apr 2019 12:25:11 +0200 > Eike Rathke wrote: > > Though with sa-Latn > > I doubt there's a use case, so I wouldn't call that "correct" in > > common sense. > > So how do you

Re: Tagging text as being in arbitrary complex-script languages

2019-04-18 Thread Richard Wordingham
On Thu, 18 Apr 2019 12:25:11 +0200 Eike Rathke wrote: > What I usually did is, lookup the language at SIL and the Ethnologue > and use the most prevalent script as implied default script. Which > here https://www.ethnologue.com/language/san would lead to > Devanagari, but in this case more

Re: Tagging text as being in arbitrary complex-script languages

2019-04-17 Thread Richard Wordingham
On Wed, 17 Apr 2019 13:53:25 +0200 Eike Rathke wrote: > > > On 4/15/19 12:26 PM, Eike Rathke wrote: > > > > Adding arbitrary dictionary languages (as long as they strictly > > > > follow the BCP 47 language tag specification) works since quite > > > > a while (2014?) already. > > An

Re: Tagging text as being in arbitrary complex-script languages

2019-04-16 Thread Richard Wordingham
On Mon, 15 Apr 2019 15:14:49 + jonathon wrote: > On 4/15/19 12:26 PM, Eike Rathke wrote: > > Adding arbitrary dictionary languages (as long as they strictly > > follow the BCP 47 language tag specification) works since quite a > > while (2014?) already. Only if you hacked the text to

Re: Tagging text as being in arbitrary complex-script languages

2019-04-10 Thread Richard Wordingham
On Wed, 10 Apr 2019 15:13:52 +0200 Eike Rathke wrote: > Hi Richard, > > On Wednesday, 2019-04-10 04:02:53 +0100, Richard Wordingham wrote: > > > I was also able to get SIL's oxttools to work sufficiently > > What are those oxttools and where to get them? Tools f

Re: Tagging text as being in arbitrary complex-script languages

2019-04-09 Thread Richard Wordingham
On Mon, 8 Apr 2019 16:17:38 +0200 Eike Rathke wrote: > ScriptType value 3 here means CTL. The values are explained in > officecfg/registry/schema/org/openoffice/VCL.xcs under > Thank you for the information, and thanks to Stephan Bergmann for the localisation information. For plodders like

Tagging text as being in arbitrary complex-script languages

2019-04-06 Thread Richard Wordingham
https://wiki.documentfoundation.org/ReleaseNotes/5.4 says, "The language list for text attribution now also displays BCP47 language tags provided by dictionaries if a language is not known in the predefined set of languages. (Eike Rathke (Red Hat, Inc.)) Such additional language tags are

Special Fonts for Spell Checking Northern Thai in Lanna Script

2017-10-15 Thread Richard Wordingham
I am trying to put together a workable solution for spell-checking Northern Thai in the Lanna (a.k.a. Tai Tham) script. I have a good idea how to do it, and it is already working in Firefox. The solution may not be suitable for run-of-the-mill users, but I don't believe run-of-the-mill users

Version of gcc for LibreOffice

2015-10-09 Thread Richard Wordingham
On Wed, 07 Oct 2015 11:10:08 +0200 Jan-Marek Glogowski <glo...@fbihome.de> wrote: (when topic was 'Can't track flow of characters in from Input Method Editor') > Am 06.10.2015 um 23:51 schrieb Richard Wordingham: > > I think my compiler (gcc > > Version 4.6.3) is too old t

Re: Can't track flow of characters in from Input Method Editor

2015-10-08 Thread Richard Wordingham
On Thu, 8 Oct 2015 01:17:14 +0100 Richard Wordingham <richard.wording...@ntlworld.com> wrote: > Thank you all for your inputs. I've finally found where the problem materialises. There is a callback of GtkSalFrame::IMHandler::signalIMDeleteSurrounding() to delete one 'character'. I

Re: Can't track flow of characters in from Input Method Editor

2015-10-08 Thread Richard Wordingham
On Thu, 08 Oct 2015 10:18:15 +0100 Caolán McNamara <caol...@redhat.com> wrote: > On Thu, 2015-10-08 at 08:52 +0100, Richard Wordingham wrote: > > The intent of the call is to delete one Unicode character; On reading the GTK documentation, it is clear that the arguments are in t
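The kind of mismatch under discussion can be illustrated as follows. This is a sketch under stated assumptions, not the LibreOffice code: it assumes the request counts whole Unicode characters (code points) while the receiving text buffer is indexed in UTF-16 code units, and the helper names `utf16_units` and `units_to_delete_before` are invented for the example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def units_to_delete_before(units, cursor, n_chars):
    """Count the code units covering the n_chars code points before cursor."""
    pos = cursor
    for _ in range(n_chars):
        if pos == 0:
            break
        pos -= 1
        is_low = 0xDC00 <= units[pos] <= 0xDFFF
        if is_low and pos > 0 and 0xD800 <= units[pos - 1] <= 0xDBFF:
            # The character is a surrogate pair, so take its high half too.
            pos -= 1
    return cursor - pos


# "a" followed by one supplementary-plane character (U+1E900).
units = utf16_units("a\U0001E900")
print(len(units))                            # code units in the buffer
print(units_to_delete_before(units, 3, 1))   # units needed to delete one character
```

If a count of one character is applied as a count of one code unit, only the low surrogate is removed and the high surrogate is left behind on its own, which matches the lone surrogates described in the bug report.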

Re: Can't track flow of characters in from Input Method Editor

2015-10-07 Thread Richard Wordingham
Thank you all for your inputs. On Wed, 7 Oct 2015 09:57:14 +0200 Miklos Vajna wrote: > Writer "main text" gets all keyboard input in SwEditWin::KeyInput(), > sw/source/uibase/docvw/edtwin.cxx. It's VCL that calls that member > function, and in your case it's probably

Can't track flow of characters in from Input Method Editor

2015-10-06 Thread Richard Wordingham
On Sunday I raised bug report 94753 about the apparent generation of lone surrogates in response to the use of Keyman for Linux under ibus as the input method editor. I have compiled Version 4.4.4.3.0+ with debug to facilitate my investigation; I think my compiler (gcc Version 4.6.3) is too old to

Re: Unicode 8.0?

2015-07-16 Thread Richard Wordingham
On Thu, 16 Jul 2015 17:40:06 +0100 Caolán McNamara caol...@redhat.com wrote: On Thu, 2015-07-16 at 11:53 +0200, Viktor Kovács wrote: I would like to ask when will be adopted Old Hungarian fonts. It is defined in the UNICODE 8.0, central-europe subgroup, and it must be typed right to left

Re: Univerbation

2015-07-07 Thread Richard Wordingham
On Tue, 07 Jul 2015 09:55:38 +0100 Caolán McNamara caol...@redhat.com wrote: On Mon, 2015-07-06 at 09:13 +0100, Richard Wordingham wrote: What mechanisms does ODF have to indicate that a sequence of word characters constitutes a word? But generally we follow the rules of the underlying

Univerbation

2015-07-06 Thread Richard Wordingham
What mechanisms does ODF have to indicate that a sequence of word characters constitutes a word? Having such a mechanism is useful for spell-checking Thai and other languages where the boundaries between words are not marked. At present, one can cancel spurious boundaries by inserting U+2060

Re: Adding Languages to Writer's Character, Font Menu

2015-07-02 Thread Richard Wordingham
On Wed, 24 Jun 2015 23:40:10 +0200 Michael Stahl mst...@redhat.com wrote: On 24.06.2015 23:26, toki wrote: That is part of the reason why I think the whole Western/CJKV/CTL split should be thrown out, and replaced with language/writing system, supplemented by locale data. that's a great

Re: Adding Languages to Writer's Character, Font Menu

2015-06-30 Thread Richard Wordingham
On Tue, 30 Jun 2015 17:48:05 +0200 Eike Rathke er...@redhat.com wrote: On Monday, 2015-06-29 20:40:46 +0200, Khaled Hosny wrote: We already handle this at the text shaping level in VCL for platforms where HarfBuzz is used. I think we talk about two different things here. Yes. Khaled

Licence to Convert Dictionary to Spell-Checker Dictionary

2015-06-29 Thread Richard Wordingham
One way of producing a spelling dictionary is to take the words from a near-normal dictionary and use them. Does publishing such a dictionary require the permission of the dictionary's copyright holder? If it's relevant, the dictionary was published in Thailand. I appreciate that one ought to

Re: Adding Languages to Writer's Character, Font Menu

2015-06-29 Thread Richard Wordingham
On Mon, 29 Jun 2015 20:40:46 +0200 Khaled Hosny khaledho...@eglug.org wrote: On Mon, Jun 29, 2015 at 12:14:44PM +0200, Eike Rathke wrote: Hi Richard, On Wednesday, 2015-06-24 20:54:54 +0100, Richard Wordingham wrote: The script is generally implicit in the text. You want

Re: Adding Languages to Writer's Character, Font Menu

2015-06-29 Thread Richard Wordingham
On Wed, 24 Jun 2015 21:26:50 + toki toki.kant...@gmail.com wrote: I'll simply point to the current version of Microsoft Office, which is claimed, by Microsoft, to support more than 7,000 languages. As far as UI design goes, there are at least four options. 1) Offer everything, listed

Re: Adding Languages to Writer's Character, Font Menu

2015-06-25 Thread Richard Wordingham
On Wed, 24 Jun 2015 20:54:54 +0100 Richard Wordingham richard.wording...@ntlworld.com wrote: On Wed, 24 Jun 2015 12:31:16 +0200 Eike Rathke er...@redhat.com wrote: Simply in a css::lang::Locale set the Language field to qlt and in the Variant have the language tag, see http
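The mapping described in the quoted advice can be sketched as below. This is an illustration only, not the LibreOffice implementation: the `Locale` class here is a stand-in with the same three fields as css::lang::Locale, `locale_from_tag` is a made-up name, the test for a "simple" tag is a simplifying assumption, and leaving Country empty in the qlt case is a guess rather than documented behaviour.

```python
import re
from dataclasses import dataclass

# Simplifying assumption: a "simple" tag is language or language-REGION only.
SIMPLE_TAG = re.compile(r"^([a-z]{2,3})(?:-([A-Z]{2}))?$")


@dataclass
class Locale:
    """Stand-in with the same three fields as css::lang::Locale."""
    Language: str
    Country: str
    Variant: str


def locale_from_tag(tag):
    """Map a language tag onto the three Locale fields (illustrative only)."""
    match = SIMPLE_TAG.match(tag)
    if match:
        return Locale(match.group(1), match.group(2) or "", "")
    # Anything else: Language is "qlt" and Variant carries the whole tag.
    # Country is left empty here as a guess; the real code may differ.
    return Locale("qlt", "", tag)


print(locale_from_tag("khb-CN"))
print(locale_from_tag("sa-Latn"))
```

A tag such as khb-CN fits the plain Language and Country fields, whereas sa-Latn has a script subtag and so falls into the qlt case.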

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Tue, 23 Jun 2015 21:07:12 + toki toki.kant...@gmail.com wrote: On 06/22/2015 07:30 PM, Richard Wordingham wrote: How do I add a language to this menu so that fonts that can will render text in the style appropriate to the language? I've been getting a fair bit of information off

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Wed, 24 Jun 2015 11:52:49 +0200 Eike Rathke er...@redhat.com wrote: If I have some text with khb-CN as the language and region and then try to set the language for a greater expanse of text, khb-CN does not come up in the menu. N.B. By 'language' and 'region', I mean language and region

Re: Adding Languages to Writer's Character, Font Menu

2015-06-24 Thread Richard Wordingham
On Wed, 24 Jun 2015 12:31:16 +0200 Eike Rathke er...@redhat.com wrote: * Allow arbitrary lang tags to be used in a text anywhere OpenDocument allows these - it is just a question of how much LibreOffice supports this. It does. I believe the UNO interface supports this, but I won't

Re: Adding Languages to Writer's Character, Font Menu

2015-06-23 Thread Richard Wordingham
(Copy to list for reference - I accidentally replied to Caolán alone.) On Tue, 23 Jun 2015 08:59:04 +0100 Caolán McNamara caol...@redhat.com wrote: The language combo-box allows you to enter arbitrary language tags. What happens if you just enter khb-CN in there. Using vanilla Version:

Adding Languages to Writer's Character, Font Menu

2015-06-22 Thread Richard Wordingham
How do I add a language to this menu so that fonts that can will render text in the style appropriate to the language? I am reconciled to having to create a bespoke version of LibreOffice, though I'd rather not. Manually editing a document's XML files would be the last resort - it seems to work!

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 11:52:26 +0700 Nathan Wells sungk...@gmail.com wrote: 1. If you are shutting off the ICU breakiterator for text following, we should probably also do it for text preceding. Thus if there is a ZWSP or ZWNBSP (U+2060 WJ) anywhere in a text then ICU break iteration is

Re: Adding Extension for Experimental Thai Spelling

2012-09-27 Thread Richard Wordingham
On Thu, 27 Sep 2012 21:08:13 +0700 Nathan Wells sungk...@gmail.com wrote: Firstly, you are right, I was mistaken about ICU and the breakiterator working for sentences (I just tried it right now and it does work, but just not with the normal khan or period of Khmer rather it works with Latin

Re: Adding Extension for Experimental Thai Spelling

2012-07-27 Thread Richard Wordingham
On Thu, 26 Jul 2012 16:33:00 +0700 Martin Hosken martin_hos...@sil.org wrote: 1. use of U+2060 makes string searching and spell checking harder (unless WJ chars are stripped for searching and spell checking). They are not part of the spelling of a word, so their introduction in the underlying
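The stripping mentioned in point 1 is simple to sketch. The example below assumes that only U+2060 WORD JOINER needs removing before lookup and that the dictionary is a plain set of strings; the names `spelling_key` and `is_known` are invented for illustration and do not correspond to any LibreOffice or Hunspell function.

```python
WORD_JOINER = "\u2060"


def spelling_key(word):
    """Return the word with every U+2060 WORD JOINER removed."""
    return word.replace(WORD_JOINER, "")


def is_known(word, dictionary):
    """Look the word up after stripping word joiners (dictionary is a set)."""
    return spelling_key(word) in dictionary


# Hypothetical two-entry dictionary and a word typed with a joiner inside it.
dictionary = {"abcd", "efgh"}
typed = "ab" + WORD_JOINER + "cd"

print(typed in dictionary)          # raw lookup
print(is_known(typed, dictionary))  # lookup after stripping
```

The same normalisation would have to be applied to search strings as well, since the joiner is part of the stored text even though it is not part of the spelling.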

Re: Adding Extension for Experimental Thai Spelling

2012-02-17 Thread Richard Wordingham
On Fri, 17 Feb 2012 14:10:21 + Caolán McNamara caol...@redhat.com wrote: On Thu, 2012-02-16 at 23:24 +, Richard Wordingham wrote: Indeed, yeah, I suppose, assuming its as complicated as Thai, that the right direction would be for someone to write for icu new dictionary-based

Re: Adding Extension for Experimental Thai Spelling

2012-02-16 Thread Richard Wordingham
On Tue, 14 Feb 2012 16:19:17 + Caolán McNamara caol...@redhat.com wrote: I think this change: http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12 should improve matters a lot. It's a vast improvement - it gives LibreOffice a real Thai

Re: Adding Extension for Experimental Thai Spelling

2012-02-13 Thread Richard Wordingham
Thank you to everyone who's offered me advice. On Mon, 13 Feb 2012 15:08:20 + Caolán McNamara caol...@redhat.com wrote: I don't think we have any way to override our breakiterators from extensions. Ah well, I'll just have to try to get Thai spell-checking working for myself and then

Adding Extension for Experimental Thai Spelling

2012-02-11 Thread Richard Wordingham
As I understand it, the lack of a usable Thai spell-checker for LibreOffice (unlike, say, a Khmer spell-checker) is due to the Thai break iterator. (I had expected Thai and Khmer to face similar problems, for neither has a visible word separator and syllable boundaries are often unclear in both.)