On 12/03/14 12:12, Fraser Gordon wrote:
On 11 Mar 2014, at 19:26, Richmond <richmondmathew...@gmail.com> wrote:
Well; in theory that looks good until you start to think about languages
which are written (such as Sanskrit) with no obvious word boundaries and
both vowel mutation (Sandhi) at what would be word boundaries, and
consonant fusion.

The library that we use for low-level Unicode handling (ICU) provides a
facility called "break iterators". These functions break up text according
to various rules, and variants are provided for graphemes, words,
sentences, etc. ICU has a (very large) database of rules and (for some
languages) dictionaries in order to break words properly even in complex
languages. Not all languages are supported, but a large number are.
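
For anyone who has not met break iterators before, here is a minimal sketch
in Java. It uses the JDK's java.text.BreakIterator, which follows the same
break-iterator model. It is not LiveCode's code and it does not call ICU
itself, so results for complex scripts may differ from what ICU returns.
The class name WordBreakSketch and the helper words() are made up for
illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreakSketch {
    // Hypothetical helper: split text at the locale's word boundaries and
    // keep only segments containing a letter or digit, which drops the
    // whitespace and punctuation segments the iterator also reports.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.ENGLISH));
    }
}
```

Swapping getWordInstance for getCharacterInstance or getSentenceInstance
selects the grapheme or sentence variant of the same iterator.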

sentence (breaks on Unicode sentence boundaries)
That looks a bit fishy.

How are you going to work out what marks a sentence boundary in every
language that one can write with Unicode? And there are languages where
the idea of a 'sentence' is absent.

Again, ICU does the hard work. In a language without sentences, text will
only contain one sentence.

There is also enough intelligence in ICU that it can tell the difference 
between a decimal point and a full-stop/period. Some languages use different 
marks as sentence separators and ICU also knows about them.
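
The decimal point case can be tried out with a short sketch. As an
assumption, this again uses the JDK's java.text.BreakIterator rather than
ICU, so it only illustrates the idea; the class name SentenceBreakSketch
and the helper sentences() are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceBreakSketch {
    // Hypothetical helper: split text at the locale's sentence boundaries,
    // trimming the trailing whitespace that each segment carries.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The period inside "3.14" is not followed by whitespace,
        // so it is not treated as the end of a sentence.
        for (String s : sentences("The price is 3.14 dollars. It went up.", Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```

Text with no sentence terminator at all comes back as a single segment,
which matches the "only one sentence" behaviour described above.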

I'm sorry to be such a "pill", but word and sentence boundaries are such
culture-bound concepts that they will only be any good for languages that
mark word and sentence boundaries.

This is about the same as stating dogmatically that "all bananas are
yellow", when they are not.

Paragraphs are defined in the Unicode standard. They are runs of text 
terminated by the Paragraph Separator character or (optionally) any other 
newline character. While it may not make sense linguistically, this is how we 
delimit paragraphs in LiveCode fields.
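
As a rough illustration of that paragraph rule, here is a sketch in Java.
It is not how LiveCode implements fields. It assumes that "any other
newline character" can be approximated by the regex \R, which matches
U+2029 as well as LF, CR, CR LF and a few other line-break characters; the
class name ParagraphSketch and the helper paragraphs() are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ParagraphSketch {
    // Hypothetical helper: treat each run of text ended by a line-break
    // sequence (\R, which includes U+2029 PARAGRAPH SEPARATOR) as a paragraph.
    // String.split drops trailing empty strings, so a terminator at the very
    // end of the text does not produce an extra empty paragraph.
    static List<String> paragraphs(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        System.out.println(paragraphs("one\u2029two\nthree\r\n"));
    }
}
```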

A pretty comprehensive answer to all my points.

Thanks.

Richmond.

Regards,
Fraser
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

