On 12/03/14 12:12, Fraser Gordon wrote:
On 11 Mar 2014, at 19:26, Richmond <richmondmathew...@gmail.com> wrote:
Well, in theory that looks good until you start to think about languages
(such as Sanskrit) which are written with no obvious word boundaries, and
with both vowel mutation (Sandhi) at what would be word boundaries and
consonant fusion.
The library that we use for low-level Unicode stuff (ICU) provides a facility called
"break iterators" - basically, these functions break up text according to
various rules and variants are provided for graphemes, words, sentences, etc. ICU has a
(very large) database of rules and (for some languages) dictionaries in order to properly
break words even in complex languages. Not all languages are supported but a large number
are.
sentence (breaks on Unicode sentence boundaries)
That looks a bit fishy.
How are you going to work out what marks a sentence boundary in every
language that one can write with Unicode? And there are languages where the
idea of a 'sentence' is absent.
Again, ICU does the hard work. In a language without sentences, text will only
contain one sentence.
There is also enough intelligence in ICU that it can tell the difference
between a decimal point and a full-stop/period. Some languages use different
marks as sentence separators and ICU also knows about them.
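To make the decimal-point case concrete, here is a deliberately naive Python sketch. It is not ICU's algorithm and uses no ICU API; the function name naive_sentences and the rule itself (a terminator only counts when followed by whitespace or the end of the text) are assumptions made purely for illustration. It also treats the Devanagari danda as a terminator, as one example of a language-specific mark.

```python
import re

# Hypothetical rule for this sketch only: '.', '!', '?' or the Devanagari
# danda (U+0964) ends a sentence when followed by whitespace or end of text.
# The '.' in "3.14" is followed by a digit, so it is not treated as a break.
_SENTENCE_END = re.compile(r"[.!?\u0964]+(?=\s|$)")

def naive_sentences(text):
    """Split text into sentences using the simplified rule above."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Text after the last terminator still counts as a (final) sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(naive_sentences("Pi is about 3.14. It is irrational."))
```

Text containing no terminator at all comes back as a single sentence, which matches the behaviour described above for languages without sentences. A real implementation would rely on ICU's rules and dictionaries rather than a single regular expression.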
I'm sorry to be such a "pill", but word and sentence boundaries are such
culture-bound concepts that they will only be any good for languages that
mark word and sentence boundaries.
This is about the same as stating dogmatically that "all bananas are yellow",
when they are not.
Paragraphs are defined in the Unicode standard. They are runs of text
terminated by the Paragraph Separator character or (optionally) any other
newline character. While it may not make sense linguistically, this is how we
delimit paragraphs in LiveCode fields.
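The paragraph rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LiveCode's implementation: the function name split_paragraphs and the include_newlines flag are hypothetical, and "any other newline character" is narrowed here to CR, LF and CRLF.

```python
import re

# U+2029 is the Unicode PARAGRAPH SEPARATOR character.
_PS_ONLY = re.compile("\u2029")
# Assumption for this sketch: the optional newline characters are limited
# to CRLF, LF and CR. CRLF is listed first so it counts as one terminator.
_PS_OR_NEWLINE = re.compile("\r\n|[\n\r\u2029]")

def split_paragraphs(text, include_newlines=True):
    """Return the runs of text ended by a paragraph terminator."""
    pattern = _PS_OR_NEWLINE if include_newlines else _PS_ONLY
    parts = pattern.split(text)
    # A trailing terminator closes the last paragraph; it does not
    # open a new, empty one.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

print(split_paragraphs("first\u2029second\nthird\r\n"))
```

With include_newlines=False only U+2029 ends a paragraph, so ordinary line breaks stay inside the paragraph text.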
A pretty comprehensive answer to all my points.
Thanks.
Richmond.
Regards,
Fraser
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode