On Thu, 17 Oct 2019 23:11:55 +0100
Richard Wordingham via Unicode wrote:
There seems to be a Unicode non-compliance (C6) issue in the definition
of collation grapheme clusters (defined in UTS#10 Section 9.9). Using
the DUCET collation, the canonically equivalent strings รู้ and รัู decompose into collation
grapheme clusters in two different ways. The first
0" does not
>>> mean it is a valid "weight", it's a notation only
>>>
>>> No, it is explicitly a valid weight. And it is explicitly and
>>> normatively referred to in the specification of the algorithm. See UTS10-D8
>>> (and su
plicitly a valid weight. And it is explicitly and normatively
>> referred to in the specification of the algorithm. See UTS10-D8 (and
>> subsequent definitions), which explicitly depend on a definition of "A
>> collation weight whose value is zero." The entire statement o
and normatively
> referred to in the specification of the algorithm. See UTS10-D8 (and
> subsequent definitions), which explicitly depend on a definition of "A
> collation weight whose value is zero." The entire statement of what are
> primary, secondary, tertiary, etc. collation elemen
and normatively
> referred to in the specification of the algorithm. See UTS10-D8 (and
> subsequent definitions), which explicitly depend on a definition of "A
> collation weight whose value is zero." The entire statement of what are
> primary, secondary, tertiary, etc. collation elemen
only
>
> No, it is explicitly a valid weight. And it is explicitly and
> normatively referred to in the specification of the algorithm. See
> UTS10-D8 (and subsequent definitions), which explicitly depend on a
> definition of "A collation weight whose value is zero." The
at its use of "" does not
mean it is a valid "weight", it's a notation only
No, it is explicitly a valid weight. And it is explicitly and
normatively referred to in the specification of the algorithm. See
UTS10-D8 (and subsequent definitions), which explicitly depe
introducing confusion about these "0000". UTR#10 still does not
explicitly state that its use of "0000" does not mean it is a valid
"weight", it's a notation only (but the notation is used for TWO
distinct purposes: one is for presenting the notation format used in
the DUCET
It's not just a question of "I like it or not". But the fact is that
the standard makes the presence of 0000 required in some steps, and the
requirement is in fact wrong: this is in fact NEVER required to create
an equivalent collation order. These steps are completely unnecessary
and should be removed.

On Fri, Nov 2, 2018 at 14:03, Mark Davis ☕️ wrote:

> You may not like the format of the data, but you are not bound to
> it. If you don't like
From UTS #10's conformance statement:

“The Unicode Collation Algorithm is a logical specification.
Implementations are free to change any part of the algorithm as long as any
two strings compared by the implementation are ordered the same as they
would be by the algorithm as specified. Implementations may also use a
different format ...”
As well the step 2 of the algorithm speaks about a single "array" of
collation elements. Actually it's best to create one separate array per
level, and append weights for each level in the relevant array for that
level.
The steps S2.2 to S2.4 can do this, including for derived collation elements.
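For contrast with the preceding argument, the per-level reading of step S2 can be sketched as follows. The table and weights below are invented toy values, not DUCET data, and the level separator shown is the conventional one from step 3 (the very separator being argued against above):

```python
# UCA step 2 with one weight array per level, appended separately.
# TOY_TABLE maps characters to (primary, secondary, tertiary) weights;
# the values are made up for illustration, not taken from DUCET.
TOY_TABLE = {
    "a": (0x15EF, 0x0020, 0x0002),
    "b": (0x1605, 0x0020, 0x0002),
    "\u0300": (0x0000, 0x0035, 0x0002),  # combining grave: primary weight zero
}

def make_sort_key(s):
    levels = ([], [], [])          # one array per level
    for ch in s:
        for lvl, weight in enumerate(TOY_TABLE[ch]):
            if weight != 0:        # zero weights are never appended at all
                levels[lvl].append(weight)
    key = []
    for lvl, arr in enumerate(levels):
        if lvl:
            key.append(0)          # level separator, below every real weight
        key.extend(arr)
    return tuple(key)
```

With this layout the primary difference between "a" and "b" dominates any accent difference, as the algorithm requires.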
> into target (binary) strings:
>
> For a level-3 collation, you just then need only 3 calls to
> "string:gsub()" to compute any collation:
>
> - the first ":gsub(mapNormalize)" can decompose a source text into
> collation elements and can perform reordering t
On Thu, 1 Nov 2018 21:13:46 +0100
Philippe Verdy via Unicode wrote:
> I'm not speaking just about how collation keys will finally be stored
> (as uint16 or bytes, or sequences of bits with variable length); I'm
> just refering to the sequence of weights you generate.
>
On Thu, 1 Nov 2018 22:04:40 +0100
Philippe Verdy via Unicode wrote:
> The DUCET could have as well used the notation ".none", or
> just dropped every "." in its file (provided it contains a data
> entry specifying what is the minimum weight used for each level).
> This notation is only intend
So it should be clear in the UCA algorithm and in the DUCET datatable that
"" is NOT a valid weight
It is just a notational placeholder used as ".", only indicating in the
DUCET format that there's NO weight assigned at the indicated level,
because the collation el
cX" where "X" can be any character(s).
Remove any reference to the "level separator" from the UCA. You never need
it.
As well, this paragraph:
7.3 Form Sort Keys <http://unicode.org/reports/tr10/#Step_3>
*Step 3.* Construct a sort key for each collation element array
> ...and you don't actually assemble a sort key.
>
> People who want sort keys usually want them to be short, so you spend time
> on compression. You probably also build sort keys as byte vectors, not
> uint16 vectors, like ICU does.
On Thu, Nov 1, 2018 at 21:31, Philippe Verdy wrote:
> so you can use these two last functions to write the first one:
>
> bool isIgnorable(int level, string element) {
>     return getLevel(getWeightAt(element, 0)) > getMinWeight(level);
> }
>
correction:
return getWeightAt(element, 0)
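Since the snippet above and its correction are both truncated, here is a hypothetical reconstruction of the idea in Python. It assumes each stored weight is tagged with its level, so that zero weights need not exist at all; the names and representation are my own, not from the thread:

```python
# A collation element as a list of (level, weight) pairs, lowest level
# number first; weights that would be 0000 are simply never stored.
def is_ignorable(level, element):
    # Ignorable at `level` means: the element carries no weight at that
    # level or at any more significant (numerically lower) level.
    return all(lvl > level for lvl, _ in element)

accent = [(2, 0x35), (3, 0x02)]             # primary-ignorable accent
letter = [(1, 0x20), (2, 0x05), (3, 0x02)]  # ordinary letter
```

Here `is_ignorable(1, accent)` holds because the accent stores no level-1 weight, with no 0000 placeholder needed.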
On Thu, Nov 1, 2018 at 21:08, Markus Scherer wrote:
> When you want fast string comparison, the zero weights are useful for
>> processing -- and you don't actually assemble a sort key.
>>
>
And no, I see absolutely no case where any 0000 weight is useful during
processing, it does not distinguish a
You absolutely NEVER need ANYWHERE in the UCA algorithm any 0000
weight, not even during processing.
* When the secondary weights in the sort key are terminated by any
sequence of 0020 (the minimal secondary weight), you can suppress them
from the collation key.
* When the tertiary weights in the sort key are terminated by any
sequence of 0002 (the minimal tertiary weight), you can suppress them
from the collation key.
sort keys as byte vectors not
uint16 vectors (because byte vectors fit into more APIs and tend to be
shorter), like ICU does using the CLDR collation data file. The CLDR root
collation data file remunges all weights into fractional byte sequences,
and leaves gaps for tailoring.
markus
I just remarked that there's absolutely NO utility of the 0000
collation weight anywhere in the algorithm.
For example in UTR #10, section 3.3.1 gives a collation element:
[.0000.0021.0002]
for COMBINING GRAVE ACCENT. However it can also be simply:
[.0021.0002]
for a simple reason
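The claim can be made concrete with a toy table (invented weights, not DUCET): if zero primaries were actually appended to sort keys, a primary-ignorable character would wrongly perturb primary comparisons, which is exactly why the algorithm never emits them.

```python
# Toy collation elements; the accent has primary weight zero.
CES = {
    "a": (0x20, 0x05, 0x02),
    "b": (0x21, 0x05, 0x02),
    "\u0300": (0x00, 0x35, 0x02),  # primary-ignorable combining grave
}

def primary_key(s, keep_zeros):
    # Collect level-1 weights; either keep or drop the zeros.
    weights = [CES[ch][0] for ch in s]
    return tuple(weights) if keep_zeros else tuple(w for w in weights if w)

# Dropping zeros: "ab" and "a" + accent + "b" are primary-equal, as
# intended. Keeping zeros would make them differ at the primary level.
```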
I am not aware of any general requirement that a CET be a tailoring of
DUCET or of the CLDR root collation, so the implicit weights would be
irrelevant in this case. The implicit weights are part of DUCET.
If no characters are supported, performing NFD will be a rather obvious
trivial transformation.
On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> May a collation algorithm that always compares all strings as equal be a
> compliant implementation of the Unicode Collation Algorithm (UTS #10)?
> If not, by which clause is it not compliant?
Formally, this algorithm would require that all weights be zero.
Would an implementation that supported no characters
In UTS#18 'Unicode Regular Expressions'
Version 13 (dated 2008, superseded in 2012), RL3.5 comes pretty close
to this with ranges tailored for collation. The pattern
[\u0E01-\u0E02]* would match both those words. To be precise, one
would use a search for [ก-ไก]*. RL3.5 has been withdrawn because
of difficulties
typically, visually opaque syllable boundaries are taken into
account, e.g. in Lao and in some older Thai dictionaries (though the
Thai examples I know of were compiled by Europeans).
There are two approaches to these ambiguities for correct automated
collation. One can either use a vocabulary-based
Hi Richard,
I was looking again at your example where U+0344 causes bad results in
collation of FCD strings. See inline below.
On Tue, Feb 12, 2013 at 12:19 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> On Mon, 11 Feb 2013 17:13:58 -0800
> Markus Scherer wrote:
suppresses codes that are prefixes of another.
Exactly what I described (you use a variable number of bits), except
that your scheme is highly suboptimal, compared to a Huffman coding
or the optimal arithmetic coding (for which you can generate
statistics of frequencies (precomputed from some initial
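The comparison being drawn — a fixed variable-length byte scheme versus a frequency-driven prefix code — can be illustrated with a minimal Huffman coder. This is a generic sketch of the technique, not anything from ICU or the thread:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix-free code from {symbol: count}; more frequent
    symbols receive shorter bit strings."""
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, i, merged))
    return heap[0][2]

# A skewed distribution, e.g. the common secondary weight 0020
# dominating, yields a one-bit code for the common weight.
codes = huffman_codes({0x0020: 90, 0x0035: 6, 0x0032: 4})
```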
can be used there as well.
For example frequent sequences of weights could also have a predictive
encoding model, notably when creating collation keys for strengths 2
or higher, because there will be very frequent sequences of identical
secondary weights.
So instead of encoding the same secondary w
On Sat, 16 Mar 2013 09:29:07 -0700
Markus Scherer wrote:
On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> Please give an example of how the low/high split would fail. With the
> primary collation weights 20, 21, 21 80 and 22 I get the following
> primary collation weight sequences for one and two collating elements,
> marking boundaries of collating elements
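Richard's challenge can be checked mechanically for this small weight set. Below is my own harness, not anyone's implementation: enumerate all one- and two-element sequences over the primaries 20, 21, 21 80 and 22 (hex bytes), and compare the order induced by concatenated bytes with the order induced by comparing collation elements as whole units.

```python
from itertools import product

# Each primary weight as a byte sequence; 21 80 is the split two-byte weight.
WEIGHTS = [(0x20,), (0x21,), (0x21, 0x80), (0x22,)]

def byte_key(seq):
    # order by simple concatenation of all bytes
    return tuple(b for ce in seq for b in ce)

def ce_key(seq):
    # order by comparing whole collation elements one at a time
    return tuple(seq)

seqs = [(w,) for w in WEIGHTS] + list(product(WEIGHTS, repeat=2))

by_bytes = sorted(seqs, key=byte_key)
by_ces = sorted(seqs, key=ce_key)
# For this set the two orders agree: the continuation byte 80 is above
# every lead byte (20..22), so a longer split weight always outranks a
# sequence in which it appears only as a prefix.
```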
clearly treats 'large
> weights' as being in multiple collation elements, whereas, in various
> places, for transforming collation element tables properly, one needs
> them to be treated as being in a single collation element.
>
Correct, that's where the complexities are that
>
> The "fractional" refers to the same kind of mechanism as the "large
> weight values" in the UCA spec.
Yes. The problem is that formally the UCA clearly treats 'large
weights' as being in multiple collation elements, whereas, in various
places, for transforming collation element tables properly, one needs
them to be treated as being in a single collation element.
On Fri, Mar 15, 2013 at 3:05 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> > In CLDR/ICU's FractionalUCA.txt, all but 40 or so of the primary
> > weights (and many of the secondary weights) use the "large weights"
> > mechanism.
>
> No, they're 32-bit weights expressed by omitting
p for a character
> or substring. In fact, that's really what ICU does, except the
> current code is limited to one-or-two units (bytes).
I would say that the UCA Section 6.2 stops me. It clearly says that
the generic example '[(X+1).0000.0000], [.0000.0000]' is two
collation
On Fri, Mar 15, 2013 at 12:50 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> On Thu, 14 Mar 2013 19:13:43 -0700
> Markus Scherer wrote:
s of other collating elements and an order preserving
change of the irreducible substrings will preserve the order of the
collating elements. This is a consequence of how humans (or
just Unicode man?) generate primary weights, and does not apply to
collation elements in general. This decomposition
range -- it's basically an
> > option for an "ignore punctuation" mode, and you wouldn't want to
> > ignore nearly every assigned character in Unicode.
>
> There are a lot of characters in the SIP!
Richard, we are talking about collation here, and "va
> and it would be expressed "u00u2FD5", not "u2FD5".

No - though your being confused merits feedback. The example given
specifies variableTop by means of a *string* - the 'string value' for
the variable top. The equivalent basic syntax for variableTop =
"uXXu"
On Thu, 14 Mar 2013 14:49:18 -0700
Markus Scherer wrote:
> However, it does not make a lot of sense to set the variable top to
> something above the currency symbols range -- it's basically an
> option for an "ignore punctuation" mode, and you wouldn't want to
> ignore nearly every assigned chara
-byte primaries for the majority of characters now. See this doc from a
few years ago:
http://site.icu-project.org/design/collation/uca-weight-allocation
Unfortunately, this makes setVariableTop() not work with most
characters (http://bugs.icu-project.org/trac/ticket/8103).
I believe we have n
, and isn't a part of UCA per se at all.

> Although I can't find a clear official definition of the semantics of
> 'variableTop', I do remember being told that it simply uses the first
> positive primary in the collation key as the maximum variable weight.

"variableTop" is now defined in the LDML spec. See the proposed update
for UTS #35.

Now in allkeys.txt, U+2FD5 expands to two collation elements. However,
in FractionalUCA.txt, which specifies 32-bit (fractional) weights, it
has a single
On Thu, 14 Mar 2013 00:19:15 +0000
"Whistler, Ken" wrote:
> Richard Wordingham wrote:
>
> > > It loosened up the spec, so that the spec itself didn't seem to be
> > > requiring that each of the first 3 levels had to be expressed with
> > > a full 16 bits in any collation element table.
> >
> > I don't read it that way. But it di
'Large weights' make it difficult (I don't say impossible) to check
UCETs for well-formedness.
0), or is it intended to do
> away with the inconvenient concept of 'large weights'?
Amplifying somewhat on Markus' response to these questions...
In UCA 6.1.0, the wording was:
"...where a collation element is an ordered list of three or more 16-bit
weights."
In
ICU logically stores weights as
sequences of 1, 2, 3 or 4 bytes, with collation elements encoded in
interesting ways so that most CEs fit into 32-bit integers.
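The idea of fitting a whole collation element into a 32-bit integer can be sketched like this. The 16+8+8 layout below is a hypothetical toy choice; ICU's real encoding is considerably more elaborate:

```python
def pack_ce(primary, secondary, tertiary):
    # 16 bits primary, 8 secondary, 8 tertiary -- one possible layout.
    assert primary < (1 << 16) and secondary < (1 << 8) and tertiary < (1 << 8)
    return (primary << 16) | (secondary << 8) | tertiary

def unpack_ce(ce):
    # Reverse the packing: shift and mask each field back out.
    return ce >> 16, (ce >> 8) & 0xFF, ce & 0xFF
```

A useful property of such packing: packed CEs compare correctly as plain integers, primary field first.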
inconvenient concept of 'large weights'?
Previously, each of the four weights could be accommodated in 16, 16,
16 and 24 bits. How many bits may be needed for a DUCET collation
element now? Are we threatened with having to accommodate 36 bit
weights?
If it is not intended to do away with
> closure
> S of T is the least set such that:
>
> 1) E(T) ⊂ S
> 2) If xu ∈ S, vy ∈ T, u and v are characters, and vy is the last
> collation element in xuvy, then x(E(uv) ∩ U ∩ F)E(y) ⊂ S.
I got Condition 2 wrong. See
http://bugs.icu-project.org/trac/ticket/9319 for the correction.
Richard.
> closure
> S of T is the least set such that:
>
> 1) E(T) ⊂ S
> 2) If xu ∈ S, vy ∈ T, u and v are characters, and vy is the last
> collation element in xuvy, then x(E(uv) ∩ U ∩ F)E(y) ⊂ S.
CORRECTION: 'Collating element', not 'collation element'.
If the '
On Tue, 12 Feb 2013 01:17:45 +
"Whistler, Ken" wrote:
> One of the reasons I resisted incorporation of
> canonical closure in the basic UCA algorithm and in the DUCET table
> is because of its infinitesimal ROI. It complicates the table and its
> processing substantially, all in service of
> for the adequacy of the current canonical closure. If the collation
> fails this adequacy test, then again disabling normalisation should be
> prohibited. (I would suggest that in these cases the normalisation
> setting should be overridden with only the gentlest of chidings.)
FCD is
I would not revise FCD itself. For a number of processes, it is sufficient
as is. For collation it's not.
About the Tibetan precomposed vowels:
For the LDML spec, I submitted a CLDR ticket this morning:
http://unicode.org/cldr/trac/ticket/5667
For UTS #10 section 6.5, I just now submitted
There is an ICU
bug report: http://bugs.icu-project.org/trac/ticket/9319
Default collation
I remarked that the UCA (Technical Report 10) and LDML
(Technical Report 35) specifications, taken together, make sense only if
there is no such problem.
Before raising a specific Unicode bug, I think it woul
On Sun, Jul 8, 2012 at 7:46 AM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
Are the collation tests meant to have been updated for the change in
the draft of Step 2.1 of the collation algorithm? I haven't changed
what I believe to be a UCA 6.1.0-compliant implementation, yet my code
now passes the 6.2.0 tests for both DUCET and CLDR root. (I understand
that the err
Mark Davis ☕ wrote:
There are no current plans to do that. If you want to present a case for
adding additional collation sequences to CLDR, please start the process
by filing a bug at http://unicode.org/cldr/trac/newticket
Mark
— Il meglio è l’inimico del bene —

On 6/22/2012 5:05 PM, Matt Ma wrote:
Entered ticket #4949 for Simplified Chinese, stroke order.
Thanks,
Matt
On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma wrote:
> Hi,
>
> I have two questions regarding the collation sequence defined in
> zh.xml, CLDR 21.0
>
> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for
> type="stroke"? As a reference, U+59DA (姚) is counted as

On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ wrote:

Claire replied:
> U+8303 has 9 strokes as Matt mentioned in the email. The radical "艹"
> is counted as 4 strokes. I think there are several radicals that have
> the same issue, different stroke counts, between simplified Chinese
> and traditional Chinese.
>
> Claire.
UTS#18 is really a mess about collation clusters. But remember that
collation elements are specific to each language for which they are
defined (including the "root" locale which acts as a pseudo-language
just working as a default option for all languages that don't have
specific
I'm currently reviewing the definition of the Unicode
Collation Algorithm (as opposed to just trying to comply with it),
and I came across the concept of collation grapheme clusters, defined in
UTS#18 'Unicode Regular Expressions'.
For what types of strings are they supposed to b
On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham wrote:
> This will then work for DUCET
> 6.1.0, work for Danish, and work for my mischievous 0302 COMBINING
> CIRCUMFLEX ACCENT+0067 LATIN SMALL LETTER G contraction.
There is a very similar rule in CLDR for Lithuanian - 0307+0301 has
CE(0301
Hi Richard,
This is essentially the same problem as
http://bugs.icu-project.org/trac/ticket/9319 right? (Contractions
overlapping with decomposition mappings.)
Would you mind adding a reply to that with the Lithuanian issue?
Thanks,
markus
On Sun, 20 May 2012 17:05:00 +0100
Richard Wordingham wrote:
> CORRECTION to correction
I wrote
"rules for +0307+"
when, of course, I meant
"rules for +0307+"
Sorry about that.
Richard.
On Sun, 20 May 2012 16:15:24 +0100
Richard Wordingham wrote:
CORRECTION:
> For the general case, we ought to be able to express a rule such as
> 'ignore the countering of soft-dottedness', as in Lithuanian casing,
> but I don't see any finite method of expressing it under the UCA,
As we have dis
one starting with a combining accent or the non-initial part of an
> Indic vowel.
You may think the suggestions about hiders is excessive, but a real
example of hiding occurs when subjecting the current Lithuanian
collation in CLDR, which has a humanly unreadable contraction making
0307+0301 c
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer wrote:
> On inspection, we think we can do better (and want to), probably by
> adding overlap contractions. If we get into trouble with that, we
> will think of alternatives. One is to decompose more characters even
> in FCD input. Another is to k
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer wrote:
> There is nothing that requires us to get correct results *without
> normalization* for all FCD strings or any other particular input
> conditions (except NFD input).

So long as you don't claim conformance to the CLDR collation
definitions. If you do, a lot depends on how one interprets the
definition of normalisation settings given in UTS#35 'Unicode Locale
Data Markup Language'.
collation to be fast, at least for most of normal input.
One of the main performance optimizations is to skip the normalization step
but still get the correct results for most input.
We used to think and write that as long as input strings pass the FCD test,
we will get the correct results. Except, at
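The FCD test being described can be sketched as follows — a simplified check built on Python's `unicodedata`, not ICU's actual implementation:

```python
import unicodedata

def is_fcd(s):
    # A string passes the FCD test if, after canonically decomposing
    # each character *without reordering*, combining classes never
    # decrease across a character boundary (a drop to a starter,
    # combining class 0, is always allowed).
    prev_trailing_ccc = 0
    for ch in s:
        decomp = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomp[0])
        if lead != 0 and lead < prev_trailing_ccc:
            return False
        prev_trailing_ccc = unicodedata.combining(decomp[-1])
    return True
```

For FCD strings, contiguous processing with per-character decompositions gives the same collation elements as full NFD, which is why the normalization step can be skipped for them.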
On Thu, 17 May 2012 21:32:19 -0700
Markus Scherer wrote:
> Ok, but assuming we didn't add 0FB2+0F71, why can't we add the
> contraction 0FB2+0F81 and have the 0334 and any other non-starter be
> handled via discontiguous matching?
Time for me to make a pronouncement on collation
have the implementation-generated contractions for 0F71+0F73
and 0F71+0F73+0F72 (and the other pairs based on pairs of vowels from
0F72, 0F74 and 0F80), and F073 (and the other long vowels) are not
blocked by 0F71, we're OK for UCA 6.1.0 at least as far back as UCA
4.1.0. (A collation ha
0F72> skipping the two middle 0F71. That string
is equivalent to the FCD-passing string <0F71, 0F71, 0F73> but there is no
0F72 in sight there to complete the match if we don't modify the string.
If we cannot find a way to handle this with a finite (actually, small)
amount of data,
On Thu, 17 May 2012 15:42:37 -0700
Markus Scherer wrote:
> On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham <
> richard.wording...@ntlworld.com> wrote:
>> HOWEVER, you must *not* have the added contraction for 0F71+0F71.
> If we don't have this prefix contraction, then we will miss a
> discontiguous
On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> If using DUCET, the collation elements for 0F71+0F71+0F72 are those for
> <0F73, 0F71>, namely (at 6.1.0):
>
> [.2572.0020.0002.0F73][.2570.0020.0002.0F71].
>
> The corr
> I am not following.
> Given contractions
> 0F71+0F71 (needed as a prefix of the next one)
> 0F71+0F73
> what other contractions do we need to add to avoid which problem?
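The problem under discussion can be made concrete with a naive longest-match mapper over a toy contraction table (the CE labels are placeholders, not real weights): without extra contractions, the canonically equivalent sequences <0F71 0F73> and <0F71 0F71 0F72> map to different collation elements.

```python
# Toy table: the contraction 0F71+0F73 plus the single characters.
TABLE = {
    ("\u0f71", "\u0f73"): ["CE(0F71+0F73)"],
    ("\u0f71",): ["CE(0F71)"],
    ("\u0f72",): ["CE(0F72)"],
    ("\u0f73",): ["CE(0F73)"],
}

def map_ces(s):
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):   # try the longest match first
            ces = TABLE.get(tuple(s[i:j]))
            if ces is not None:
                out.extend(ces)
                i = j
                break
        else:
            i += 1                        # unmapped character: skip it
    return out

# 0F73 canonically decomposes to <0F71, 0F72>, so <0F71, 0F73> and
# <0F71, 0F71, 0F72> are canonically equivalent -- yet they get
# different CEs here, absent added contractions. That is the
# non-conformance the thread is about.
```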
On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:
> As x = 0F71, we also need the
> contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
> give the pair of long vowels. We don't need to worry about
> because that is not FCD.
>
I am not following.
On Wed, 16 May 2012 21:46:17 -0700
Mark Davis ☕ wrote:
> No, it's not.
>
> Including x in Lao for some pedagogical (I'm guessing) purpose is
> completely out of scope. That'd be like including π in Latin because
> it sometimes occurs in the middle of English text.
No, it's more like including D
On Wed, 16 May 2012 16:03:08 -0700
Markus Scherer wrote:
> The problem is a contraction x+0F72 and input text x+0F73 where the
> inner 0F71 should be skipped. We can avoid this by adding a
> contraction for x+0F73 (and one for the equivalent x+0F71+0F72).
>
> On the other hand, x+0F73 (together
*Please* use a different email subject line for the "x vs. Lao" discussion.
markus
On Thu, May 17, 2012 at 1:57 AM, wrote:
> Well, I was speaking of the general case, not this specific example.
> Orthographies which mix in random characters from other scripts do not, and
> should not, drive the
From: Mark Davis ☕
> On Wed, May 16, 2012 at 9:20 PM, wrote:
>> From: Ken Whistler
>> > Orthographies which mix in random characters from other scripts do not
>> > (or should not) drive the identity of characters for *scripts* per se.
>> > And edge cases fo