On 02/08/2001 11:20:27 AM "J M Sykes" wrote:

>When a standard-conforming SQL-implementation concatenates two normalized
>UCS strings, then it is required that the result be normalized (noting
>Unicode Standard Annex #15 Unicode Normalization Forms, Concatenation).

Yes. It must be understood that a concatenated string is not guaranteed to
be normalised until it is explicitly normalised, regardless of the state of
the operand strings.
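
To make that concrete, here is a minimal Python sketch using the standard
unicodedata module. The helper name is_normalized_as is made up for this
note. In each case both operands are already in the stated form, yet the
plain concatenation is not:

```python
import unicodedata

def is_normalized_as(form, s):
    """True if normalizing s to the given form leaves it unchanged."""
    return unicodedata.normalize(form, s) == s

# NFC case: "e" and a lone COMBINING ACUTE ACCENT are each NFC on their
# own, but once joined they compose to U+00E9.
a, b = "e", "\u0301"
assert is_normalized_as("NFC", a) and is_normalized_as("NFC", b)
assert not is_normalized_as("NFC", a + b)
nfc_result = unicodedata.normalize("NFC", a + b)

# NFD case: both operands are NFD, but after joining, the combining marks
# are no longer in canonical order (the dot below must precede the circumflex).
c, d = "a\u0302", "\u0323"
assert is_normalized_as("NFD", c) and is_normalized_as("NFD", d)
assert not is_normalized_as("NFD", c + d)
nfd_result = unicodedata.normalize("NFD", c + d)

print(repr(nfc_result), repr(nfd_result))
```

So the explicit normalisation step after concatenation is needed even when
both operands share the same form.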



>My question is, supposing the NF of the two operands to be different, what
>should be the NF of the result?
>
>In its present state, our proposal specifies the result by referring to the
>following table:
>
>Table A
>=======
>                |Operand 2
> Operand 1      |NFKD     NFKC      NFD   NFC
>----------------+------------------------
>    NFKD        |NFKD     NFKC      NFD   NFC
>    NFKC        |NFKC     NFKC      NFD   NFC
>    NFD         |NFD      NFD       NFD   NFC
>    NFC         |NFC      NFC       NFC   NFC
>
>It has been suggested that the following would be preferable:
>
>
>Table B
>=======
>                |Operand 2
> Operand 1      |NFKD     NFKC      NFD   NFC
>----------------+------------------------
>    NFKD        |NFKD     NFKC      NFKD  NFKC
>    NFKC        |NFKC     NFKC      NFKD  NFKC
>    NFD         |NFKD     NFKD      NFD   NFC
>    NFC         |NFKC     NFKC      NFC   NFC




I'm trying to make sense of these tables. Apparently, Table A consistently
applies a precedence of NFC > NFD > NFKC > NFKD (i.e. the form of the
result is that of whichever operand ranks higher in this ordering).
Table B apparently gives precedence to K forms (K > ~K) and to C over D
(C > D), with the first ordering (K > ~K) taking priority over the second
(C > D).
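
The reading of Table A as a single ordering can be checked mechanically.
The following is only a sketch: the names TABLE_A, RANK and highest are my
own, and the table is transcribed by hand from the quoted proposal. Only
Table A is checked here.

```python
FORMS = ["NFKD", "NFKC", "NFD", "NFC"]   # row/column order used in the tables

# Table A as quoted above: TABLE_A[operand1][i] is the result for
# operand2 == FORMS[i].
TABLE_A = {
    "NFKD": ["NFKD", "NFKC", "NFD", "NFC"],
    "NFKC": ["NFKC", "NFKC", "NFD", "NFC"],
    "NFD":  ["NFD",  "NFD",  "NFD", "NFC"],
    "NFC":  ["NFC",  "NFC",  "NFC", "NFC"],
}

# Proposed precedence: NFC > NFD > NFKC > NFKD.
RANK = {"NFKD": 0, "NFKC": 1, "NFD": 2, "NFC": 3}

def highest(form1, form2):
    """Return whichever form ranks higher in the precedence ordering."""
    return max(form1, form2, key=RANK.__getitem__)

matches = all(
    TABLE_A[row][i] == highest(row, col)
    for row in FORMS
    for i, col in enumerate(FORMS)
)
print("Table A follows the single ordering:", matches)
```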

Actually, I don't think I'd go for either. Table B in particular raises a
concern: it gives precedence to the compatibility decompositions that occur
in NFKD and NFKC, and this removes distinctions that, in certain situations,
might be important. Table B should only be used with caution.
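
As a small illustration of the kind of distinction that is lost (a Python
sketch using the standard unicodedata module; the sample characters are my
own choice), compare canonical and compatibility normalisation:

```python
import unicodedata

# Characters that have compatibility (but not canonical) decompositions.
samples = {
    "SUPERSCRIPT TWO": "\u00b2",
    "LATIN SMALL LIGATURE FI": "\ufb01",
}

folded = {}
for name, ch in samples.items():
    canonical = unicodedata.normalize("NFC", ch)   # left unchanged
    compat = unicodedata.normalize("NFKC", ch)     # folded to plain letters/digits
    folded[name] = (canonical, compat)
    print(name, repr(canonical), repr(compat))
```

After NFKC, the superscript two becomes an ordinary "2", so "x" followed
by a superscript two can no longer be told apart from "x2".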

Both tables have an anomalous characteristic: if one operand is NFC, then
the result is always to be composed, but if one operand is NFKC and the
other is decomposed, then the result goes in two directions depending upon
the K or ~K property of the other operand. Why? That seems rather strange
to me. If the "Kompatibility" issue is orthogonal to the (de)composition
issue (which these tables follow, and which I think makes sense), then I
would think either C should always take precedence over D, or vice versa.
If we extract a portion from each table (simplifying, since the tables are
symmetric in their operands), we find

Sub-table A
=======
                |Operand 2
 Operand 1      |NFKD     NFD
----------------+--------------
    NFKC        |NFKC     NFD

Sub-table B
=======
                |Operand 2
 Operand 1      |NFKD     NFD
----------------+--------------
    NFKC        |NFKC     NFKD


Tables A and B could have just as readily had

Sub-table A.a
=======
                |Operand 2
 Operand 1      |NFKD     NFD
----------------+--------------
    NFKC        |NFKD     NFC

Sub-table B.a
=======
                |Operand 2
 Operand 1      |NFKD     NFD
----------------+--------------
    NFKC        |NFKD     NFKC

and I think that wouldn't have been any more or less motivated. It still
wouldn't make sense to me, though: I would have expected D to always have
precedence over C, as in Tables A.b and B.b:

Table A.b
=======
                |Operand 2
 Operand 1      |NFKD     NFKC      NFD   NFC
----------------+------------------------
    NFKD        |NFKD     NFKD      NFD   NFD
    NFKC        |NFKD     NFKC      NFD   NFC
    NFD         |NFD      NFD       NFD   NFD
    NFC         |NFD      NFC       NFD   NFC

Table B.b
=======
                |Operand 2
 Operand 1      |NFKD     NFKC      NFD   NFC
----------------+------------------------
    NFKD        |NFKD     NFKD      NFKD  NFKD
    NFKC        |NFKD     NFKC      NFKD  NFKC
    NFD         |NFKD     NFKD      NFD   NFD
    NFC         |NFKD     NFKC      NFD   NFC

or for C to always take precedence over D, as in Tables A.c and B.c:

Table A.c
=======
                |Operand 2
 Operand 1      |NFKD     NFKC      NFD   NFC
----------------+------------------------
    NFKD        |NFKD     NFKC      NFD   NFC
    NFKC        |NFKC     NFKC      NFC   NFC
    NFD         |NFD      NFC       NFD   NFC
    NFC         |NFC      NFC       NFC   NFC

Table B.c
=======
                |Operand 2
 Operand 1      |NFKD     NFKC      NFD   NFC
----------------+------------------------
    NFKD        |NFKD     NFKC      NFKD  NFKC
    NFKC        |NFKC     NFKC      NFKC  NFKC
    NFD         |NFKD     NFKC      NFD   NFC
    NFC         |NFKC     NFKC      NFC   NFC


(What a lot of alternatives!)
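
Since Tables A.b, B.b, A.c and B.c each treat the K/~K question and the
C/D question independently, all four can be generated from two yes/no
choices. This is just a sketch to show the structure; the function and
parameter names (split, join, make_table, k_wins, c_wins) are invented for
this note.

```python
FORMS = ["NFKD", "NFKC", "NFD", "NFC"]   # row/column order used in the tables

def split(form):
    """Return (is_compatibility, is_composed) for a form name."""
    return ("K" in form, form.endswith("C"))

def join(is_compat, is_composed):
    """Build a form name back from the two properties."""
    return "NF" + ("K" if is_compat else "") + ("C" if is_composed else "D")

def make_table(k_wins, c_wins):
    """Build a result table where each axis is decided on its own.

    k_wins: True if K takes precedence over ~K, False for the reverse.
    c_wins: True if C takes precedence over D, False for the reverse.
    """
    pick_k = any if k_wins else all   # "any" = property wins if either has it
    pick_c = any if c_wins else all   # "all" = property needs both operands
    table = {}
    for row in FORMS:
        rk, rc = split(row)
        table[row] = [
            join(pick_k([rk, ck]), pick_c([rc, cc]))
            for ck, cc in map(split, FORMS)
        ]
    return table

table_a_b = make_table(k_wins=False, c_wins=False)
table_b_b = make_table(k_wins=True,  c_wins=False)
table_a_c = make_table(k_wins=False, c_wins=True)
table_b_c = make_table(k_wins=True,  c_wins=True)

for row in FORMS:
    print(row.ljust(6), *[cell.ljust(6) for cell in table_a_c[row]])
```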

For the reason described above, I think compatibility decomposition should
be avoided if either operand did not use it (i.e. ~K > K). As for C vs. D,
I have a personal preference for D over C, but W3C has (with not invalid
reasons) chosen NFC as the preferred and recommended normalisation form in
any protocols that they create. As a result, I'd be inclined from all these
options to select Table A.c. It gives precedence to C over D, and it avoids
K unless both operands conform to K.
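
For what it's worth, the Table A.c rule is simple to state in code. The
following Python sketch is only an illustration of the rule, not a
proposal for how an SQL implementation should do it; result_form and
concat are hypothetical names, and the caller is assumed to know the form
of each operand.

```python
import unicodedata

def result_form(form1, form2):
    """Table A.c rule: K only if both operands are K; C if either is C."""
    compat = "K" in form1 and "K" in form2
    composed = form1.endswith("C") or form2.endswith("C")
    return "NF" + ("K" if compat else "") + ("C" if composed else "D")

def concat(s1, form1, s2, form2):
    """Concatenate, then explicitly normalise to the form chosen by A.c.

    Returns (normalised string, form name).
    """
    form = result_form(form1, form2)
    return unicodedata.normalize(form, s1 + s2), form

# An NFD operand followed by an NFKC operand gives an NFC result.
text, form = concat("e", "NFD", "\u0301", "NFKC")
print(repr(text), form)
```

Note that choosing a non-K result form only avoids applying further
compatibility mappings; it cannot restore distinctions already removed
from an operand that was in a K form.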



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>



