On 1/19/2018 5:37 AM, Philippe Verdy wrote:
Maybe IDN could accept a new combining diacritic (a sort of right-side acute accent). After all, the Kazakh intent is not to define a new separate character but to modify a base letter to create a single letter of their alphabet. So one could propose a COMBINING APOSTROPHE (whose spacing, non-combining counterpart is 02BC), such that SPACE + COMBINING APOSTROPHE renders exactly like 02BC.


In the case of TLD IDNs, what is at issue is precisely that it "renders exactly like" 02BC (which renders exactly like 2019).

You can see the issue when you look at Andre's Twitter hashtags: you can create two strings that look the same, but the part that is recognized as a hashtag is different. That is deemed an unacceptable security risk for TLD IDNs.

If you encoded such a combining character, it would also not be eligible for TLD IDNs.
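A small Python sketch of the confusability described above. The two labels are hypothetical (the spelling is illustrative only); they differ in a single code point, U+02BC versus U+2019, which most fonts draw identically. Neither character has a compatibility decomposition, so even NFKC normalization leaves the two strings distinct.

```python
import unicodedata

# Two hypothetical labels (illustrative spelling only) that most fonts
# render identically: one uses U+02BC MODIFIER LETTER APOSTROPHE,
# the other U+2019 RIGHT SINGLE QUOTATION MARK.
with_modifier = "qazaqs\u02bca"
with_quote = "qazaqs\u2019a"

def code_points(label):
    """Return the code points of a label as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in label]

# The labels differ at the code-point level...
print(with_modifier == with_quote)  # False
# ...and NFKC does not fold one into the other, since neither
# character has a compatibility decomposition.
print(unicodedata.normalize("NFKC", with_modifier)
      == unicodedata.normalize("NFKC", with_quote))  # False
print(code_points(with_modifier)[6], code_points(with_quote)[6])  # U+02BC U+2019
```

Visual identity combined with code-point difference is exactly what makes such pairs a spoofing risk at the root zone.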
A./

2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode <unicode@unicode.org>:

    Top-level IDN domain names cannot contain 02BC, 0027, or 2019.

    (RFC 6912 gives the rationale and the RZ-LGR the implementation; see
    MSR-3 <https://www.icann.org/public-comments/msr-3-2018-01-17-en>)
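As a rough illustration only, the exclusion of these three code points can be expressed as a set-membership test. The set name and function name below are hypothetical; the actual root-zone rules are defined by the RZ-LGR and cover far more than this, so passing this check says nothing about whether a label is otherwise valid.

```python
# Hypothetical illustration: only the three code points named above,
# not the actual RZ-LGR rules.
EXCLUDED_APOSTROPHES = {
    0x0027,  # APOSTROPHE
    0x02BC,  # MODIFIER LETTER APOSTROPHE
    0x2019,  # RIGHT SINGLE QUOTATION MARK
}

def has_excluded_apostrophe(label):
    """True if the label contains any of the three excluded code points."""
    return any(ord(ch) in EXCLUDED_APOSTROPHES for ch in label)

for label in ["qazaqs\u02bca", "qazaqs'a", "qazaqs\u2019a", "qazaqsha"]:
    verdict = "rejected" if has_excluded_apostrophe(label) else "not rejected by this check"
    print(ascii(label), verdict)
```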

    A./


    On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote:


    On 18 Jan 2018, at 08:21, Andre Schappo via Unicode
    <unicode@unicode.org> wrote:



    On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode
    <unicode@unicode.org> wrote:

    On Mon, 15 Jan 2018 20:16:21 -0800
    James Kass via Unicode <unicode@unicode.org> wrote:

    It will probably be the ASCII apostrophe. The stated intent favors
    the apostrophe over diacritics or special characters to ensure that
    the language can be input to computers with standard keyboards.

    Typing U+0027 into a word processor takes planning. Of the three, it
    should obviously be the modifier letter U+02BC, but I think what gets
    stored will be U+0027 or the right single quotation mark U+2019.

    However, we shouldn't overlook the diacritic mark U+0315 COMBINING
    COMMA ABOVE RIGHT.

    Richard.
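The candidates mentioned in this thread can be compared with Python's standard `unicodedata` module. The relevant difference is the General Category: U+02BC is a letter (Lm), U+0027 and U+2019 are punctuation (Po and Pf), and U+0315 is a nonspacing combining mark (Mn). Software that treats letters as word characters will therefore keep U+02BC inside a word but may split at the other two. The list and helper name are only for this illustration.

```python
import unicodedata

# Characters discussed in this thread as candidates for the Kazakh apostrophe.
CANDIDATES = ["\u0027", "\u02bc", "\u2019", "\u0315"]

def describe(ch):
    """Return (code point, General Category, character name) for ch."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)

for ch in CANDIDATES:
    print(*describe(ch))
# U+0027 Po APOSTROPHE
# U+02BC Lm MODIFIER LETTER APOSTROPHE
# U+2019 Pf RIGHT SINGLE QUOTATION MARK
# U+0315 Mn COMBINING COMMA ABOVE RIGHT
```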

    I have just tested Twitter hashtags and, as one would expect,
    U+02BC does not break hashtags. See
    http://twitter.com/andreschappo/status/953903964722024448


    ...and, just in case
    https://twitter.com/andreschappo/status/953944089896083456
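A minimal sketch of why the hashtag test behaves this way, assuming a simplified tokenizer in which letters, marks, digits, and underscore continue a hashtag and anything else ends it. This rule and the function names are a hypothetical stand-in, not Twitter's actual hashtag grammar; the sample labels are illustrative.

```python
import unicodedata

def is_hashtag_char(ch):
    # Hypothetical rule: letters (L*), marks (M*), numbers (N*) and "_"
    # continue a hashtag; everything else (punctuation, spaces) ends it.
    return ch == "_" or unicodedata.category(ch)[0] in "LMN"

def extract_hashtag(text):
    """Return the first hashtag in text, without the leading '#'."""
    start = text.index("#") + 1
    end = start
    while end < len(text) and is_hashtag_char(text[end]):
        end += 1
    return text[start:end]

# U+02BC (Lm) stays inside the tag; U+2019 (Pf) cuts it short.
print(ascii(extract_hashtag("#qazaqs\u02bca tili")))  # 'qazaqs\u02bca'
print(ascii(extract_hashtag("#qazaqs\u2019a tili")))  # 'qazaqs'
```

Under such a rule, two visually identical strings yield different hashtags, which is the effect described earlier in the thread.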

    André Schappo



