On 1/19/2018 5:37 AM, Philippe Verdy wrote:
Maybe IDNs could accept a new combining diacritic (a sort of
right-side acute accent). After all, the Kazakh intent is not to define
a new separate character but a modification of a base letter, creating a
single letter in their alphabet.
So one could propose a COMBINING APOSTROPHE (whose spacing,
non-combining counterpart is 02BC), such that SPACE + COMBINING
APOSTROPHE renders exactly like 02BC.
For TLD IDNs, the issue is precisely that it "renders exactly like"
02BC (which in turn renders exactly like 2019).
You can see the issue in Andre's Twitter hashtags: you can create two
strings that look the same, but in which the part treated as the hashtag
differs. That is deemed an unacceptable security risk for TLD IDNs.
If you encoded such a combining character, it would also not be eligible
for TLD IDNs.
A./
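As a hedged illustration of why the look-alikes matter (an editorial sketch, not part of the original message): the Python below builds three versions of one label that differ only in the apostrophe-like code point and checks whether Unicode normalization merges them. The sample word `sa'biz` and the helper names `variants` and `distinct_after` are assumptions invented for this example; only the standard `unicodedata` module is used.

```python
import unicodedata

# The three look-alike code points discussed in this thread.
APOSTROPHE_LIKE = {
    "U+0027": "\u0027",  # APOSTROPHE
    "U+02BC": "\u02BC",  # MODIFIER LETTER APOSTROPHE
    "U+2019": "\u2019",  # RIGHT SINGLE QUOTATION MARK
}

def variants(template):
    """Return one string per look-alike, substituted for each ASCII apostrophe."""
    return {name: template.replace("'", ch) for name, ch in APOSTROPHE_LIKE.items()}

def distinct_after(form, strings):
    """Count how many distinct strings remain after normalizing to `form`."""
    return len({unicodedata.normalize(form, s) for s in strings})

if __name__ == "__main__":
    labels = variants("sa'biz")  # hypothetical sample label
    for name, label in labels.items():
        print(name, label, [f"{ord(c):04X}" for c in label])
    for form in ("NFC", "NFKC"):
        print(form, distinct_after(form, labels.values()))
```

None of the three code points has a canonical or compatibility decomposition, so the strings stay distinct under both NFC and NFKC. Normalization alone therefore cannot remove the ambiguity, which is consistent with handling these code points by policy instead.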
2018-01-18 19:51 GMT+01:00 Asmus Freytag via Unicode
<unicode@unicode.org>:
Top-level IDN domain names cannot contain 02BC, 0027, or 2019.
(RFC 6912 gives the rationale and RZ-LGR the implementation; see
MSR-3 <https://www.icann.org/public-comments/msr-3-2018-01-17-en>.)
A./
On 1/18/2018 3:00 AM, Andre Schappo via Unicode wrote:
On 18 Jan 2018, at 08:21, Andre Schappo via Unicode
<unicode@unicode.org> wrote:
On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode
<unicode@unicode.org> wrote:
On Mon, 15 Jan 2018 20:16:21 -0800
James Kass via Unicode <unicode@unicode.org> wrote:
It will probably be the ASCII apostrophe. The stated intent favors
the apostrophe over diacritics or special characters to ensure that
the language can be input to computers with standard keyboards.
Typing U+0027 into a word processor takes planning. Of the three, it
should obviously be the modifier letter U+02BC, but I think what gets
stored will be U+0027 or the single quotation mark U+2019.
However, we shouldn't overlook the diacritic mark U+0315 COMBINING
COMMA ABOVE RIGHT.
Richard.
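To make the difference between the candidates named above concrete, here is a small Python sketch (an editorial illustration, not from the original message) that looks up each code point in the Unicode Character Database through the standard `unicodedata` module. The helper name `describe` is an assumption made for this example.

```python
import unicodedata

# The three apostrophe candidates plus the combining mark mentioned above.
CANDIDATES = ["\u0027", "\u02BC", "\u2019", "\u0315"]

def describe(ch):
    """Return (code point, name, general category, combining class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

if __name__ == "__main__":
    for ch in CANDIDATES:
        print(*describe(ch), sep="  ")
```

Of the four, only U+02BC is classified as a letter (general category Lm). U+0027 and U+2019 are punctuation (Po and Pf), and U+0315 is a nonspacing mark (Mn) that attaches to the preceding base letter.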
I have just tested Twitter hashtags and, as one would expect,
U+02BC does not break hashtags. See
<http://twitter.com/andreschappo/status/953903964722024448>
...and, just in case
<https://twitter.com/andreschappo/status/953944089896083456>
André Schappo
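The hashtag behaviour reported above can be approximated with a short Python sketch (an editorial illustration). It assumes, purely for demonstration, that a hashtag is `#` followed by Unicode word characters; Twitter's real tokenization rules are more involved and are not reproduced here. The sample tag and the helper name `hashtag_part` are hypothetical.

```python
import re

# Simplified assumption: a hashtag is '#' followed by Unicode word characters.
HASHTAG = re.compile(r"#\w+")

def hashtag_part(text):
    """Return the first hashtag matched in `text`, or None if there is none."""
    match = HASHTAG.search(text)
    return match.group(0) if match else None

if __name__ == "__main__":
    for ch in ("\u0027", "\u02BC", "\u2019"):
        tag = "#sa" + ch + "biz"  # hypothetical sample tag
        print(f"U+{ord(ch):04X}", repr(hashtag_part(tag)))
```

Under this assumption the tag survives intact only with U+02BC, because it is a letter and so counts as a word character. With U+0027 or U+2019 the match stops at `#sa`, giving strings that look alike on screen but carry different hashtags.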