Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Ken Whistler via Unicode



On 3/9/2018 9:29 AM, via Unicode wrote:
Documented increases, such as scientific terms for new elements, flora 
and fauna, would seem to be no more than one or two dozen a year. 


Indeed. Of the "urgently needed characters" added to the unified CJK 
ideographs for Unicode 11.0, two were obscure place name characters 
needed to complete mapping for the Japanese IT mandatory use of the Moji 
Joho collection.


The other three were newly standardized Chinese characters for 
superheavy elements that now have official designations by the IUPAC (as 
of December 2015): Nihonium (113), Tennessine (117) and Oganesson (118). 
The Chinese characters coined for those 3 were encoded at U+9FED, 
U+9FEC, and U+9FEB, respectively.
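The three code points can be checked with Python's standard `unicodedata` module. This is a minimal sketch that assumes Python 3.7 or later. Earlier versions ship a Unicode database older than 11.0 and will raise `ValueError` for these characters. Unified ideographs have algorithmic names, so each name simply echoes its code point:

```python
import unicodedata

# Element characters encoded in Unicode 11.0, as listed in the message above.
ELEMENTS = {
    "Nihonium (113)": 0x9FED,
    "Tennessine (117)": 0x9FEC,
    "Oganesson (118)": 0x9FEB,
}

for element, cp in ELEMENTS.items():
    ch = chr(cp)
    # Unified ideographs have algorithmic names ("CJK UNIFIED IDEOGRAPH-XXXX");
    # with a pre-11.0 Unicode database, unicodedata.name() raises ValueError.
    print(f"U+{cp:04X} {ch} {element}: {unicodedata.name(ch)}")
```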


Oganesson, in particular, is of interest, as the heaviest known element 
produced to date. It is the subject of 1000's of hours of intense 
experimentation and of hundreds of scientific papers, but:


   ... since 2005, only five (possibly six) atoms of the nuclide ²⁹⁴Og
   have been detected.


But we already have a Chinese character (pronounced ào) for Og, and a 
standardized Unicode code point for it: U+9FEB.


Next up: unobtanium and hardtofindium

--Ken



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread via Unicode

Dear Richard,

On 09.03.2018 07:06, Richard Wordingham via Unicode wrote:

On Thu, 08 Mar 2018 09:42:38 +0800
via Unicode  wrote:

to the best of my knowledge virtually no new characters used just for
names are under consideration, all the ones that are under
consideration are from before this century.


What I was interested in was the rate of generation of new
CJK characters in general, not just those for names.  I appreciate that
encoding is dominated by the backlog of older characters.



Impossible to give an accurate answer or even a reasonable guess.

As to those that would be candidates for Unicode, my guess would be no 
more than a few dozen a year. New characters are not permitted in legal 
names. Fantasy Chinese characters used for an alien language or a mystery 
novel would not usually be suitable for encoding. Most new words in 
Chinese have more than one syllable and do not require any new 
characters. Documented increases, such as scientific terms for new 
elements, flora and fauna, would seem to be no more than one or two 
dozen a year.


Regards
John Knightley







Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread via Unicode

On 09.03.2018 09:17, Philippe Verdy via Unicode wrote:

This still leaves the question about how to write personal names !
IDS alone cannot represent them without enabling some "reasonable"
ligaturing (they don't have to match the exact strokes variants for
optimal placement, or with all possible simplifications).
I'm curious to know how China, Taiwan, Singapore or Japan handle this
(for official records or in banks): like our personal signatures (as
digital images), and then using a simplified official record
(including the registration of romanized names)?


In mainland China the fallback is to use pinyin capitals without tone 
marks, so ASCII. Passports have names printed in both Chinese characters 
and capitalised pinyin, and both are legally valid. ID cards, which people 
get when they turn 16, have the names printed in Chinese characters only, 
so I assume these must be printed using a system that has some 
characters not in the UCS. Banks certainly don't have all these extra 
characters, so they use capitalised pinyin for any characters they 
cannot type.
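The ASCII fallback described above, pinyin capitals with the tone marks removed, can be sketched with Python's `unicodedata`. The approach is to decompose to NFD, drop the combining marks, and uppercase the result. The helper name `pinyin_fallback` is hypothetical, and the sketch assumes the pinyin spelling is already available. It also reduces ü to a plain U, which may not match the convention a given passport office or bank actually uses:

```python
import unicodedata

def pinyin_fallback(name: str) -> str:
    """Hypothetical helper: uppercase pinyin with all diacritics removed.

    NFD splits letters such as 'ā' into a base letter plus a combining
    mark; dropping every combining mark leaves plain ASCII letters.
    Note that this also turns 'ü' into 'u'.
    """
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()

print(pinyin_fallback("Zhāng Wěi"))  # ZHANG WEI
```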


For Japan, CJK Ext F had 1,645 characters, which included all characters 
required for names of people and places. So there should be no need for 
a fallback system; Unicode is enough now.


John Knightley


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode

On 2018/03/09 10:22, Philippe Verdy via Unicode wrote:

Also, how do Chinese/Japanese post offices handle addresses written with
sinograms for personal names? Is the expanded IDS form acceptable for
them, or do they require Romanized addresses, or phonetic
approximations (Bopomofo in China, Kana in Japan, Hangul in Korea)?


They just see the printed form, not an encoding, and therefore no IDS. 
Many addresses use handwriting, which has its own variability. 
Variations such as those covered by IDSes are easily recognizable by 
people as being the same as the 'base' character, and OCR systems, if 
they are good enough to decipher handwriting, can handle such cases, 
too. Romanized addresses will be delivered because otherwise it would be 
difficult for foreigners to send anything. Pure Kana should work in 
Japan, although the postal employee will have a second look because it's 
extremely unusual. For Korea, these days, it will be mostly Hangul; I'm 
not sure whether addresses with Hanja would incur a delay. My guess 
would be that Bopomofo wouldn't work in mainland China (might work in 
Taiwan, not sure).


Regards,   Martin.


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-09 Thread Martin J. Dürst via Unicode

On 2018/03/09 10:17, Philippe Verdy via Unicode wrote:

This still leaves the question about how to write personal names !
IDS alone cannot represent them without enabling some "reasonable"
ligaturing (they don't have to match the exact strokes variants for optimal
placement, or with all possible simplifications).
I'm curious to know how China, Taiwan, Singapore or Japan handle this (for
official records or in banks): like our personal signatures (as digital
images), and then using a simplified official record (including the
registration of romanized names)?


This question seems to assume more of a difference between alphabetic 
and ideographic traditions. A name in ideographs, in the same way as a 
name in alphabetic characters, is defined by the characters that are 
used, not by stuff like stroke variants, etc. And virtually all names, 
even before the introduction of computers, and even more after that, use 
reasonably frequent characters.


The difference, at least in Japan, is that some people keep the 
ideograph before simplification in their official records, but they may 
or may not insist on its use in everyday practice. In most cases, both a 
traditional and a simplified variant are available. Examples are 広/廣, 
高/髙, 崎/﨑, and so on. I regularly hit such cases when grading, because 
our university database uses the formal (old) one, where students may 
not care about it and enter the new one on some system where they have 
to enter their name by themselves.
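Martin's grading problem is essentially a matching problem: two strings that differ only in such variants should be treated as the same name. Here is a minimal sketch, assuming a small hand-made table. The table is hypothetical and covers only the three pairs cited above, not any official variant data. The function names are also illustrative:

```python
# Hypothetical table mapping formal (older) forms to everyday forms.
# It covers only the pairs mentioned above; a real system would need a far
# larger, curated source of variant data.
FORMAL_TO_COMMON = str.maketrans({
    "廣": "広",
    "髙": "高",
    "﨑": "崎",
})

def fold_variants(name: str) -> str:
    """Replace each listed formal variant with its everyday counterpart."""
    return name.translate(FORMAL_TO_COMMON)

def same_name(a: str, b: str) -> bool:
    """Treat two names as equal if they differ only in listed variants."""
    return fold_variants(a) == fold_variants(b)

print(same_name("廣島", "広島"))  # True
```

Folding both sides to the everyday form, rather than to the formal one, is an arbitrary choice here. The comparison works either way as long as both strings go through the same table.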


Apart from that, at least in Japan, signatures are used extremely 
rarely; it's mostly stamped seals, which are also kept as images by 
banks,...


Regards,   Martin.



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-08 Thread Philippe Verdy via Unicode
Also, how do Chinese/Japanese post offices handle addresses written with
sinograms for personal names? Is the expanded IDS form acceptable for
them, or do they require Romanized addresses, or phonetic
approximations (Bopomofo in China, Kana in Japan, Hangul in Korea)?



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-08 Thread Philippe Verdy via Unicode
This still leaves the question about how to write personal names !
IDS alone cannot represent them without enabling some "reasonable"
ligaturing (they don't have to match the exact strokes variants for optimal
placement, or with all possible simplifications).
I'm curious to know how China, Taiwan, Singapore or Japan handle this (for
official records or in banks): like our personal signatures (as digital
images), and then using a simplified official record (including the
registration of romanized names)?



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-08 Thread Richard Wordingham via Unicode
On Thu, 08 Mar 2018 09:42:38 +0800
via Unicode  wrote:

> to the best of my knowledge virtually no new characters used just for 
> names are under consideration, all the ones that are under
> consideration are from before this century.

What I was interested in was the rate of generation of new
CJK characters in general, not just those for names.  I appreciate that
encoding is dominated by the backlog of older characters.

Richard.


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread via Unicode

On 08.03.2018 06:18, Philippe Verdy via Unicode wrote:

Additional note: the UCS will never be large enough to support the
personal signatures of billions of Chinese people living today or born
over millennia, or just those to be born in the next century. There's
a need to represent these names using composed strings. A reasonable
compositing/ligaturing process can then present almost all of them!



There is no such need. Chinese names are not formed in this way: if one 
just makes up a character, how would others be able to read it? Slight 
variants that add style to a character do not count as new characters 
in Unicode. Furthermore, with government records now all computerised, 
there are strict rules on babies' names in the People's Republic of 
China, Taiwan, etc. that prevent one from making up new characters for 
names.


Whilst there may be a few thousand CJK unified ideographs for names 
still to add to the UCS, there are tens of thousands of non-name CJK 
unified ideographs yet to be added.



Regards
John


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread via Unicode

Dear Philippe,

On 08.03.2018 05:12, Philippe Verdy via Unicode wrote:

So most of the growth in Han characters is caused by people inventing
and registering new sinograms for their own names, using the basic
principles of combining a phonogram and a distinctive semantic
character.


This is not correct. It is certainly not correct for CJK characters 
added to Unicode, and to the best of my knowledge, if one just makes up 
a new character for one's name, it is no longer possible to register it 
legally anywhere that uses Chinese characters. Take Extension F: of its 
over seven thousand characters, nearly three thousand are Japanese 
characters in Buddhist texts, over one thousand are Zhuang characters, 
and nearly two thousand are characters used in Korean historical texts.


Regards
John


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread via Unicode

Dear Richard,

to the best of my knowledge virtually no new characters used just for 
names are under consideration, all the ones that are under consideration 
are from before this century. Some are only being submitted now, but 
that does not mean they are new in real life, just new to Unicode. Place 
names tend to be even older.


Regards
John

On 08.03.2018 04:26, Richard Wordingham via Unicode wrote:

On Mon, 05 Mar 2018 23:42:15 +0800
via Unicode  wrote:


In most cases the answer to the above may well be the same, the
unencoded names of people and places are not new names,


How many new characters are being devised per year?

Richard.




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Andrew West via Unicode
On 7 March 2018 at 22:18, Philippe Verdy via Unicode
 wrote:
>
> Additional note: the UCS will never be large enough to support the personal
> signatures of billions of Chinese people living today or born over millennia,
> or just those to be born in the next century. There's a need to represent
> these names using composed strings. A reasonable compositing/ligaturing
> process can then present almost all of them!

CJK characters invented for writing personal names are extremely rare,
and do not constitute a significant fraction of CJK ideographs
proposed for encoding. The majority of unencoded modern-use characters
in China (that are not systematic simplified forms of existing encoded
characters) are used in place names or in Chinese dialects or for
writing non-Chinese languages such as Zhuang.

Andrew


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Philippe Verdy via Unicode
Additional note: the UCS will never be large enough to support the personal
signatures of billions of Chinese people living today or born over
millennia, or just those to be born in the next century. There's a need to
represent these names using composed strings. A reasonable
compositing/ligaturing process can then present almost all of them!




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Philippe Verdy via Unicode
Note: I don't advocate "duplicate encoding" as you think. But probably the
current IDS model is not sufficient to describe characters correctly, and
it may need to be augmented a bit (using variant codes or some additional
joiners or diacritics?).

But IDS strings are suitable for rendering as ligatures and this should be
permitted, and should even be the standard way to represent personal names
without making them depend on an unproven single distinctive presentation.

E.g. someone writes his name with some personal strokes and uses it as his
registered "signature"; he is then doing business or is cited in news with a
simplified presentation, and the Chinese authorities also use their own
simplifications. All these will designate the same person. But who is correct
about the presentation of the character? In my opinion it is only the person
who invented it for themselves, as a personal signature, but this is not
suitable for encoding (privacy and copyright issues). All the other
presentations are legitimate, and we don't need additional encoding for them:
the ligaturing of IDS strings is sufficient even if it does not match
exactly the person's signature.




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Philippe Verdy via Unicode
I'm just speaking about the many yearly inventions of sinograms for
personal/proper names, not about the use of traditional characters for
normal language.

People just start by assembling components with common rules. Then they
enhance the produced character just like we personalize signatures. But to
me, all these look like personal signatures and are not needed for formal
encoding, and even these persons will accept alternate presentations if it's
just to cite them (and would not much like having their personal signature
imitated by standardizing it in a worldwide standard: I think many of
these encodings have severe privacy issues, possibly copyright
issues as well!).




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Ken Whistler via Unicode



On 3/7/2018 1:12 PM, Philippe Verdy via Unicode wrote:
Shouldn't we create a variant of IDS, using combining joiners between 
Han base glyphs (then possibly augmented by variant selectors if there 
are significant differences on the simplification of rendered strokes 
for each component) ? What is really limiting us to do that ?




Ummm ambiguity, lack of precision, complexity of model, pushback by 
stakeholders, likely failure of uptake by most implementers, duplication 
of representation, ...


Do you think combining models of Han weren't already thought of years 
ago? They predated the original encoding of unified CJK in Unicode in 
1992. They weren't viable then, and they aren't viable now, either, 
after 26 years of Unicode implementation of unified CJK as atomic 
ideographs.


--Ken



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Philippe Verdy via Unicode
So most of the growth in Han characters is caused by people inventing and
registering new sinograms for their own names, using the basic principles
of combining a phonogram and a distinctive semantic character.
It's as if we were encoding in the UCS personal handwritten signatures
of our own choosing.
Are these worth encoding? Why can't we just encode most of them as a
sequence (phonogram, ideogram, and combining layout character), i.e. mostly
what IDS provide, except that IDS are only descriptive, though suited to the
same purpose?

Why can't those IDS be rendered as ligatures, so that those
"characters" are in fact ligatured IDS strings?

Shouldn't the IRG rather work on providing a dictionary of IDS strings
needed for people's names, then allowing font providers in China to render
them as ligatures (the "representative glyph" of these ligatures would be
the official Chinese personal record for such use, and it would be enough
for the Chinese administration)?

After all this is what we are already doing by encoding in Unicode various
emoji sequences (then rendered as ligatures in a much more fuzzy way !)...

Shouldn't we create a variant of IDS, using combining joiners between Han
base glyphs (then possibly augmented by variant selectors if there are
significant differences on the simplification of rendered strokes for each
component) ? What is really limiting us to do that ?
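A minimal parser sketch makes concrete what an IDS records. It assumes only the twelve Ideographic Description Characters U+2FF0..U+2FFB available in 2018. Of these, ⿲ and ⿳ take three operands and the rest take two; later Unicode versions added more operators. Every other character is treated as a component without further checking. The names `Node` and `parse_ids` are illustrative. The result is a layout tree: it says which components go where, but nothing about stroke variants or exact proportions:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Ideographic Description Characters U+2FF0..U+2FFB (the set available in 2018).
# U+2FF2 (⿲) and U+2FF3 (⿳) take three operands; the others take two.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = 3
ARITY["\u2FF3"] = 3

@dataclass
class Node:
    operator: str          # the layout operator, e.g. '⿰'
    parts: List["Tree"]    # its operands, in reading order

Tree = Union[str, Node]

def parse_ids(ids: str) -> Tree:
    """Parse a prefix-notation IDS into a tree; raise ValueError if malformed."""
    def parse_at(i: int) -> Tuple[Tree, int]:
        if i >= len(ids):
            raise ValueError("IDS ends before all operands are supplied")
        ch = ids[i]
        if ch in ARITY:
            parts = []
            j = i + 1
            for _ in range(ARITY[ch]):
                part, j = parse_at(j)
                parts.append(part)
            return Node(ch, parts), j
        return ch, i + 1  # any non-operator character counts as a component

    tree, end = parse_at(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree

print(parse_ids("⿰牜爲"))  # Node(operator='⿰', parts=['牜', '爲'])
```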




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-07 Thread Richard Wordingham via Unicode
On Mon, 05 Mar 2018 23:42:15 +0800
via Unicode  wrote:

> In most cases the answer to the above may well be the same, the 
> unencoded names of people and places are not new names,

How many new characters are being devised per year?

Richard.


Re: CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)

2018-03-06 Thread via Unicode

Dear Ken,

the context of the question was how many characters in modern use are 
being encoded. Part of the answer is that there are several thousand 
Chinese characters that are names of people or places to be encoded. The 
limit of 1,000 characters per member per working set was introduced for 
working set 2017; this is a new thing. If the same per-member limit is 
applied to future working sets, then some of the characters identified 
in 2017 will be delayed. Around 500 have been included in working set 
2017. Some will be included in the following working set, which will 
most likely be in 2020, and if there is then also a limit of 1,000 
characters per member, not all would be included. That would mean some 
would have to wait until 2022 before they can be submitted to IRG, which 
means at least 2027 before they are encoded. Names of people and places 
are not the only CJK unified ideographs that need to be encoded, but 
they illustrate the problem: if future working sets have a limit of 
1,000 per member, with submissions every 2 or 3 years, then the encoding 
of CJK unified ideographs will be delayed by years.
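The delay argument above is simple arithmetic and can be modelled directly. This is a minimal sketch that assumes a fixed backlog, a fixed cap per member per working set, and working sets at a regular interval. It also assumes no new characters join the backlog in the meantime. The function name is hypothetical, and the numbers in the usage line are illustrative, not IRG figures:

```python
from typing import List, Tuple

def submission_schedule(backlog: int, cap: int, first_year: int,
                        interval: int) -> List[Tuple[int, int]]:
    """Spread a backlog over capped working sets.

    Returns (year, count) pairs: one working set every `interval` years,
    each accepting at most `cap` characters from the backlog.
    """
    if cap <= 0 or interval <= 0:
        raise ValueError("cap and interval must be positive")
    schedule = []
    year = first_year
    while backlog > 0:
        count = min(cap, backlog)
        schedule.append((year, count))
        backlog -= count
        year += interval
    return schedule

# Hypothetical backlog of 2,500 characters, 1,000 per working set, every 2 years.
print(submission_schedule(2500, 1000, 2020, 2))
# [(2020, 1000), (2022, 1000), (2024, 500)]
```

Adding a fixed lag from submission to publication to each year (the message assumes at least five years from 2022 to 2027) gives the earliest encoding dates. Any inflow of newly identified characters larger than the cap makes the backlog grow instead of shrink.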


On 06.03.2018 01:40, Ken Whistler via Unicode wrote:

John,

I think this may be giving the list a somewhat misleading picture of
the actual statistics for encoding of CJK unified ideographs. The "500
characters a year" or "1000 characters a year" limits are
administrative limits set by the IRG for national bodies (and others)
submitting repertoire to the "working set" that the IRG then segments
into chunks for processing to prepare new increments for actual
encoding.



Here I was referring to the number of CJK unified ideographs that the 
People's Republic of China can submit to IRG; the numbers are of course 
different for CJK unified ideographs as a whole. A limit of 1,000 per 
working set means that the number of CJK unified ideographs in the 
People's Republic of China awaiting submission to IRG is most likely to 
increase, not decrease, for decades to come. For other IRG members that 
still have characters to submit, a limit of 1,000 per working set most 
likely leads to a decrease over time in the number of CJK unified 
ideographs awaiting submission. In short, the administrative limit of 
1,000 works to a degree for most IRG members, but not for the People's 
Republic of China.



In point of fact, if we take 1991 as the base year, the *average*
rate of encoding new CJK unified ideographs now stands at 3379 per
annum (87,860 as of Unicode 10.0). By "encoding" here, I mean, final,
finished publication of the encoded characters -- not the larger
number of potentially unifiable submissions that eventually go into a
publication increment. There is a gradual downward drift in that
number over time, because of the impact on the stats of the "big bang"
encoding of 42,711 ideographs for Extension B back in 2001, but
recently, the numbers have been quite consistent with an average
incremental rate of about 3000 new ideographs per year:



From 1991 to 2001, 70,207 were encoded, that is around seven thousand a 
year. However, from 2002 to 2018 only 17,675 were encoded, so around one 
thousand a year.



5762 added for Extension E in 2015



These 5762 were submitted to IRG in 2001, so 14 years from submission 
to encoding.



7463 added for Extension F in 2017

~ 4934 to be added for Extension G, probably to be published in 2020

If you run the average calculation including Extension G, assuming
2020, you end up with a cumulative per annum rate of 3200, not much
different than the calculation done as of today.

And as for the implication that China, in particular, is somehow
limited by these numbers, one should note that the vast majority of
Extension G is associated with Chinese sources. Although a substantial
chunk is formally labeled with a "UK" source this time around, almost
all of those characters represent a roll-in of systematic
simplifications, of various sorts, associated with PRC usage. (People
who want to check can take a look at L2/17-366R in the UTC document
registry.)



Extension G predates the 1,000-character per-member limit. Whatever 
the UK characters submitted were, the largest single Chinese source was 
in fact over one thousand Zhuang characters submitted by the People's 
Republic of China, not "systematic simplifications". It would certainly 
be incorrect to think that the vast majority of CJK unified ideographs 
to be encoded are "systematic simplifications".


Regards
John



--Ken


On 3/5/2018 7:13 AM, via Unicode wrote:

Dear All,

to simplify discussion I have split the points. 

CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)

2018-03-05 Thread Ken Whistler via Unicode

John,

I think this may be giving the list a somewhat misleading picture of the 
actual statistics for encoding of CJK unified ideographs. The "500 
characters a year" or "1000 characters a year" limits are administrative 
limits set by the IRG for national bodies (and others) submitting 
repertoire to the "working set" that the IRG then segments into chunks 
for processing to prepare new increments for actual encoding.


In point of fact, if we take 1991 as the base year, the *average* rate 
of encoding new CJK unified ideographs now stands at 3379 per annum 
(87,860 as of Unicode 10.0). By "encoding" here, I mean, final, finished 
publication of the encoded characters -- not the larger number of 
potentially unifiable submissions that eventually go into a publication 
increment. There is a gradual downward drift in that number over time, 
because of the impact on the stats of the "big bang" encoding of 42,711 
ideographs for Extension B back in 2001, but recently, the numbers have 
been quite consistent with an average incremental rate of about 3000 new 
ideographs per year:


5762 added for Extension E in 2015

7463 added for Extension F in 2017

~ 4934 to be added for Extension G, probably to be published in 2020

If you run the average calculation including Extension G, assuming 2020, 
you end up with a cumulative per annum rate of 3200, not much different 
than the calculation done as of today.
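The per-annum figures above can be reproduced with a one-line average. This is a minimal sketch: the end year of 2017 for Unicode 10.0 is an assumption, chosen because it reproduces the quoted 3379. The 2020 date for Extension G is the message's own estimate. The function name is illustrative:

```python
def average_rate(total: int, base_year: int, end_year: int) -> float:
    """Average number of ideographs encoded per year between two years."""
    return total / (end_year - base_year)

ENCODED_AS_OF_UNICODE_10 = 87_860  # cumulative count quoted above
EXTENSION_G_ESTIMATE = 4_934       # approximate, per the message

# Assumes Unicode 10.0 (2017) as the end year.
print(round(average_rate(ENCODED_AS_OF_UNICODE_10, 1991, 2017)))  # 3379
# Adds Extension G and assumes publication in 2020, as in the message.
print(round(average_rate(ENCODED_AS_OF_UNICODE_10 + EXTENSION_G_ESTIMATE,
                         1991, 2020)))  # 3200
```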


And as for the implication that China, in particular, is somehow limited 
by these numbers, one should note that the vast majority of Extension G 
is associated with Chinese sources. Although a substantial chunk is 
formally labeled with a "UK" source this time around, almost all of 
those characters represent a roll-in of systematic simplifications, of 
various sorts, associated with PRC usage. (People who want to check can 
take a look at L2/17-366R in the UTC document registry.)


--Ken


On 3/5/2018 7:13 AM, via Unicode wrote:

Dear All,

to simplify discussion I have split the points. 

Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread via Unicode


Dear All,

here is my reply to points one and two.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

On Mon, 5 Mar 2018 13:25, Martin J. Dürst via Unicode
wrote:


Hello John,

On 2018/03/01 12:31, via Unicode wrote:

> Pen, or brush and paper is much more flexible. With thousands of names
> of people and places still not encoded I am not sure if I would describe
> hans (simplified Chinese characters) as well supported. nor with current
> policy which limits China with over one billion people to submitting
> less than 500 Chinese characters a year on average, and names not being
> all to be added, it is hard to say which decade hans will be well
> supported.

I think this contains several misunderstandings. First, of course
pen/brush and paper are more flexible than character encoding, but
that's true for the Latin script, too.


In the Latin script, for example, I can simply name myself "Phake", but
in Chinese, in the current Unicode-based environment, it would not be
possible for me to freely name myself with a character such as ⿰牜爲,
as I would like to.


Second, while I have heard that people create new characters for naming
a baby in a traditional Han context, I haven't heard about this in a
simplified Han context. And it's not frequent at all, the same way
naming a baby John in the US is way more frequent than let's say Qvtwzx.
I'd also assume that China has regulations on what characters can be
used to name a baby, and that the parents in this age of smartphone
communication will think at least twice before giving their baby a name
that they cannot send to their relatives via some chat app.




In most cases the answer to the above may well be the same: the 
unencoded names of people and places are not new names, but rather names 
of places and people in use since before Unicode, and often since before 
computers. At IRG #48, the People's Republic of China activity report 
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2187ChinaActivityReport.pdf 
states that over 3,000 names of people and places are under 
consideration for IRG Working Set 2017, and that at least half require 
encoding. The document also lists other categories of CJK ideographs 
under consideration for submission to Unicode.


Regards
John









Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread via Unicode

Dear All,

to simplify discussion I have split the points.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode
wrote:


Hello John,

On 2018/03/01 12:31, via Unicode wrote:

Third, I cannot confirm or deny the "500 characters a year" limit, but
I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
to encode more characters, everybody would find a way to handle these.

Due to the nature of your claims, it's difficult to falsify many of
them. It would be easier to prove them (assuming they were true), so if
you have any supporting evidence, please provide it.


Chinese characters for Unicode are first submitted to the IRG (ISO/IEC 
JTC1/SC2/WG2/IRG); see its website. The limit of 500 a year for China is 
an average based on the IRG #48 document on Working Set 2017, 
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf 
which explicitly states "each submission shall not exceed 1,000 
characters". The People's Republic of China, which hopefully we can all 
agree has a population of over 1,000,000,000, is one member of the IRG 
and was therefore limited to submitting at most 1,000 characters. The 
earliest possible date for the next working set is two or three years 
later, that is, 2019 or 2020, so that is an average limit of either 500 
or 333 characters a year.
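The yearly figures follow from dividing the 1,000-character cap by the gap between working sets. A minimal Python sketch, assuming one capped submission per working set and a next working set exactly two or three years after 2017 (the helper name is illustrative):

```python
SUBMISSION_CAP = 1_000  # per-submission cap stated for IRG Working Set 2017


def yearly_average(cap: int, years_between_sets: int) -> float:
    """Average characters per year if one capped submission spans the gap."""
    return cap / years_between_sets


for years in (2, 3):
    print(years, round(yearly_average(SUBMISSION_CAP, years)))
```

With a two-year gap this gives 500 a year; with a three-year gap, about 333.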


Regards
John


Regards,   Martin.





Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
Ah, right, that's it.

On 5 Mar 2018 at 19:25, "James Kass" wrote:

Phake Nick wrote,


> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ⿰牜爲

Isn't that U+246E8? "𤛨"


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread James Kass via Unicode
Phake Nick wrote,

> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ⿰牜爲

Isn't that U+246E8? "𤛨"



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode wrote:

> Hello John,
>
> On 2018/03/01 12:31, via Unicode wrote:
>
> > Pen, or brush and paper is much more flexible. With thousands of names
> > of people and places still not encoded I am not sure if I would describe
> > hans (simplified Chinese characters) as well supported. nor with current
> > policy which limits China with over one billion people to submitting
> > less than 500 Chinese characters a year on average, and names not being
> > all to be added, it is hard to say which decade hans will be well
> > supported.
>
> I think this contains several misunderstandings. First, of course
> pen/brush and paper are more flexible than character encoding, but
> that's true for the Latin script, too.
>

In the Latin script, for example, I can simply name myself "Phake", but in
Chinese, in the current Unicode-based environment, it would not be possible
for me to freely name myself with a character such as ⿰牜爲, as I would like to.


> Second, while I have heard that people create new characters for naming
> a baby in a traditional Han context, I haven't heard about this in a
> simplified Han context. And it's not frequent at all, the same way
> naming a baby John in the US is way more frequent than let's say Qvtwzx.
> I'd also assume that China has regulations on what characters can be
> used to name a baby, and that the parents in this age of smartphone
> communication will think at least twice before giving their baby a name
> that they cannot send to their relatives via some chat app.
>

Traditional versus simplified characters in this context is just like
Fraktur versus Antiqua. The way some components are written has changed,
and there are also orthographic changes that mean some characters no longer
consist of the same components, but they are still Chinese characters and
their usage is unchanged. I believe there are regulations on naming, but
those regulations would have been man-made to adapt to the limitations of
current computer systems. Also, every so often I still hear news of people
having difficulty using, e.g., train booking systems or banking systems
because of the characters they use. (Although in many cases those are
encoded characters that the system does not support.)
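One plausible reason an encoded character can still fail in such systems (an assumption, not something stated in the message): many CJK ideographs, including U+246E8 mentioned elsewhere in this thread, lie outside the Basic Multilingual Plane. They need four bytes in UTF-8 and a surrogate pair in UTF-16, so software that assumes at most three bytes per character (for example MySQL's legacy `utf8`, i.e. `utf8mb3`, character set) or one UTF-16 code unit per character cannot store or count them correctly. A minimal Python sketch of the encodings; the `surrogate_pair` helper is illustrative:

```python
from typing import Tuple


def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


ch = chr(0x246E8)  # the CJK Extension B code point discussed in this thread

utf8 = ch.encode("utf-8")        # 4 bytes: too long for a 3-byte-per-character store
utf16 = ch.encode("utf-16-be")   # 2 code units (a surrogate pair), no BOM
high, low = surrogate_pair(ord(ch))

# Python's own UTF-16 encoder agrees with the manual computation.
assert utf16 == high.to_bytes(2, "big") + low.to_bytes(2, "big")

print(utf8.hex(" "))             # f0 a4 9b a8
print(f"{high:04X} {low:04X}")   # D851 DEE8
```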


> Third, I cannot confirm or deny the "500 characters a year" limit, but
> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
> to encode more characters, everybody would find a way to handle these.


> Due to the nature of your claims, it's difficult to falsify many of
> them. It would be easier to prove them (assuming they were true), so if
> you have any supporting evidence, please provide it.
>
> Regards,   Martin.
>
> > John Knightley
>
>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-04 Thread Martin J. Dürst via Unicode

Hello John,

On 2018/03/01 12:31, via Unicode wrote:

Pen, or brush and paper is much more flexible. With thousands of names 
of people and places still not encoded I am not sure if I would describe 
hans (simplified Chinese characters) as well supported. nor with current 
policy which limits China with over one billion people to submitting 
less than 500 Chinese characters a year on average, and names not being 
all to be added, it is hard to say which decade hans will be well 
supported.


I think this contains several misunderstandings. First, of course 
pen/brush and paper are more flexible than character encoding, but 
that's true for the Latin script, too.


Second, while I have heard that people create new characters for naming 
a baby in a traditional Han context, I haven't heard about this in a 
simplified Han context. And it's not frequent at all, the same way 
naming a baby John in the US is way more frequent than let's say Qvtwzx. 
I'd also assume that China has regulations on what characters can be 
used to name a baby, and that the parents in this age of smartphone 
communication will think at least twice before giving their baby a name 
that they cannot send to their relatives via some chat app.


Third, I cannot confirm or deny the "500 characters a year" limit, but 
I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need 
to encode more characters, everybody would find a way to handle these.


Due to the nature of your claims, it's difficult to falsify many of 
them. It would be easier to prove them (assuming they were true), so if 
you have any supporting evidence, please provide it.


Regards,   Martin.


John Knightley




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-02 Thread Mark Davis ☕️ via Unicode
No, the patterns should always have the right format. However, the
supplemental data has information on the preferred formats for each
language. This data isn't collected through the Survey Tool (ST), so a
ticket needs to be filed.

In your particular case, the data has:



If DE just doesn't use hB, then you can file a ticket to say that it
shouldn't be in @allowed.

Note that the format permits either regions or locales, as in:




As to involvement, we try to encourage interaction on the forum. For some
languages the forum is quite active; for others not so much. (BTW, a number
of your suggestions made sense to me, but not being a native German speaker,
I don't weigh in on de.xml except for structural issues or where people seem
to miss the intent.) So people may look at the forum and disagree with a
proposal, but not respond to explain why they disagree.



Mark

On Fri, Mar 2, 2018 at 3:22 PM, Christoph Päper via Unicode <
unicode@unicode.org> wrote:

> F'up2: cldr-us...@unicode.org
>
> Doug Ewell via unicode@unicode.org:
> >
> > I think that is a measurement of locale coverage -- whether the
> > collation tables and translations of "a.m." and "p.m." and "a week ago
> > Thursday" are correct and verified -- not character coverage.
>
> By the way, the binary `am` vs. `pm` distinction common in English and
> labelled `a` as a placeholder in CLDR formats is too simplistic for some
> languages when using the 12-hour clock (which they usually don't in written
> language). In German, for instance, you would always use a format with `B`
> instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier
> during daylight).
>
> How and where can I best suggest to change this in CLDR? The B formats
> have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to
> set `hms` etc. to the same value next time the Survey Tool is open?
>
> In my experience, there are too few people reviewing even the "largest"
> languages (like German). I participated in v32 and v33, but other than me
> there were only contributions from (seemingly) a single employee from each
> of Apple, Google and Microsoft. Most improvements or corrections I
> suggested just got lost, i.e. nobody discussed or voted on them, so the old
> values remained.
>


RE: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-02 Thread Christoph Päper via Unicode
F'up2: cldr-us...@unicode.org

Doug Ewell via unicode@unicode.org:
> 
> I think that is a measurement of locale coverage -- whether the
> collation tables and translations of "a.m." and "p.m." and "a week ago
> Thursday" are correct and verified -- not character coverage.

By the way, the binary `am` vs. `pm` distinction common in English and labelled 
`a` as a placeholder in CLDR formats is too simplistic for some languages when 
using the 12-hour clock (which they usually don't in written language). In 
German, for instance, you would always use a format with `B` instead (i.e. 
"morgens", "mittags", "abends", "nachts" or no identifier during daylight).

How and where can I best suggest to change this in CLDR? The B formats have 
their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to set `hms` 
etc. to the same value next time the Survey Tool is open?
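To illustrate the difference between the two pattern fields, here is a minimal Python sketch that renders `h:mm:ss a` and `h:mm:ss B` by hand, contrasting a binary AM/PM marker with a day-period word. The hour ranges are hypothetical placeholders that only mirror the words listed above; they are not CLDR's actual German dayPeriod data, which real code should take from ICU/CLDR rather than hard-code.

```python
from datetime import time
from typing import Callable, Optional

# Hypothetical German day-period ranges: (start hour inclusive, end hour
# exclusive, word). Illustrative only, NOT CLDR's "de" dayPeriod data.
HYPOTHETICAL_DE_PERIODS = [
    (0, 5, "nachts"),
    (5, 10, "morgens"),
    (12, 14, "mittags"),
    (18, 24, "abends"),
]


def marker_a(t: time) -> Optional[str]:
    """Binary English-style marker, as produced by the `a` field."""
    return "AM" if t.hour < 12 else "PM"


def marker_b(t: time) -> Optional[str]:
    """Day-period word in the style of the `B` field; None means no identifier."""
    for start, end, word in HYPOTHETICAL_DE_PERIODS:
        if start <= t.hour < end:
            return word
    return None  # remaining daylight hours get no identifier in this sketch


def format_h_mm_ss(t: time, marker: Callable[[time], Optional[str]]) -> str:
    """Render `h:mm:ss` on a 12-hour clock, followed by the marker if any."""
    hour12 = t.hour % 12 or 12
    base = f"{hour12}:{t.minute:02d}:{t.second:02d}"
    word = marker(t)
    return f"{base} {word}" if word else base


for t in (time(7, 30), time(15, 0), time(21, 5, 9)):
    print(format_h_mm_ss(t, marker_a), "|", format_h_mm_ss(t, marker_b))
```

The `a` column always carries AM or PM, while the `B` column carries a word only where a period applies, which is the behavior described above for German.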

In my experience, there are too few people reviewing even the "largest" 
languages (like German). I participated in v32 and v33, but other than me there 
were only contributions from (seemingly) a single employee from each of Apple, 
Google and Microsoft. Most improvements or corrections I suggested just got 
lost, i.e. nobody discussed or voted on them, so the old values remained.


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-02 Thread Mark Davis ☕️ via Unicode
Right, Doug. I'll say a few more words.

In terms of language support, encoding of new characters in Unicode
benefits mostly digital heritage languages (via representation of historic
languages in Unicode, enabling preservation and scholarly work), although
there are some modern-use cases like Hanifi Rohingya. We do include digital
heritage under the umbrella of "digitally disadvantaged languages", but we
are sometimes inconsistent in our terminology.

But encoding is just a first step. A vital first step, but just one step.

People tend to forget that adding new characters is just a part of what
Unicode does. For script support, it is just as important to have correct
Unicode algorithms and properties, such as correct values for the
Indic_Positional_Category property (which, together with the related work
on the Universal Shaping Engine, allows for proper rendering of many
languages). Behind the
scenes we have people like Ken and Laurentiu who have to dig through the
encoding proposals and fill in the many, many gaps to come up with
reasonable properties for such basic behavior as line-break.

As important as the work is on encoding, properties, and algorithms, when
we go up a level we get CLDR and ICU. Those have more impact on language
support for far more people in the world than the addition of new scripts
does. After all, approaching half of the population of the globe owns
smartphones: ICU provides programmatic access to the Unicode encoding,
properties, and algorithms, and CLDR + ICU together provide the core
language support on essentially every one of those smartphones.

But in terms of language coverage, the chart you reference (and the
corresponding graph) shows how very far CLDR still has to go. So we are
gearing up for ways to extend that graph: to move at least basic coverage
(the lower plateau in that graph) to more languages, and to move
basic-coverage languages up to more in-depth coverage. We are focusing on
ways to improve the CLDR Survey Tool backend and frontend, since we know it
currently cannot handle the number of people who want to contribute, and it
has glitches in the UI that make it clumsier to use than it should be.

Well, this turned out to be more than just a few words... sorry for going
on!

Mark

On Thu, Mar 1, 2018 at 9:10 PM, Doug Ewell via Unicode 
wrote:

> Tim Partridge wrote:
>
> > Perhaps the CLDR work the Consortium does is being referenced. That is
> > by language on this list
> http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee
> > By the time it gets to the 100th entry the Modern percentage has "room
> > for improvement".
>
> I think that is a measurement of locale coverage -- whether the
> collation tables and translations of "a.m." and "p.m." and "a week ago
> Thursday" are correct and verified -- not character coverage.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>
>


RE: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-01 Thread Doug Ewell via Unicode
Tim Partridge wrote:

> Perhaps the CLDR work the Consortium does is being referenced. That is
> by language on this list
> http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee
> By the time it gets to the 100th entry the Modern percentage has "room
> for improvement".

I think that is a measurement of locale coverage -- whether the
collation tables and translations of "a.m." and "p.m." and "a week ago
Thursday" are correct and verified -- not character coverage.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org




RE: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-01 Thread Tim Partridge via Unicode
Perhaps the CLDR work the Consortium does is being referenced. That work is 
broken down by language on this list: 
http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee 
By the time it gets to the 100th entry, the Modern percentage has "room for 
improvement".

Regards,

Tim

From: Unicode [unicode-boun...@unicode.org] on behalf of James Kass via Unicode 
[unicode@unicode.org]
Sent: 01 March 2018 11:11
To: Unicode Public
Subject: Re: Unicode Emoji 11.0 characters now ready for adoption!

Here's a good opening line:

"The Unicode Standard encodes scripts rather than languages."

https://www.unicode.org/standard/supported.html

But, quoting from this page:

http://www.unicode.org/consortium/aboutdonations.html

" ... and provide universal access for the world's languages—past,
present, and future. The Consortium lays the groundwork to enable
universal access by encoding the characters for the world’s languages,
..."

That's inaccurate.  Languages don't use characters, technically.  It's
more about providing universal access for the world's communication,
data, and history.  You know, the sum of mankind's knowledge that's
been digitized so far.  Unicode encodes the characters used for the
world's computer data interchange and storage systems.

Salesmen and techies have different requirements for accuracy, however.




Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-01 Thread James Kass via Unicode
Here's a good opening line:

"The Unicode Standard encodes scripts rather than languages."

https://www.unicode.org/standard/supported.html

But, quoting from this page:

http://www.unicode.org/consortium/aboutdonations.html

" ... and provide universal access for the world's languages—past,
present, and future. The Consortium lays the groundwork to enable
universal access by encoding the characters for the world’s languages,
..."

That's inaccurate.  Languages don't use characters, technically.  It's
more about providing universal access for the world's communication,
data, and history.  You know, the sum of mankind's knowledge that's
been digitized so far.  Unicode encodes the characters used for the
world's computer data interchange and storage systems.

Salesmen and techies have different requirements for accuracy, however.



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-01 Thread James Kass via Unicode
Christoph Päper wrote,

>> There are approximately 7,000 living human languages,
>> but fewer than 100 of these languages are well-supported on computers,
>> ...
>
> Why is the announcement mentioning those numbers of languages at all?
> The script coverage of written living human languages, except
> for constructed ones, is almost complete in Unicode and rendering
> for most of them is reasonably well supported by all modern
> operating systems ...

This page ...
https://www.unicode.org/standard/unsupported.html
... lists several modern scripts which are not yet encoded.  (Hanifi
Rohingya, Gunjala Gondi, Loma, Medefaidrin, Naxi Dongba (Moso), and
Nyiakeng Puachue Hmong.)  It's noted that there are additional unencoded
"minor modern scripts" shown on the Roadmap, which implies that those
listed are also "minor".


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread via Unicode

On 01.03.2018 06:33, Philippe Verdy via Unicode wrote:

2018-02-28 14:22 GMT+01:00 Christoph Päper via Unicode:


There are approximately 7,000 living human languages,
but fewer than 100 of these languages are well-supported on computers,
mobile phones, and other devices.


Fewer than 100 languages is a bit low; I can count nearly 200
languages that are well supported, with all the basic support needed to
develop content in them. The limitation, however, is elsewhere: in
education and literacy levels for these languages, so that people start
using them on the web and in other media as well, use them more easily
in their daily life, and improve the quality and coverage of the data
available in these languages. This includes developing an orthography
(many languages don't have any developed and supported orthography, even
if there have been attempts to create dictionaries, including online
with Wiktionary).

With the encoded scripts, you can already type and correctly view
thousands of languages. As these languages are living, it should not be
difficult to support most of them with the scripts that are already
encoded (we've reached the point where we only have to encode historic
scripts, to preserve the cultures or languages that have disappeared or
have been dying fast since the beginning of the 20th century). Even if
major languages persist and regional languages die, this should not
happen without reintegrating into those major languages some significant
parts of the past regional cultures, which can still become sources for
enriching these major languages, so that they become more precise and
more useful, and then allow easier access to past regional languages,
possibly directly in their original script, with people then able to
decipher them or interested in studying them. Past languages and
preserved texts will then remain a rich source for keeping existing
languages alive, vivid, and productive for new terms, without
necessarily having to borrow terms from fewer than 20 large
"international" languages (ar, de, en, es, fa, fr, nl, id, ja, ko, pt,
ru, hi, zh), written in only 7 well-developed scripts (Arab, Latn, Cyrl,
Deva, Hang, Hans, Jpan).



Pen or brush and paper is much more flexible. With thousands of names 
of people and places still not encoded, I am not sure I would describe 
Hans (simplified Chinese characters) as well supported. And with the 
current policy, which limits China, with over one billion people, to 
submitting fewer than 500 Chinese characters a year on average, and with 
not all names being added, it is hard to say in which decade Hans will 
be well supported.


John Knightley








Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Philippe Verdy via Unicode
 2018-02-28 14:22 GMT+01:00 Christoph Päper via Unicode :

> > There are approximately 7,000 living human languages,
> > but fewer than 100 of these languages are well-supported on computers,
> > mobile phones, and other devices.


Fewer than 100 languages is a bit low; I can count nearly 200 languages
that are well supported, with all the basic support needed to develop
content in them. The limitation, however, is elsewhere: in education and
literacy levels for these languages, so that people start using them on the
web and in other media as well, use them more easily in their daily life,
and improve the quality and coverage of the data available in these
languages. This includes developing an orthography (many languages don't
have any developed and supported orthography, even if there have been
attempts to create dictionaries, including online with Wiktionary).

With the encoded scripts, you can already type and correctly view thousands
of languages. As these languages are living, it should not be difficult to
support most of them with the scripts that are already encoded (we've
reached the point where we only have to encode historic scripts, to
preserve the cultures or languages that have disappeared or have been dying
fast since the beginning of the 20th century). Even if major languages
persist and regional languages die, this should not happen without
reintegrating into those major languages some significant parts of the past
regional cultures, which can still become sources for enriching these major
languages, so that they become more precise and more useful, and then allow
easier access to past regional languages, possibly directly in their
original script, with people then able to decipher them or interested in
studying them. Past languages and preserved texts will then remain a rich
source for keeping existing languages alive, vivid, and productive for new
terms, without necessarily having to borrow terms from fewer than 20 large
"international" languages (ar, de, en, es, fa, fr, nl, id, ja, ko, pt, ru,
hi, zh), written in only 7 well-developed scripts (Arab, Latn, Cyrl, Deva,
Hang, Hans, Jpan).


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Andrew West via Unicode
On 28 February 2018 at 13:22, Christoph Päper via Unicode
 wrote:
>>
>> The 157 new Emoji are now available for adoption
>
> But Unicode 11.0 (which all new emojis but Pirate Flag and Infinity rely 
> upon) is not even in beta yet.

Don't even get me started on that!

>> There are approximately 7,000 living human languages,
>> but fewer than 100 of these languages are well-supported on computers,
>> mobile phones, and other devices. Adopt-a-character donations are used
>> to improve Unicode support for digitally disadvantaged languages, and to
>> help preserve the world’s linguistic heritage.
>
> Why is the announcement mentioning those numbers of languages at all?

I agree, the figures are meaningless and misleading (and intended to
mislead). I could list a hundred languages that are written with the
Latin script without pausing for breath. There are very very few
scripts in modern daily use that are not yet encoded in the UCS, but
letting out that secret will not help the Unicode Consortium to raise
money from character adoption.

The latest grant to Anshu from Character Adoption money is for three
historic scripts
(http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html).
If there were still so many digitally disadvantaged languages urgently
in need of script encoding then surely the Unicode Consortium would be
sponsoring those as a priority rather than historic scripts.

Andrew



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Andrew West via Unicode
On 28 February 2018 at 10:48, Martin J. Dürst via Unicode
 wrote:
>>
>>> The 157 new Emoji are now available for adoption, to help the Unicode
>>> Consortium’s work on digitally disadvantaged languages.
>>
>> I'm quite curious what it the relation between the new emojis and the
>> digitally disadvantages languages. I see none.
>
> I think this was mentioned before on this list, in particular by Mark:
> The money collected from character adoptions (where emoji are a prominent
> target) is (mostly?) used to support work on not-yet-encoded (thus digitally
> disadvantaged) scripts.

Over $250,000 has been raised from Unicode character adoptions to
date. I am curious as to how much of this money has been spent, and
would very much like to see annual accounts showing how much money has
been received, and how much has been disbursed to whom and for what.

Andrew



. See e.g. the recent announcement at
> http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.
>
> Regards,   Martin.



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Christoph Päper via Unicode
announceme...@unicode.org:
> 
> The 157 new Emoji are now available for adoption 
> ,

But Unicode 11.0 (which all new emojis but Pirate Flag and Infinity rely upon) 
is not even in beta yet.


> There are approximately 7,000 living human languages, 
> but fewer than 100 of these languages are well-supported on computers, 
> mobile phones, and other devices. Adopt-a-character donations are used 
> to improve Unicode support for digitally disadvantaged languages, and to 
> help preserve the world’s linguistic heritage.

Why is the announcement mentioning those numbers of languages at all? 
The script coverage of written living human languages, except for constructed 
ones, is almost complete in Unicode and rendering for most of them is 
reasonably well supported by all modern operating systems (despite recently 
discovered bugs). Availability of translations or original material is another 
matter entirely. Languages that have no written tradition are irrelevant to 
Unicode (but not to the world's linguistic heritage). 

In other words, no future update to the UCS will significantly change that 100 
out of 7000 metric, but the announcement makes it sound like it would. CLDR may 
have some influence, but character adoptions and the research grants they 
enable are not at all associated with that.



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Mark Davis ☕️ via Unicode
I'm more interested in what areas you found unclear, because wherever you
did I'm sure many others would as well. You can reply off-list if you want.

Mark

On Wed, Feb 28, 2018 at 12:22 PM, Janusz S. Bień 
wrote:

>
> Thanks to all who answered. The answers are very clear, but the original
> message and the adoption page are in my opinion much less clear. I can
> however live with it :-)
>
> Best regards
>
> Janusz
>
> On Wed, Feb 28 2018 at 11:53 +0100, m...@macchiato.com writes:
> > Also, please click through from the announcement to
> http://www.unicode.org/consortium/adopt-a-character.html.
> >
> > If it isn't apparent from that page what the relationship is, we have
> some work to do...
> >
> > Mark
>
> > On Wed, Feb 28, 2018 at 11:48 AM, Martin J. Dürst via Unicode <
> unicode@unicode.org> wrote:
> >
> >  On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote:
> >
> >  On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes:
> >
> >  The 157 new Emoji are now available for adoption, to help the Unicode
> >  Consortium’s work on digitally disadvantaged languages.
> >
> >  I'm quite curious what it the relation between the new emojis and the
> >  digitally disadvantages languages. I see none.
> >
> >  I think this was mentioned before on this list, in particular by Mark:
> >  The money collected from character adoptions (where emoji are a
> >  prominent target) is (mostly?) used to support work on not-yet-encoded
> >  (thus digitally disadvantaged) scripts. See e.g. the recent announcement at
> >  http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.
>
>
>
> --
> Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra
> Lingwistyki Formalnej)
> Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
> jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Janusz S. Bień via Unicode

Thanks to all who answered. The answers are very clear, but the original
message and the adoption page are in my opinion much less clear. I can
however live with it :-)

Best regards

Janusz

On Wed, Feb 28 2018 at 11:53 +0100, m...@macchiato.com writes:
> Also, please click through from the announcement to 
> http://www.unicode.org/consortium/adopt-a-character.html.
>
> If it isn't apparent from that page what the relationship is, we have some 
> work to do...
>
> Mark

> On Wed, Feb 28, 2018 at 11:48 AM, Martin J. Dürst via Unicode 
>  wrote:
>
>  On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote:
>
>  On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes:
>
>  The 157 new Emoji are now available for adoption, to help the Unicode
>  Consortium’s work on digitally disadvantaged languages.
>
>  I'm quite curious what it the relation between the new emojis and the
>  digitally disadvantages languages. I see none.
>
>  I think this was mentioned before on this list, in particular by Mark:
>  The money collected from character adoptions (where emoji are a prominent
>  target) is (mostly?) used to support work on not-yet-encoded (thus digitally
>  disadvantaged) scripts. See e.g. the recent announcement at
>  http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.



-- 
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Mark Davis ☕️ via Unicode
Also, please click through from the announcement to
http://www.unicode.org/consortium/adopt-a-character.html.

If it isn't apparent from that page what the relationship is, we have some
work to do...

Mark

On Wed, Feb 28, 2018 at 11:48 AM, Martin J. Dürst via Unicode <unicode@unicode.org> wrote:

> On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote:
>
>> On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes:
>>
>>> The 157 new Emoji are now available for adoption, to help the Unicode
>>> Consortium’s work on digitally disadvantaged languages.
>>>
>>
>> I'm quite curious about the relation between the new emojis and the
>> digitally disadvantaged languages. I see none.
>>
>
> I think this was mentioned before on this list, in particular by Mark:
> The money collected from character adoptions (where emoji are a prominent
> target) is (mostly?) used to support work on not-yet-encoded (thus
> digitally disadvantaged) scripts. See e.g. the recent announcement at
> http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.
>
> Regards,   Martin.
>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Martin J. Dürst via Unicode

On 2018/02/28 19:38, Janusz S. Bień via Unicode wrote:

> On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes:
>
>> The 157 new Emoji are now available for adoption, to help the Unicode
>> Consortium’s work on digitally disadvantaged languages.
>
> I'm quite curious about the relation between the new emojis and the
> digitally disadvantaged languages. I see none.

I think this was mentioned before on this list, in particular by Mark:
The money collected from character adoptions (where emoji are a 
prominent target) is (mostly?) used to support work on not-yet-encoded 
(thus digitally disadvantaged) scripts. See e.g. the recent announcement 
at 
http://blog.unicode.org/2018/02/adopt-character-grant-to-support-three.html.


Regards,   Martin.


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-02-28 Thread Janusz S. Bień via Unicode
On Tue, Feb 27 2018 at 13:45 -0800, announceme...@unicode.org writes:

> The 157 new Emoji are now available for adoption, to help the Unicode
> Consortium’s work on digitally disadvantaged languages.

I'm quite curious about the relation between the new emojis and the
digitally disadvantaged languages. I see none.

Best regards

Janusz

-- 
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/