Fwd: Emoji as East Asian Width = Wide

2018-03-05 Thread Oren Watson via Unicode
EAW is used in fixed-width settings to distinguish characters that should
take up one space versus two. I would also prefer that all of these be
considered wide, since otherwise they cause formatting problems in these
settings.
(Unfortunately, fixed-width settings appear to be largely ignored by Unicode... 🙁)

On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode wrote:

> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
> when other dingbats like U+263A Smiling Face are EAW=Neutral?
> This is making it difficult to have consistent formatting
> across emoticons. Also, emoji aren't really CJK context only
> now, are they.
>
> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>
> ~fantasai
>
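To make the property values under discussion concrete, here is a small Python sketch using the standard library's `unicodedata.east_asian_width`. The values it returns depend on the Unicode version bundled with the interpreter (recent Python releases ship Unicode 9.0 or later, where U+1F600 is W and U+263A is N). The `column_width` helper and its rule (W and F count as two columns, everything else as one) is a hypothetical simplification in the spirit of wcwidth-style functions; it deliberately ignores ambiguous-width (A) characters, combining marks and emoji sequences.

```python
import unicodedata

def eaw(ch: str) -> str:
    """Return the East_Asian_Width value (N, Na, H, W, F or A) of one character."""
    return unicodedata.east_asian_width(ch)

def column_width(text: str) -> int:
    """Hypothetical fixed-width rule: W and F take two columns, everything else one.

    Ignores ambiguous (A) characters, combining marks, zero-width characters
    and variation selectors, so it is only a rough sketch.
    """
    return sum(2 if eaw(ch) in ("W", "F") else 1 for ch in text)

if __name__ == "__main__":
    for cp in (0x1F600, 0x263A):
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)}: EAW={eaw(ch)}, columns={column_width(ch)}")
    print("Unicode data version:", unicodedata.unidata_version)
```

The asymmetry fantasai describes shows up directly: the two smiley faces get different widths under any rule driven purely by East_Asian_Width.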


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode wrote:

> Hello John,
>
> On 2018/03/01 12:31, via Unicode wrote:
>
> > Pen, or brush and paper is much more flexible. With thousands of names
> > of people and places still not encoded I am not sure if I would describe
> > hans (simplified Chinese characters) as well supported. nor with current
> > policy which limits China with over one billion people to submitting
> > less than 500 Chinese characters a year on average, and names not being
> > all to be added, it is hard to say which decade hans will be well
> > supported.
>
> I think this contains several misunderstandings. First, of course
> pen/brush and paper are more flexible than character encoding, but
> that's true for the Latin script, too.
>

In the Latin script, for example, I can simply name myself "Phake", but in
Chinese, in the current Unicode-based environment, it would not be possible
for me to freely name myself using a character such as ⿰牜爲, as I would like to.


> Second, while I have heard that people create new characters for naming
> a baby in a traditional Han context, I haven't heard about this in a
> simplified Han context. And it's not frequent at all, the same way
> naming a baby John in the US is way more frequent than let's say Qvtwzx.
> I'd also assume that China has regulations on what characters can be
> used to name a baby, and that the parents in this age of smartphone
> communication will think at least twice before giving their baby a name
> that they cannot send to their relatives via some chat app.
>

Traditional versus simplified characters in this context is just like
Fraktur versus Antiqua. The way some components are written has changed,
and there are also orthographic changes that mean some characters no
longer comprise the same components, but they are still Chinese characters
and their usage is unchanged. I believe there are regulations on naming,
but those regulations would have been man-made to adapt to the limitations
of current computer systems. Plus, I still hear news from time to time
about people having difficulty using, for example, train booking systems
or banking systems because of the characters they use. (Although in many
cases those are encoded characters that the system does not support.)


> Third, I cannot confirm or deny the "500 characters a year" limit, but
> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
> to encode more characters, everybody would find a way to handle these.


> Due to the nature of your claims, it's difficult to falsify many of
> them. It would be easier to prove them (assuming they were true), so if
> you have any supporting evidence, please provide it.
>
> Regards,   Martin.
>
> > John Knightley
>
>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread James Kass via Unicode
Phake Nick wrote,

> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character  ⿰牜爲

Isn't that U+246E8? "𤛨"
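James's identification can be checked mechanically. The Python sketch below (standard library only) shows that U+246E8 lies outside the Basic Multilingual Plane, in the CJK Unified Ideographs Extension B block (U+20000..U+2A6DF), so it needs four bytes in UTF-8 and a surrogate pair in UTF-16. Systems limited to the BMP or to a legacy character set are one plausible reason an encoded name still fails in booking or banking software, but that is an assumption, not something established in this thread.

```python
import unicodedata

cp = 0x246E8
ch = chr(cp)

# Unified ideographs are named algorithmically: "CJK UNIFIED IDEOGRAPH-" + hex code point.
name = unicodedata.name(ch)

# Anything above U+FFFF is outside the Basic Multilingual Plane (BMP).
in_bmp = cp <= 0xFFFF
# Block range of CJK Unified Ideographs Extension B.
in_ext_b_block = 0x20000 <= cp <= 0x2A6DF

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark

# Split the UTF-16 bytes into 16-bit code units: a high/low surrogate pair here.
code_units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print(name)
print("in BMP:", in_bmp, "| Extension B block:", in_ext_b_block)
print("UTF-8 bytes:", " ".join(f"{b:02X}" for b in utf8))
print("UTF-16 code units:", " ".join(f"{u:04X}" for u in code_units))
```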



Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread Phake Nick via Unicode
Ah, right, that's it.

On 5 Mar 2018 at 19:25, "James Kass" wrote:

> Phake Nick wrote,
>
>> In latin script, as an example, I can simply name myself
>> "Phake", but in Chinese with current Unicode-based environment,
>> it would not be possible for me to randomly name myself using
>> a character  ⿰牜爲
>
> Isn't that U+246E8? "𤛨"


Re: Emoji as East Asian Width = Wide

2018-03-05 Thread Philippe Verdy via Unicode
I think that fixed-width rendering properties for East-Asian characters was
meant only for rendering letters or symbols as plain-text, not for the new
rendering with emoji styles.
If the symbols are rendered as emojis, these properties don't apply at all,
the Emojis style overrides that completely.

Note that when characters have both styles (notably the oldest dingbats),
there's a variant selector available to select the emoji (EAW ignored)
style vs. plain-text style (where EAW is suitable). Characters that have
only Emoji styles and no selectors should not have any EAW property (only
the default one applicable to all Emojis).
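Philippe's distinction between text and emoji presentation can be sketched in code. The Python below is a hypothetical width policy, not something the standard prescribes: a base character followed by U+FE0F VARIATION SELECTOR-16 (emoji presentation) is given two columns, one followed by U+FE0E VARIATION SELECTOR-15 (text presentation) falls back to its East_Asian_Width, and the selectors themselves take no columns. U+263A is used as the example because it has both presentations. Real terminals disagree on these cases, which is the formatting problem raised in this thread.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

def base_width(ch: str) -> int:
    # Fall back to East_Asian_Width: W and F are two columns, everything else one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def display_width(text: str) -> int:
    """Hypothetical policy: emoji presentation (base + VS16) is always two columns."""
    width = 0
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in (VS15, VS16):
            i += 1                    # stray selector: no width of its own
        elif nxt == VS16:
            width += 2                # emoji style: EAW ignored, treated as wide
            i += 2
        elif nxt == VS15:
            width += base_width(ch)   # text style: EAW applies
            i += 2
        else:
            width += base_width(ch)
            i += 1
    return width

if __name__ == "__main__":
    for s in ("\u263A", "\u263A" + VS15, "\u263A" + VS16, "\U0001F600"):
        print(" ".join(f"U+{ord(c):04X}" for c in s), "->", display_width(s))
```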


2018-03-05 8:58 GMT+01:00 Oren Watson via Unicode:

> EAW is used in fixed-width settings to distinguish characters that should
> take up one space versus two. I would also prefer that all these be
> considered wide, since otherwise it causes format problems in these
> settings.
> (unfortunately fixed-width appear to be largely ignored by unicode... 🙁)
>
> On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode wrote:
>
>> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
>> when other dingbats like U+263A Smiling Face are EAW=Neutral?
>> This is making it difficult to have consistent formatting
>> across emoticons. Also, emoji aren't really CJK context only
>> now, are they.
>>
>> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
>> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>>
>> ~fantasai
>>


Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread via Unicode

Dear All,

To simplify the discussion, I have split up the points.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

> On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode wrote:
>
>> Hello John,
>>
>> On 2018/03/01 12:31, via Unicode wrote:
>>
>> Third, I cannot confirm or deny the "500 characters a year" limit, but
>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
>> to encode more characters, everybody would find a way to handle these.
>>
>> Due to the nature of your claims, it's difficult to falsify many of
>> them. It would be easier to prove them (assuming they were true), so if
>> you have any supporting evidence, please provide it.


Chinese characters for Unicode first go to the IRG (ISO/IEC
JTC1/SC2/WG2/IRG); see the IRG website. The limit of 500 a year for China
is an average based on the IRG #48 document on working set 2017
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf
which explicitly states "each submission shall not exceed 1,000
characters". The People's Republic of China, which hopefully we can all
agree has a population of over 1,000,000,000, is one member of the IRG and
was therefore limited to submitting at most 1,000 characters. The earliest
possible date for the next working set is two or three years later, that
is, 2019 or 2020, so that's an average limit of either 500 or 333
characters a year.
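John's average is the per-submission cap divided by the interval between working sets. The short Python sketch below reproduces it; the two- and three-year intervals are the ones stated in the message, and the function name is illustrative only.

```python
def average_yearly_cap(cap_per_submission: int, years_between_sets: int) -> float:
    """Average characters per year if one capped submission is made per working set."""
    return cap_per_submission / years_between_sets

for years in (2, 3):
    print(f"{years} years between working sets: "
          f"{average_yearly_cap(1000, years):.0f} characters/year")
```

Note that this bounds one national body's submissions to the working set; it is not the rate at which ideographs are actually encoded.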


Regards
John


Regards,   Martin.





Re: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-05 Thread via Unicode


Dear All,

Here is my reply to points one and two.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

> On Mon, 5 Mar 2018 at 13:25, Martin J. Dürst via Unicode wrote:
>
>> Hello John,
>>
>> On 2018/03/01 12:31, via Unicode wrote:
>>
>>> Pen, or brush and paper is much more flexible. With thousands of names
>>> of people and places still not encoded I am not sure if I would describe
>>> hans (simplified Chinese characters) as well supported. nor with current
>>> policy which limits China with over one billion people to submitting
>>> less than 500 Chinese characters a year on average, and names not being
>>> all to be added, it is hard to say which decade hans will be well
>>> supported.
>>
>> I think this contains several misunderstandings. First, of course
>> pen/brush and paper are more flexible than character encoding, but
>> that's true for the Latin script, too.
>
> In latin script, as an example, I can simply name myself "Phake", but
> in Chinese with current Unicode-based environment, it would not be
> possible for me to randomly name myself using a character ⿰牜爲
> as I would like to.
>
>> Second, while I have heard that people create new characters for naming
>> a baby in a traditional Han context, I haven't heard about this in a
>> simplified Han context. And it's not frequent at all, the same way
>> naming a baby John in the US is way more frequent than let's say Qvtwzx.
>> I'd also assume that China has regulations on what characters can be
>> used to name a baby, and that the parents in this age of smartphone
>> communication will think at least twice before giving their baby a name
>> that they cannot send to their relatives via some chat app.




In most cases the answer to both points may well be the same: the
unencoded names of people and places are not new names, but rather names
of places and people in use since before Unicode, and often since before
computers. The People's Republic of China's activity report for IRG #48
http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2187ChinaActivityReport.pdf
states that over 3,000 names of people and places are under
consideration for IRG working set 2017 and that at least half require
encoding. The document also lists other categories of CJK ideographs
under consideration for submission to Unicode.


Regards
John









Re: [Unicode] Re: Fonts and font sizes used in the Unicode

2018-03-05 Thread suzuki toshiya via Unicode

Hi,

As I remember, the front page of the Unicode code
charts has the following note:


Fonts
The shapes of the reference glyphs used in these code
charts are not prescriptive. Considerable variation is
to be expected in actual fonts. The particular fonts
used in these charts were provided to the Unicode
Consortium by a number of different font designers,
who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.


--

I have a question: if some people try to make a
translated version of Unicode, should they contact
all the font contributors and ask for a license?
Can the Unicode Consortium not give any sublicense?

If I understand correctly, ISO/IEC JTC1 holds the
copyright of the materials used in the published
documents of JTC1 standards, because they have to
permit the production of translated versions of
their standards, the reuse of the content of one spec
by another spec, etc.

Thus, I guess it would not be irrelevant to ask
JTC1 for permission regarding the fonts used in
ISO/IEC 10646, although that does not mean that
JTC1 would permit anything. If I'm misunderstanding,
please correct me.

Regards,
mpsuzuki

On 3/5/2018 4:49 AM, Asmus Freytag via Unicode wrote:

On 3/4/2018 9:12 AM, Markus Scherer via Unicode wrote:
On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode <unicode@unicode.org> wrote:
Greetings. Is there a way to know which font and font size have been used in 
the Unicode charts (for various writing systems)? Many thanks!

What are you trying to do?

Many of the fonts are unique to the Unicode chart production, and are not 
licensed for other uses. Some are not even generally usable.

markus

The editors of the Unicode charts will use any font resource that gets the job 
done (that is, results in a chart that correctly displays the characters in the 
standard). These fonts are often not production fonts, and may lack any of the 
many tables needed to actually display running text. They may also, as has been 
mentioned, be licensed solely for the purpose of publishing the standard. In 
some cases, they are custom built.

For most scripts, the font size is nominally set to 22pt in the main code
charts, but the tool that the editors use allows a different size to be selected
for any range of code points, or for individual characters. There are some examples
where a character is very wide or tall and had to be scaled down
individually to fit the cell.

The purpose of the code charts is *exclusively* that of helping users of the
standard identify which character is encoded at what code position. They are
not intended as a font resource or normative description of the glyphs. Any
usage scenario outside this very narrow scope is unsupported, and reverse
engineering or extracting font resources is explicitly in violation of the terms
of use.

A./





Re: [Unicode] Re: Fonts and font sizes used in the Unicode

2018-03-05 Thread Markus Scherer via Unicode
On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode <unicode@unicode.org> wrote:

> I have a question; if some people try to make a
> translated version of Unicode, they should contact
> all font contributors and ask for the license?
> Unicode Consortium cannot give any sublicense?
>

If you want to translate the Unicode Standard or its companion standards
(UAX, UTS, ...), then please contact the Unicode Consortium.

> Thus, I guess, it would not be so irrelevant to ask
> the permission to JTC1, about the fonts used in
> ISO/IEC 10646 - although it does not mean that
> JTC1 would permit anything. If I'm misunderstanding,
> please correct me.
>

The production of the ISO 10646 standard is done by the Unicode Consortium.
I am fuzzy on what exactly that means for copyright. If you need to find
out, then please contact the consortium.

markus


CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)

2018-03-05 Thread Ken Whistler via Unicode

John,

I think this may be giving the list a somewhat misleading picture of the 
actual statistics for encoding of CJK unified ideographs. The "500 
characters a year" or "1000 characters a year" limits are administrative 
limits set by the IRG for national bodies (and others) submitting 
repertoire to the "working set" that the IRG then segments into chunks 
for processing to prepare new increments for actual encoding.


In point of fact, if we take 1991 as the base year, the *average* rate 
of encoding new CJK unified ideographs now stands at 3379 per annum 
(87,860 as of Unicode 10.0). By "encoding" here, I mean final, finished
publication of the encoded characters -- not the larger number of 
potentially unifiable submissions that eventually go into a publication 
increment. There is a gradual downward drift in that number over time, 
because of the impact on the stats of the "big bang" encoding of 42,711 
ideographs for Extension B back in 2001, but recently, the numbers have 
been quite consistent with an average incremental rate of about 3000 new 
ideographs per year:


5762 added for Extension E in 2015

7463 added for Extension F in 2017

~ 4934 to be added for Extension G, probably to be published in 2020

If you run the average calculation including Extension G, assuming 2020, 
you end up with a cumulative per annum rate of 3200, not much different 
than the calculation done as of today.
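Ken's cumulative averages can be reproduced from the figures in his message: 87,860 ideographs encoded as of Unicode 10.0 (2017), counted from the 1991 base year, plus roughly 4,934 for Extension G, assuming publication in 2020. The Python sketch below treats the number of years as a simple difference of calendar years; that is an assumption about how the average was computed, though it matches both numbers he gives.

```python
BASE_YEAR = 1991

def cumulative_rate(total_encoded: int, as_of_year: int, base_year: int = BASE_YEAR) -> float:
    """Average number of CJK unified ideographs encoded per year since base_year."""
    return total_encoded / (as_of_year - base_year)

total_unicode_10 = 87_860   # as of Unicode 10.0 (2017), per the message above
ext_g_estimate = 4_934      # approximate size of Extension G, assumed published in 2020

print(f"Through 2017: {cumulative_rate(total_unicode_10, 2017):.0f} per year")
print(f"Through 2020 with Extension G: "
      f"{cumulative_rate(total_unicode_10 + ext_g_estimate, 2020):.0f} per year")
```

The contrast with the submission-cap arithmetic earlier in the thread is the point: a cap on one body's contribution to one working set says little about the realized encoding rate.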


And as for the implication that China, in particular, is somehow limited 
by these numbers, one should note that the vast majority of Extension G 
is associated with Chinese sources. Although a substantial chunk is 
formally labeled with a "UK" source this time around, almost all of 
those characters represent a roll-in of systematic simplifications, of 
various sorts, associated with PRC usage. (People who want to check can 
take a look at L2/17-366R in the UTC document registry.)


--Ken


On 3/5/2018 7:13 AM, via Unicode wrote:

> Dear All,
>
> to simplify discussion I have split the points.
>
>> On 2018/03/01 12:31, via Unicode wrote:
>>
>> Third, I cannot confirm or deny the "500 characters a year" limit, but
>> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
>> to encode more characters, everybody would find a way to handle these.
>
> Chinese characters for Unicode first go to IRG (or ISO/IEC
> JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an
> average based on IRG #48 document regarding working set 2017
> http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf
> which explicitly states "each submission shall not exceed 1,000
> characters". The People's Republic of China as one member of IRG is
> limited to 1,000 characters, which hopefully we can all agree has a
> population of over 1,000,000,000 , therefore was limited to submitting
> at most 1,000 characters. The earliest possible date for the next
> working set is two or three years later, that is 2019 or 2020, so
> that's an average limit of either 500 or 333 characters a year.
>
> Regards
> John







Re: [Unicode] Re: Fonts and font sizes used in the Unicode

2018-03-05 Thread Asmus Freytag via Unicode

  
  
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:

> I have a question; if some people try to make a
> translated version of Unicode, they should contact
> all font contributors and ask for the license?
> Unicode Consortium cannot give any sublicense?

I'd be happy to help you get an answer to that off-list,
in case you are actually working on a translation
project. The issue is too specific and complex to
give general answers.
A./
  
  



Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Ken Whistler via Unicode


On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:

> I have a question; if some people try to make a
> translated version of Unicode


And to add to Asmus' response, folks on the list should understand that 
even with the best of effort, the concept of a "translated version of 
Unicode" is a near impossibility. In fairly recent times, two serious 
efforts to translate *just* the core specification -- one in Japanese, 
and a somewhat later attempt for Chinese -- crashed and burned, for a 
variety of reasons. The core specification is huge, contains a lot of 
very specific technical terminology that is difficult to translate, 
along with a large collection of script- and language-specific detail, 
also hard to translate. Worse, it keeps changing, with updates now 
coming out once every year. Some large parts are stable, but it is 
impossible to predict what sections might be impacted by the next year's 
encoding decisions.


That is not including the fact that "the Unicode Standard" now also 
includes 14 separate HTML (or XHTML) annexes, all of which are also 
moving targets, along with the UCD data files, which often contain 
important information in their headers that would also require 
translation. And then, of course, there are the 2000+ pages of the 
formatted code charts, which require highly specific and very 
complicated custom tooling and font usage to produce.


It would require a dedicated (and expensive) small army of translators, 
terminologists, editors, programmers, font designers, and project 
managers to replicate all of this into another language publication -- 
and then they would have to do it again the next year, and again the 
next year, in perpetuity. Basically, given the current situation, it 
would be a fool's errand, more likely to introduce errors and 
inconsistencies than to help anybody with actual implementation.


People who want accessibility to the Unicode Standard in other languages 
need to scale down their expectations considerably, and focus on 
preparing reasonably short and succinct introductions to the terminology 
and complexity involved in the full standard. Such projects are 
feasible. But a full translation of "the Unicode Standard" simply is not.


--Ken


Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)

2018-03-05 Thread Philippe Verdy via Unicode
There have been significant efforts to "translate", or more precisely "adapt",
significant parts of the standard, with good presentations in Wikipedia and
on various sites for scoped topics. So there are alternate charts, and instead
of translating everything, the concepts are summarized and re-explained, while
still giving links to the original English version every time more information
is needed.
None of the UCD files need to be translated; they can be automatically
processed to generate alternate presentations or data tables in other
formats. There's no value in making the effort to translate them manually;
it's better to develop a tool that processes them into a format users
can read.
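As a small illustration of processing UCD files rather than translating them, here is a Python sketch that parses lines in the format of the UCD's EastAsianWidth.txt (a code point or `start..end` range, a semicolon, the property value, then an optional `#` comment) into a plain lookup table. The sample lines are written in that format for illustration rather than copied from a particular UCD release; in practice the full file would be read from a local copy, and its field spacing varies between versions, which is why the parser strips whitespace around each field.

```python
from typing import Dict, Iterable

SAMPLE = """\
# Illustrative lines in the EastAsianWidth.txt format
0041..005A;Na  # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
263A;N         # WHITE SMILING FACE
3000;F         # IDEOGRAPHIC SPACE
FF01..FF60;F   # FULLWIDTH EXCLAMATION MARK..FULLWIDTH RIGHT WHITE PARENTHESIS
1F600;W        # GRINNING FACE
"""

def parse_eaw(lines: Iterable[str]) -> Dict[int, str]:
    """Map each listed code point to its East_Asian_Width value."""
    table: Dict[int, str] = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not data:
            continue                           # skip blank and comment-only lines
        cps, value = [field.strip() for field in data.split(";")]
        start_hex, _, end_hex = cps.partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        for cp in range(start, end + 1):
            table[cp] = value
    return table

if __name__ == "__main__":
    table = parse_eaw(SAMPLE.splitlines())
    for cp in (0x41, 0x263A, 0x3000, 0xFF21, 0x1F600):
        print(f"U+{cp:04X}\t{table[cp]}")
```

Once the data is in a table like this, generating a localized chart or documentation page is a presentation problem, which is the division of labour suggested above.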

So remove the UCD files and the tables from the count, as well as the sample
code (which is just demonstrative and uses simplified, non-optimal
implementations to keep the code clear). We can now have separate tools or
websites presenting them and offering commented code that also performs
better. We have large collections of i18n libraries that were developed
for various development platforms, with usage documentation in various
languages.

The only effort needed is in:
* naming characters (Wikipedia is great for distributing the effort, with
articles showing relevant collections of characters and documenting alternate
names or disambiguating synonyms).
* the core text of the standard (chapter 3 on conformance and
requirements is the first thing to adapt). There's absolutely no need,
however, to do that as a pure translation; it can be rewritten and presented
according to the goals users have. Here again Wikipedia has made significant
efforts, in various languages.
* keeping the tools mentioned in the previous paragraph in sync and in
conformity with the standard (syncing the UCD files they use).

2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode:

>
> On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:
>
> I have a question; if some people try to make a
> translated version of Unicode
>
>
> And to add to Asmus' response, folks on the list should understand that
> even with the best of effort, the concept of a "translated version of
> Unicode" is a near impossibility. In fairly recent times, two serious
> efforts to translate *just* the core specification -- one in Japanese,
> and a somewhat later attempt for Chinese -- crashed and burned, for a
> variety of reasons. The core specification is huge, contains a lot of very
> specific technical terminology that is difficult to translate, along with a
> large collection of script- and language-specific detail, also hard to
> translate. Worse, it keeps changing, with updates now coming out once every
> year. Some large parts are stable, but it is impossible to predict what
> sections might be impacted by the next year's encoding decisions.
>
> That is not including the fact that "the Unicode Standard" now also
> includes 14 separate HTML (or XHTML) annexes, all of which are also moving
> targets, along with the UCD data files, which often contain important
> information in their headers that would also require translation. And then,
> of course, there are the 2000+ pages of the formatted code charts, which
> require highly specific and very complicated custom tooling and font usage
> to produce.
>
> It would require a dedicated (and expensive) small army of translators,
> terminologists, editors, programmers, font designers, and project managers
> to replicate all of this into another language publication -- and then they
> would have to do it again the next year, and again the next year, in
> perpetuity. Basically, given the current situation, it would be a fool's
> errand, more likely to introduce errors and inconsistencies than to help
> anybody with actual implementation.
>
> People who want accessibility to the Unicode Standard in other languages
> need to scale down their expectations considerably, and focus on preparing
> reasonably short and succinct introductions to the terminology and
> complexity involved in the full standard. Such projects are feasible. But a
> full translation of "the Unicode Standard" simply is not.
>
> --Ken
>