Fwd: Emoji as East Asian Width = Wide
EAW is used in fixed-width settings to distinguish characters that should take up one space versus two. I would also prefer that all these be considered wide, since otherwise it causes format problems in these settings.
(Unfortunately fixed-width settings appear to be largely ignored by Unicode... 🙁)

On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode wrote:
> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
> when other dingbats like U+263A Smiling Face are EAW=Neutral?
> This is making it difficult to have consistent formatting
> across emoticons. Also, emoji aren't really CJK context only
> now, are they.
>
> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>
> ~fantasai
Re: Unicode Emoji 11.0 characters now ready for adoption!
On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote:

> Hello John,
>
> On 2018/03/01 12:31, via Unicode wrote:
>
> > Pen, or brush and paper is much more flexible. With thousands of names
> > of people and places still not encoded I am not sure if I would describe
> > hans (simplified Chinese characters) as well supported. nor with current
> > policy which limits China with over one billion people to submitting
> > less than 500 Chinese characters a year on average, and names not being
> > all to be added, it is hard to say which decade hans will be well
> > supported.
>
> I think this contains several misunderstandings. First, of course
> pen/brush and paper are more flexible than character encoding, but
> that's true for the Latin script, too.

In Latin script, as an example, I can simply name myself "Phake", but in Chinese, in the current Unicode-based environment, it would not be possible for me to randomly name myself using a character ⿰牜爲 as I would like to.

> Second, while I have heard that people create new characters for naming
> a baby in a traditional Han context, I haven't heard about this in a
> simplified Han context. And it's not frequent at all, the same way
> naming a baby John in the US is way more frequent than let's say Qvtwzx.
> I'd also assume that China has regulations on what characters can be
> used to name a baby, and that the parents in this age of smartphone
> communication will think at least twice before giving their baby a name
> that they cannot send to their relatives via some chat app.

Traditional versus simplified characters in this context is just like Fraktur vs. Antiqua. The way some components are written has changed, and there are also orthographic changes that mean some characters no longer consist of the same components, but they are still Chinese characters and their usage is unchanged.

I believe there are regulations on naming, but those regulations would have been made to adapt to the limitations of current computer systems. Plus, every once in a while I still hear news of people having difficulty using, e.g., train booking or banking systems because of the characters they use. (Although in many cases those are encoded characters that the system does not support.)

> Third, I cannot confirm or deny the "500 characters a year" limit, but
> I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need
> to encode more characters, everybody would find a way to handle these.
> Due to the nature of your claims, it's difficult to falsify many of
> them. It would be easier to prove them (assuming they were true), so if
> you have any supporting evidence, please provide it.
>
> Regards, Martin.
>
> > John Knightley
Re: Unicode Emoji 11.0 characters now ready for adoption!
Phake Nick wrote, > In latin script, as an example, I can simply name myself > "Phake", but in Chinese with current Unicode-based environment, > it would not be possible for me to randomly name myself using > a character ⿰牜爲 Isn't that U+246E8? "𤛨"
Re: Unicode Emoji 11.0 characters now ready for adoption!
ah right that's it.

On Mar 5, 2018 at 19:25, "James Kass" wrote:

Phake Nick wrote,
> In latin script, as an example, I can simply name myself
> "Phake", but in Chinese with current Unicode-based environment,
> it would not be possible for me to randomly name myself using
> a character ⿰牜爲

Isn't that U+246E8? "𤛨"
Re: Emoji as East Asian Width = Wide
I think that the fixed-width rendering properties for East Asian characters were meant only for rendering letters or symbols as plain text, not for the new rendering with emoji styles. If the symbols are rendered as emoji, these properties don't apply at all; the emoji style overrides them completely.

Note that when characters have both styles (notably the oldest dingbats), there's a variation selector available to select the emoji style (EAW ignored) vs. the plain-text style (where EAW is suitable). Characters that have only an emoji style and no selectors should not have any EAW property (only the default one applicable to all emoji).

2018-03-05 8:58 GMT+01:00 Oren Watson via Unicode:
> EAW is used in fixed-width settings to distinguish characters that should
> take up one space versus two. I would also prefer that all these be
> considered wide, since otherwise it causes format problems in these
> settings.
> (unfortunately fixed-width appear to be largely ignored by unicode... 🙁)
>
> On Sun, Mar 4, 2018 at 10:54 PM, fantasai via Unicode wrote:
>
>> Why are the new emoji like U+1F600 Grinning Face EAW=Wide
>> when other dingbats like U+263A Smiling Face are EAW=Neutral?
>> This is making it difficult to have consistent formatting
>> across emoticons. Also, emoji aren't really CJK context only
>> now, are they.
>>
>> https://unicode.org/cldr/utility/character.jsp?a=1F600&B1=Show
>> https://unicode.org/cldr/utility/character.jsp?a=263A&B1=Show
>>
>> ~fantasai
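For readers who want to check the property values discussed in this thread, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `cell_width` is hypothetical, and counting only Wide (W) and Fullwidth (F) as two cells is a common terminal convention assumed here, not something the thread or the standard mandates. The values reported depend on the Unicode version bundled with the Python build.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Hypothetical helper: cells one character occupies in a fixed-width grid.

    Assumption: Wide (W) and Fullwidth (F) take two cells, everything else one.
    Combining marks, variation selectors and Ambiguous (A) handling are ignored.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

# The two characters fantasai compares in the original question.
for ch in ("\U0001F600", "\u263A"):
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch), cell_width(ch))
```

A real layout engine would also have to decide what to do when U+263A is followed by an emoji variation selector, which is exactly the inconsistency the thread is about.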
Re: Unicode Emoji 11.0 characters now ready for adoption!
Dear All,

to simplify discussion I have split the points.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote:

Hello John,

On 2018/03/01 12:31, via Unicode wrote:

Third, I cannot confirm or deny the "500 characters a year" limit, but I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need to encode more characters, everybody would find a way to handle these. Due to the nature of your claims, it's difficult to falsify many of them. It would be easier to prove them (assuming they were true), so if you have any supporting evidence, please provide it.

Chinese characters for Unicode first go to the IRG (or ISO/IEC JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an average based on the IRG #48 document regarding working set 2017, http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf, which explicitly states "each submission shall not exceed 1,000 characters". The People's Republic of China, which hopefully we can all agree has a population of over 1,000,000,000, is one member of IRG and was therefore limited to submitting at most 1,000 characters. The earliest possible date for the next working set is two or three years later, that is 2019 or 2020, so that's an average limit of either 500 or 333 characters a year.

Regards
John

Regards, Martin.
Re: Unicode Emoji 11.0 characters now ready for adoption!
Dear All,

here is a reply to points one and two.

On 05.03.2018 16:57, Phake Nick via Unicode wrote:

On Mon, Mar 5, 2018 at 13:25, Martin J. Dürst via Unicode wrote:

Hello John,

On 2018/03/01 12:31, via Unicode wrote:

> Pen, or brush and paper is much more flexible. With thousands of names
> of people and places still not encoded I am not sure if I would describe
> hans (simplified Chinese characters) as well supported. nor with current
> policy which limits China with over one billion people to submitting
> less than 500 Chinese characters a year on average, and names not being
> all to be added, it is hard to say which decade hans will be well
> supported.

I think this contains several misunderstandings. First, of course pen/brush and paper are more flexible than character encoding, but that's true for the Latin script, too.

In latin script, as an example, I can simply name myself "Phake", but in Chinese with current Unicode-based environment, it would not be possible for me to randomly name myself using a character ⿰牜爲 as I would like to.

Second, while I have heard that people create new characters for naming a baby in a traditional Han context, I haven't heard about this in a simplified Han context. And it's not frequent at all, the same way naming a baby John in the US is way more frequent than let's say Qvtwzx. I'd also assume that China has regulations on what characters can be used to name a baby, and that the parents in this age of smartphone communication will think at least twice before giving their baby a name that they cannot send to their relatives via some chat app.

In most cases the answer to the above may well be the same: the unencoded names of people and places are not new names, but rather names of places and people in use from before Unicode and often before computers.

The People's Republic of China activity report for IRG #48, http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2187ChinaActivityReport.pdf, states that over 3,000 names of people and places are under consideration for IRG working set 2017, of which at least half require encoding. The document also lists other categories of CJK ideographs under consideration for submission to Unicode.

Regards
John
Re: [Unicode] Re: Fonts and font sizes used in the Unicode
Hi,

As I recall, the front page of the Unicode code charts has the following note:

    Fonts
    The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/charts/fonts.html for a list.

I have a question: if some people try to make a translated version of Unicode, should they contact all font contributors and ask for a license? Can the Unicode Consortium not give any sublicense?

If I understand correctly, ISO/IEC JTC1 holds the copyright on the materials used in the published documents of JTC1 standards, because it has to permit the production of translated versions of its standards, the reuse of one spec's content by another spec, etc. Thus, I guess it would not be irrelevant to ask JTC1 for permission regarding the fonts used in ISO/IEC 10646, although that does not mean JTC1 would permit anything. If I'm misunderstanding, please correct me.

Regards,
mpsuzuki

On 3/5/2018 4:49 AM, Asmus Freytag via Unicode wrote:

On 3/4/2018 9:12 AM, Markus Scherer via Unicode wrote:

On Sun, Mar 4, 2018 at 6:10 AM, Helena Miton via Unicode <unicode@unicode.org> wrote:

Greetings. Is there a way to know which font and font size have been used in the Unicode charts (for various writing systems)? Many thanks!

What are you trying to do? Many of the fonts are unique to the Unicode chart production, and are not licensed for other uses. Some are not even generally usable.
markus

The editors of the Unicode charts will use any font resource that gets the job done (that is, results in a chart that correctly displays the characters in the standard). These fonts are often not production fonts, and may lack any of the many tables needed to actually display running text. They may also, as has been mentioned, be licensed solely for the purpose of publishing the standard. In some cases, they are custom built.

For most scripts, the font size is nominally set to 22pt in the main code charts, but the tool that the editors use allows a different size to be selected for any range of code points, or individual characters. There are some examples where a character is very wide or tall and had to be scaled down individually to fit the cell.

The purpose of the code charts is *exclusively* that of helping users of the standard identify which character is encoded at what code position. They are not intended as a font resource or normative description of the glyphs. Any usage scenario that is outside that very narrow scope is unsupported, and reverse engineering / extracting font resources is explicitly in violation of the terms of use.

A./
Re: [Unicode] Re: Fonts and font sizes used in the Unicode
On Mon, Mar 5, 2018 at 9:03 AM, suzuki toshiya via Unicode <unicode@unicode.org> wrote:

> I have a question; if some people try to make a
> translated version of Unicode, they should contact
> all font contributors and ask for the license?
> Unicode Consortium cannot give any sublicense?

If you want to translate the Unicode Standard or its companion standards (UAX, UTS, ...), then please contact the Unicode Consortium.

> Thus, I guess, it would not be so irrelevant to ask
> the permission to JTC1, about the fonts used in
> ISO/IEC 10646 - although it does not mean that
> JTC1 would permit anything. If I'm misunderstanding,
> please correct me.

The production of the ISO 10646 standard is done by the Unicode Consortium. I am fuzzy on what exactly that means for copyright. If you need to find out, then please contact the consortium.

markus
CJK Ideograph Encoding Velocity (was: Re: Unicode Emoji 11.0 characters now ready for adoption!)
John,

I think this may be giving the list a somewhat misleading picture of the actual statistics for encoding of CJK unified ideographs. The "500 characters a year" or "1000 characters a year" limits are administrative limits set by the IRG for national bodies (and others) submitting repertoire to the "working set" that the IRG then segments into chunks for processing to prepare new increments for actual encoding.

In point of fact, if we take 1991 as the base year, the *average* rate of encoding new CJK unified ideographs now stands at 3379 per annum (87,860 as of Unicode 10.0). By "encoding" here, I mean, final, finished publication of the encoded characters -- not the larger number of potentially unifiable submissions that eventually go into a publication increment. There is a gradual downward drift in that number over time, because of the impact on the stats of the "big bang" encoding of 42,711 ideographs for Extension B back in 2001, but recently, the numbers have been quite consistent with an average incremental rate of about 3000 new ideographs per year:

5762 added for Extension E in 2015
7463 added for Extension F in 2017
~ 4934 to be added for Extension G, probably to be published in 2020

If you run the average calculation including Extension G, assuming 2020, you end up with a cumulative per annum rate of 3200, not much different than the calculation done as of today.

And as for the implication that China, in particular, is somehow limited by these numbers, one should note that the vast majority of Extension G is associated with Chinese sources. Although a substantial chunk is formally labeled with a "UK" source this time around, almost all of those characters represent a roll-in of systematic simplifications, of various sorts, associated with PRC usage. (People who want to check can take a look at L2/17-366R in the UTC document registry.)

--Ken

On 3/5/2018 7:13 AM, via Unicode wrote:

Dear All, to simplify discussion I have split the points.
On 2018/03/01 12:31, via Unicode wrote: Third, I cannot confirm or deny the "500 characters a year" limit, but I'm quite sure that if China (or Hong Kong, Taiwan,...) had a real need to encode more characters, everybody would find a way to handle these. Chinese characters for Unicode first go to IRG (or ISO/IEC JTC1/SC2/WG2/IRG) website. The limit of 500 a year for China is an average based on IRG #48 document regarding working set 2017 http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg48/IRGN2220_IRG48Recommends.pdf which explicitly states "each submission shall not exceed 1,000 characters". The People's Republic of China as one member of IRG is limited to 1,000 characters, which hopefully we can all agree has a population of over 1,000,000,000 , therefore was limited to submitting at most 1,000 characters. The earliest possible date for the next working set is two or three years later, that is 2019 or 2020, so that's an average limit of either 500 or 333 characters a year. Regards John
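The two sets of figures in this exchange can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the thread. It assumes the span for the current average runs from 1991 to 2017 (the release year of Unicode 10.0), which is the reading that reproduces the quoted 3379. The function name `per_annum` is made up for illustration.

```python
BASE_YEAR = 1991  # base year chosen in Ken's message

def per_annum(total: int, end_year: int, base_year: int = BASE_YEAR) -> float:
    """Average number of characters per year between base_year and end_year."""
    return total / (end_year - base_year)

# Encoded CJK unified ideographs as of Unicode 10.0 (figure quoted in the thread).
encoded_10_0 = 87_860
# Approximate size of Extension G, expected in 2020 (figure quoted in the thread).
ext_g = 4_934

rate_now = per_annum(encoded_10_0, 2017)             # assumed span: 1991..2017
rate_with_g = per_annum(encoded_10_0 + ext_g, 2020)  # span: 1991..2020

# John's figures: one 1,000-character submission spread over 2 or 3 years.
submission_cap = 1_000
limit_per_year = tuple(submission_cap // years for years in (2, 3))

print(round(rate_now), round(rate_with_g), limit_per_year)
```

The two calculations measure different things: the first is the rate of finished encoding across all sources, the second is a cap on what one IRG member may submit to one working set.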
Re: [Unicode] Re: Fonts and font sizes used in the Unicode
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: I have a question; if some people try to make a translated version of Unicode, they should contact all font contributors and ask for the license? Unicode Consortium cannot give any sublicense? Be happy to help you get an answer to that off-list, in case you are actually working on a translation project. The issue is too specific and complex to give general answers. A./
Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote:

I have a question; if some people try to make a translated version of Unicode

And to add to Asmus' response, folks on the list should understand that even with the best of effort, the concept of a "translated version of Unicode" is a near impossibility. In fairly recent times, two serious efforts to translate *just* the core specification -- one in Japanese, and a somewhat later attempt for Chinese -- crashed and burned, for a variety of reasons. The core specification is huge, contains a lot of very specific technical terminology that is difficult to translate, along with a large collection of script- and language-specific detail, also hard to translate. Worse, it keeps changing, with updates now coming out once every year. Some large parts are stable, but it is impossible to predict what sections might be impacted by the next year's encoding decisions.

That is not including the fact that "the Unicode Standard" now also includes 14 separate HTML (or XHTML) annexes, all of which are also moving targets, along with the UCD data files, which often contain important information in their headers that would also require translation. And then, of course, there are the 2000+ pages of the formatted code charts, which require highly specific and very complicated custom tooling and font usage to produce.

It would require a dedicated (and expensive) small army of translators, terminologists, editors, programmers, font designers, and project managers to replicate all of this into another language publication -- and then they would have to do it again the next year, and again the next year, in perpetuity. Basically, given the current situation, it would be a fool's errand, more likely to introduce errors and inconsistencies than to help anybody with actual implementation.
People who want accessibility to the Unicode Standard in other languages need to scale down their expectations considerably, and focus on preparing reasonably short and succinct introductions to the terminology and complexity involved in the full standard. Such projects are feasible. But a full translation of "the Unicode Standard" simply is not. --Ken
Re: Translating the standard (was: Re: Fonts and font sizes used in the Unicode)
There have been significant efforts to "translate", or more precisely "adapt", significant parts of the standard, with good presentations in Wikipedia and on various sites for scoped topics. So there are alternate charts, and instead of translating everything, the concepts are summarized and re-explained, with links to the original English version whenever more information is needed.

The UCD files don't need to be translated; they can be automatically processed to generate alternate presentations or data tables in other formats. There's no value in translating them manually; it's better to develop a tool that will process them into a format users can read.

So remove the UCD files and the tables from the count, as well as the sample code (which is just demonstrative and uses a simplified, non-optimal implementation to keep the code clear). We can now have separate tools or websites presenting them and offering commented code that also performs better. We have large collections of i18n libraries that were developed for various development platforms, and usage documentation in various languages.

The only effort is in:

* naming characters (Wikipedia is great for distributing the effort, with articles that show relevant collections of characters and document alternate names or disambiguate synonyms).
* the core text of the standard (section 3, about conformance and requirements, is the first thing to adapt). There's absolutely no need, however, to do that as a pure translation; it can be rewritten and presented with the goals wanted by users. Here again Wikipedia has made significant efforts, in various languages.
* keeping the tools mentioned above in sync and in conformity with the standard (syncing the UCD files they use).
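As a rough illustration of the "process the UCD files with a tool instead of translating them" idea above, here is a minimal sketch that reads lines in the general UCD layout (code point or range, a semicolon, a value, an optional `#` comment) and emits a tab-separated table. The sample lines are hand-written for this example, not copied from a real UCD file, and the function names `parse_ucd` and `to_tsv` are hypothetical.

```python
from typing import Iterator, Tuple

# Hand-written sample in the general UCD layout; not an excerpt of a real file.
SAMPLE = """\
# comment lines start with '#'
263A;N           # WHITE SMILING FACE
1F600..1F64F;W   # GRINNING FACE..
"""

def parse_ucd(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, value) for each data line, skipping comments and blanks."""
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point: range of one
        yield start, end, fields[1]

def to_tsv(text: str) -> str:
    """Render the parsed records as tab-separated rows."""
    return "\n".join(f"U+{s:04X}\tU+{e:04X}\t{v}" for s, e, v in parse_ucd(text))

print(to_tsv(SAMPLE))
```

Only the column headings and any surrounding explanation would then need translating; the data itself is regenerated whenever the source file changes.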
2018-03-05 19:21 GMT+01:00 Ken Whistler via Unicode : > > On 3/5/2018 9:03 AM, suzuki toshiya via Unicode wrote: > > I have a question; if some people try to make a > translated version of Unicode > > > And to add to Asmus' response, folks on the list should understand that > even with the best of effort, the concept of a "translated version of > Unicode" is a near impossibility. In fairly recent times, two serious > efforts to translate *just *the core specification -- one in Japanese, > and a somewhat later attempt for Chinese -- crashed and burned, for a > variety of reasons. The core specification is huge, contains a lot of very > specific technical terminology that is difficult to translate, along with a > large collection of script- and language-specific detail, also hard to > translate. Worse, it keeps changing, with updates now coming out once every > year. Some large parts are stable, but it is impossible to predict what > sections might be impacted by the next year's encoding decisions. > > That is not including that fact that "the Unicode Standard" now also > includes 14 separate HTML (or XHTML) annexes, all of which are also moving > targets, along with the UCD data files, which often contain important > information in their headers that would also require translation. And then, > of course, there are the 2000+ pages of the formatted code charts, which > require highly specific and very complicated custom tooling and font usage > to produce. > > It would require a dedicated (and expensive) small army of translators, > terminologists, editors, programmers, font designers, and project managers > to replicate all of this into another language publication -- and then they > would have to do it again the next year, and again the next year, in > perpetuity. Basically, given the current situation, it would be a fool's > errand, more likely to introduce errors and inconsistencies than to help > anybody with actual implementation. 
> > People who want accessibility to the Unicode Standard in other languages > need to scale down their expectations considerably, and focus on preparing > reasonably short and succinct introductions to the terminology and > complexity involved in the full standard. Such projects are feasible. But a > full translation of "the Unicode Standard" simply is not. > > --Ken >