Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
On Thursday 31 May 2012, Doug Ewell d...@ewellic.org wrote:

William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote:

Further to that point of order, is there any rule that absolutely prevents the deprecated status of a character or collection of characters being removed?

UTC has not ever shown the slightest inclination to do so, if that answers your question.

Thank you for replying. What I was wondering about was whether if someone proposes U+E0002 for encoding for a future new technology, whether the fact that tags are currently deprecated would automatically stop that proposal being accepted for encoding because of perhaps some guarantee in the rules never to reverse deprecation or something like that.

I feel that by hybridizing the suggestions of Doug and Philippe, an elegant solution using tags and an advanced format font could be designed.

Thinking about this after posting, and thinking of the vast coding space that could be opened up for flag encoding by just adding U+E0002 into regular Unicode, I began to think of the possibility of proposing the addition of U+E0007 so as to open up another encoding space where each item in that encoding space could be displayed either as a sequence of tag glyphs using an ordinary font, or displayed as one glyph by using glyph substitution technology with an advanced format font, or displayed localized using a database technology with the item in that encoding space used as a key to the database.

I was thinking that the above would involve visible glyphs for the tag characters. I was thinking of the possibilities, then I noticed something. In a later post Philippe Verdy wrote as follows.

(or in Plane 14, but that plane is not intended for visible symbols).

Ah! There is a font that has visible glyphs for the tag characters, together with a visible glyph for a Private Use Area tag-style character at U+2 available as a free download from the following forum post.
http://forum.high-logic.com/viewtopic.php?p=10587#p10587 William Overington 1 June 2012
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
Note that I absolutely do not advocate the reuse of language tags for something else. They are deprecated and should remain deprecated. They were not intended to be visible symbols.

I much prefer a solution that generates **true** symbols that can be combined, and **optionally** (but safely) rendered as ligatures (by design of the encoding itself) to render the true flags instead of showing their code in the list of glyphs (the default rendering in absence of recognized ligatures).

The ligature-based solution can still be disabled to show the symbols, using a single ZWNJ format control in the middle of the sequence, but this is for limited use. It is expected that these sequences of symbols **should** be rendered as ligatures by default each time these ligatures are recognized, i.e. when they match a flag code that has been registered somewhere (in a separate registry which is not immediately necessary for the encoding of this subset).

This new small subset should be treated as a new separate script, which is definitely NOT Latin, as it will not support most other assumptions and features of the Latin script (and it must not be treated at the same level as the other surrounding Latin letters). Encoded sequences are not breakable in the middle for word-breaking purposes.

In a limited plain-text environment, these codes could be rendered or converted in a lossy way by remapping these symbols to the Basic Latin block, surrounding them with punctuation as in [US], but this will be only a last-chance fallback. This last-chance fallback conversion may be specified with a NFKC decomposition mapping. For example, this font compatibility mapping:

XXX00 ; FLAG SYMBOL INITIAL HYPHEN ; ... ; So ; ... ; <font> 005B 002D ;
XXX01 ; FLAG SYMBOL INITIAL A ; ... ; So ; ... ; <font> 005B 0041 ;
XXX1A ; FLAG SYMBOL INITIAL Z ; ... ; So ; ... ; <font> 005B 005A ;
XXX20 ; FLAG SYMBOL INITIAL ZERO ; ... ; So ; ... ; <font> 005B 0030 ;
XXX29 ; FLAG SYMBOL INITIAL NINE ; ... ; So ; ... ; <font> 005B 0039 ;
...
XXX30 ; FLAG SYMBOL MEDIAL HYPHEN ; ... ; So ; ... ; <font> 002D ;
XXX31 ; FLAG SYMBOL MEDIAL A ; ... ; So ; ... ; <font> 0041 ;
XXX4A ; FLAG SYMBOL MEDIAL Z ; ... ; So ; ... ; <font> 005A ;
XXX50 ; FLAG SYMBOL MEDIAL ZERO ; ... ; So ; ... ; <font> 0030 ;
XXX59 ; FLAG SYMBOL MEDIAL NINE ; ... ; So ; ... ; <font> 0039 ;
...
XXX60 ; FLAG SYMBOL FINAL HYPHEN ; ... ; So ; ... ; <font> 002D 005D ;
XXX61 ; FLAG SYMBOL FINAL A ; ... ; So ; ... ; <font> 0041 005D ;
XXX7A ; FLAG SYMBOL FINAL Z ; ... ; So ; ... ; <font> 005A 005D ;
XXX80 ; FLAG SYMBOL FINAL ZERO ; ... ; So ; ... ; <font> 0030 005D ;
XXX89 ; FLAG SYMBOL FINAL NINE ; ... ; So ; ... ; <font> 0039 005D ;

(This also gives a hint for how to collate these symbols, and the minimum size of the block to encode: 3 columns for each of the 3 subsets, including some code points reserved in each subset for additional punctuation-like symbols that may be needed to implement namespaces in the registry of flags.)
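The last-chance fallback described in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proposal leaves the code points as XXX placeholders, so the sketch borrows a Plane 15 private-use base (`BASE`, hypothetical) and follows the offsets listed in the mapping table (initial forms at +0x00, medial at +0x30, final at +0x60). The function names are invented for the example.

```python
# Hypothetical base: the proposal only gives "XXX" placeholders, so a
# private-use code point stands in for the unassigned block.
BASE = 0xF0000
MEDIAL, FINAL, BLOCK_END = 0x30, 0x60, 0x90


def _slot(ch):
    """Offset of a code character inside one 0x30-wide subset."""
    if ch == "-":
        return 0x00
    if "A" <= ch <= "Z":
        return 0x01 + ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return 0x20 + ord(ch) - ord("0")
    raise ValueError(f"not usable in a flag code: {ch!r}")


def _char(slot):
    """Inverse of _slot; None for slots the table leaves reserved."""
    if slot == 0x00:
        return "-"
    if 0x01 <= slot <= 0x1A:
        return chr(ord("A") + slot - 0x01)
    if 0x20 <= slot <= 0x29:
        return chr(ord("0") + slot - 0x20)
    return None


def encode_flag(code):
    """Spell a flag code as initial + medial... + final symbols."""
    if len(code) < 2:
        raise ValueError("a flag code needs at least two characters")
    offsets = [0x00] + [MEDIAL] * (len(code) - 2) + [FINAL]
    return "".join(chr(BASE + off + _slot(ch)) for off, ch in zip(offsets, code))


def fallback(text):
    """Lossy Basic Latin rendering: initial -> '[X', medial -> 'X', final -> 'X]'."""
    out = []
    for c in text:
        rel = ord(c) - BASE
        ch = None
        if 0 <= rel < BLOCK_END:
            position, slot = divmod(rel, 0x30)
            ch = _char(slot)
        if ch is None:
            out.append(c)  # not a flag symbol (or a reserved slot): keep as-is
        else:
            out.append(("[" + ch, ch, ch + "]")[position])
    return "".join(out)
```

For instance, `fallback(encode_flag("US"))` gives back the bracketed form `[US]` mentioned in the message, and surrounding text passes through unchanged.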
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: What I was wondering about was whether if someone proposes U+E0002 for encoding for a future new technology, whether the fact that tags are currently deprecated would automatically stop that proposal being accepted for encoding because of perhaps some guarantee in the rules never to reverse deprecation or something like that. These are my personal opinions. Please keep in mind I am not a UTC or WG2 member, and have often been taken to task for trying to predict or advise people what UTC or WG2 will or will not do. 1. There is probably no formal provision for automatic rejection of a proposed new Plane 14 tag character. It would probably be at least considered, not thrown away at the receptionist's desk. 2. Both the act of formally deprecating the Plane 14 tag mechanism, and the comments I've seen on this list from UTC participants over the years, suggest to me that a proposal for a new Plane 14 tag character would be very unlikely to be approved. 3. Stating in a proposal that either this new tag character, or any character, is being proposed for a future new technology may reduce the likelihood that the proposal will be approved. But the only way to find out for sure is to submit a proposal. Thinking about this after posting and thinking of the vast coding space that could be opened up for flag encoding by just adding U+E0002 into regular Unicode, I began to think of the possibility of proposing the addition of U+E0007 so as to open up another encoding space where each item in that encoding space could be displayed either as a sequence of tag glyphs using an ordinary font, or displayed as one glyph by using glyph substitution technology with an advanced format font or displayed localized using a database technology with the item in that encoding space used as a key to the database. 
My opinion is that nothing about the Unicode code space, including Plane 14 tags, is intended to serve as an indexing mechanism into another standard. I was thinking that the above would involve visible glyphs for the tag characters. My opinion is that, while a font may include glyphs for tag characters, that is not the normal use case for tag characters. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
2012/6/1 Doug Ewell d...@ewellic.org:

My opinion is that, while a font may include glyphs for tag characters, that is not the normal use case for tag characters.

I have exactly the same position about glyphs found in fonts for any format controls. They are not intended to be rendered, except in very specific technical contexts, or by some fallback mechanism when their intended function is not supported or implemented and one still wants to be able to edit texts containing them (using a visible-controls edit mode).
RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

Note that I absolutely do not advocate the reuse of language tags for something else. They are deprecated and should remain deprecated. They were not intended to be visible symbols.

Just as a matter of terminology, the deprecated Plane 14 block is for tags and not just for language tags. The idea for such a block did come from the proposal to support inline language tagging, and the only defined type of tag is U+E0001 LANGUAGE TAG, but other tags could have been introduced later for other purposes. By deprecating the entire block and not just U+E0001, UTC essentially deprecated the whole tag concept.

I much prefer a solution that generates **true** symbols that can be combined, and **optionally** (but safely) rendered as ligatures (by design of the encoding itself) to render the true flags instead of showing their code in the list of glyphs (the default rendering in absence of recognized ligatures).

I wish we would use some other term for these than ligatures. They are definitely not ligatures in the sense that any typographer, sign painter, or reader would think of them. A picture of a French flag has no imaginable visual relationship to the letter F or the letter R.

-- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

Just as a matter of terminology, the deprecated Plane 14 block is for tags and not just for language tags. The idea for such a block did come from the proposal to support inline language tagging, and the only defined type of tag is U+E0001 LANGUAGE TAG, but other tags could have been introduced later for other purposes. By deprecating the entire block and not just U+E0001, UTC essentially deprecated the whole tag concept.

Fine. But Plane 14 was not deprecated at the same time as a whole. Anyway, given that I propose symbols, they are NOT tags. I have no opinion, however, about which plane should be used to allocate them. Plane 14 is fine for me, like any other plane (except the BMP and the SIP), even if they are not tags. You seem to think that the whole plane is for tags. I don't think so. Only the **existing** blocks assigned in Plane 14 are deprecated.

No, I said the block was deprecated, not the plane. The deprecated Plane 14 block meant the deprecated block which is in Plane 14. Indeed, there are 240 variation selectors in Plane 14 which are not deprecated.

They are definitely not ligatures in the sense that any typographer, sign painter, or reader would think of them.

You're right, in terms of typography. But all the technologies used for producing the ligatures are perfectly usable here to give the desired effect, with the same usage policies: they will remain optional, even if they are desirable (and should be enabled by default, just like the LAM-ALEF ligature in the Arabic script).

I accept that the technology for making a font and rendering engine perform this visual transformation is the same as that used to combine letters into typographical ligatures. Font guys can look at it that way.

I think if Unicode does embark on something like this (not to say they should), or to the extent they already have with the Regional Indicator Symbols, they should avoid the word ligature, and in fact the passage on page 534 of TUS 6.1 simply talks about how those symbols could be rendered.

-- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
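For readers unfamiliar with the Regional Indicator Symbols mentioned above, here is a minimal Python sketch of how a two-letter code corresponds to a pair of those symbols (U+1F1E6 for A through U+1F1FF for Z). The helper names are invented for this example, and whether a given pair is drawn as a flag or as two boxed letters is left to the font and renderer.

```python
RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_regional_indicators(code):
    """Map a two-letter code such as 'FR' to its pair of regional indicator symbols."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected two letters A-Z")
    return "".join(chr(RI_A + ord(ch) - ord("A")) for ch in code)


def from_regional_indicators(pair):
    """Inverse mapping: recover the letters from a pair of regional indicators."""
    if len(pair) != 2 or not all(RI_A <= ord(ch) <= RI_A + 25 for ch in pair):
        raise ValueError("expected two regional indicator symbols")
    return "".join(chr(ord("A") + ord(ch) - RI_A) for ch in pair)
```

The mapping is purely arithmetic on code points, which matches the point made in the thread: nothing about the letters F and R is visually related to the picture a renderer may substitute for the pair.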
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
Coding solutions that require substantial support across implementations are successful if (and, I argue, only if) you can't successfully sell your implementation in a given market without support for that feature.

Mathematical layout is not needed by the majority of users, but those users that do need it can't be accommodated with a substitute. Hence, anyone trying to sell into that market has to make a decent job of it. Looks like there are enough people in that market that even general purpose software, like Word, has a decent (nay, excellent) equation editor.

Arabic shaping is so essential to the script that you either support it, or you don't support Arabic.

Placing accents on Latin characters is widely needed, but the most widely needed cases are covered by precomposed legacy characters. Hence, the support for this feature is spotty. Curiously, this remains the case, even though, taken together, the diverse users of particular combinations of letters and accents for Latin/Greek/Cyrillic probably reach substantial numbers, and a common solution would seem to support all of them.

Support for Ideographic Variation Sequences is needed for all sorts of high-end CJK work. It can be expected to be supported in those market areas, but probably not necessarily in mainstream implementations. Time will tell.

And so on.

The chances that any form of meta encoding for symbols (including ligation) will ever reach critical mass in support are lower than for Latin/Greek/Cyrillic accents, because - as of today - there's no established use for any of these schemes. All of these things remain solutions in search of a problem.

The interesting thing I note is the level of enthusiasm with which these are discussed here, when, at the same time, a lowly single character currency symbol, with no special meta-coding, layout support, algorithm changes, etc. was so roundly dismissed - despite all the evidence that not supporting it in the face of user demands would impact the ability of implementers to sell into a not insubstantial market.

Sometimes I wonder what's going on ...

A./
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
In addition, I am firmly convinced that the renderers used in browsers will be able to synthesize the flags themselves from their well-known ISO 3166-1 codes, in the absence of font support: this will just require them to ship a small collection of SVG graphics (something that is already widely available). This will be a valid substitution immediately, in the absence of a more general technology based on an external registry, and of support in fonts. The technical effort needed to develop it in renderer software is very small. It will also be easy to test, as there's no complication.
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
On 6/1/2012 12:01 PM, Philippe Verdy wrote:

2012/6/1 Asmus Freytag asm...@ix.netcom.com:

The chances that any form of meta encoding for symbols (including ligation) will ever reach critical mass in support is less than for Latin/Greek/Cyrillic accents, because - as of today - there's no established use for any of these schemes. All of these things remain solutions in search of a problem.

No, my proposal gives something that is immediately usable, and does not create any ambiguity. It is simple to implement even without the presence of a technical ligaturing solution.

It's still a solution in search of a problem. There's no demand out there for this feature.

A./
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
There's at least the demand coming from their use as Emoji. They are attested as well in many books and many applications (not always colorful). Maybe the UTC did not receive a formal request before, but the demand REALLY exists for the encoding of flags in plain text (not just rich text). They are semantically significant and are not just a question of presentation.

2012/6/1 Asmus Freytag asm...@ix.netcom.com: On 6/1/2012 12:01 PM, Philippe Verdy wrote: There's no demand out there for this feature.
Shift-JIS encoded text (was: RE: Tags and future new technologies [...])
Peter Constable petercon at microsoft dot com wrote: The only requirement of Unicode was to provide a way to map Shift-JIS encoded text involving emoji to Unicode / 10646 in a way that could be round-tripped, This is the part that has always confused me. At what point does text encoded in a vendor's private-use extension to Shift-JIS become Shift-JIS encoded text? Because I know for sure that I'm not supposed to refer to characters assigned to the Unicode PUA, my own or anyone else's, as being encoded in Unicode. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])
2012/6/1 Doug Ewell d...@ewellic.org:

Peter Constable petercon at microsoft dot com wrote:

The only requirement of Unicode was to provide a way to map Shift-JIS encoded text involving emoji to Unicode / 10646 in a way that could be round-tripped,

This is the part that has always confused me. At what point does text encoded in a vendor's private-use extension to Shift-JIS become Shift-JIS encoded text? Because I know for sure that I'm not supposed to refer to characters assigned to the Unicode PUA, my own or anyone else's, as being encoded in Unicode.

Maybe because, without admitting it publicly, those symbols really have a much wider use than in these private Shift-JIS extensions. In which case, the need for roundtrip compatibility is definitely not the main reason for their encoding, and these symbols should be considered more globally (as they are certainly needed in other countries or for other private implementations, but without the interoperability that one could expect between these implementations when they obviously mean the same thing and play the same role in texts including them).

The private extension is just a sign that it was needed. The pressure to include them in standard Shift-JIS is another sign, and then the need to map them as well into the UCS, via their standardization in Shift-JIS (whether or not it succeeds in that standard).

Of course, encoding flags visually in an international standard is much more difficult if one wants to encode some flags and not some others, also because of political issues. That's why I propose another way to represent them. This won't affect the private-use Shift-JIS encoding, which can now have roundtrip compatibility with its existing symbols, even if the standard Shift-JIS will now prefer using the more generic symbols instead of integrating the private-use extension.
Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])
On 6/1/2012 1:51 PM, Doug Ewell wrote: At what point does text encoded in a vendor's private-use extension to Shift-JIS become Shift-JIS encoded text? A possibly less confusing way to put this is: At what point does text encoded in a vendor's private-use extension to *JIS X 0208* become Shift-JIS encoded text? The reason for putting it that way is that JIS X 0208 is a character encoding standard. It defines the repertoire of characters and assigns numbers to them. But 2022-JP, EUC-JP, and Shift-JIS are then 3 different ways of turning JIS X 0208 character codes (and possibly vendor or other extensions) into streams of bytes. Think of them as character encoding schemes (in the Unicode character encoding model sense). One of the reasons why there are many Shift-JIS's is not that the principle of how to shift JIS X 0208 code values into bytes changes, but because there are many different private extensions, all making use of the same general principle for how to move the byte values into a particular scheme for processing. In summary, Shift-JIS is not a character encoding standard -- it is a scheme for turning JIS (and various extensions) into a particular format for processing. --Ken
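Ken's distinction between the coded character set (JIS X 0208) and the three encoding schemes can be illustrated with a short Python sketch. The function names are invented for the example; the arithmetic is the commonly described conversion for standard JIS X 0208 two-byte codes (each byte in 0x21..0x7E) and does not cover any vendor extension.

```python
def jis_to_euc_jp(j1, j2):
    """EUC-JP: set the high bit on both JIS X 0208 bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_iso2022_jp(j1, j2):
    """ISO-2022-JP: escape into JIS X 0208, emit the bytes, escape back to ASCII."""
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"


def jis_to_shift_jis(j1, j2):
    """Shift-JIS: fold two JIS rows into each lead byte and shift the trail byte."""
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 & 1:
        # Odd row: trail bytes start at 0x40 and skip 0x7F.
        s2 = j2 + 0x1F + (1 if j2 >= 0x60 else 0)
    else:
        # Even row: trail bytes start at 0x9F.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def jis_code(ch):
    """Recover the JIS X 0208 code of a character via Python's EUC-JP codec."""
    b = ch.encode("euc_jp")
    return b[0] & 0x7F, b[1] & 0x7F
```

The same pair of numbers, for example `jis_code("あ")`, yields three different byte sequences depending on the scheme, which is the sense in which Shift-JIS is a scheme rather than a character encoding standard.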
[OT] Flerovium and livermorium get names on the periodic table of elements
FYI – I know at least some folk here will find this of interest: http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements Peter
RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
Philippe Verdy wrote:

No, my proposal gives something that is immediately usable, and does not create any ambiguity. It is simple to implement even without the presence of a technical ligaturing solution. Those flags will be immediately usable, without any of the political complications created by the case of flags. It will avoid proliferation of proposals, and infinite debates about encoding or not encoding some flags, or about changing the representative glyphs.

Again, not saying Unicode should do this, but: Doesn't there at least have to be a well-defined convention for representing flags before any of this works? How do I represent:

1. the flag of the United States
2. the flag of the state of Colorado
3. the flag of Adams County, Colorado
4. the flag of the city of Thornton

Not all of these might be defined right away, but an extensible structure within which to define them would have to be in place.

-- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
RE: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])
I hadn't thought that Peter was talking about text encoded according to the Shift-JIS model, without specifying the encoding. I'm not sure that changes my question. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: [OT] Flerovium and livermorium get names on the periodic table of elements
2012/6/2 Peter Constable peter...@microsoft.com: FYI – I know at least some folk here will find this of interest: http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements

Well, they are already in the tables shown in Wikipedia (the English and French pages at least). Time for inclusion in Wiktionary (unless this is already done for these names, but some languages will need transliterations)...
Re: [OT] Flerovium and livermorium get names on the periodic table of elements
On 1 June 2012 23:02, Peter Constable peter...@microsoft.com wrote: http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements

There don't appear to have been any Chinese characters assigned to these two elements yet, but it is interesting to note that there are no simplified forms for eight of the elements with highest atomic numbers:

104 Rf 鑪 钅卢
105 Db 𨧀 钅杜
106 Sg 𨭎 钅喜
107 Bh 𨨏 钅波
108 Hs 𨭆 钅黑
109 Mt 䥑 钅麦
111 Rg 錀 钅仑
112 Cn 鎶 钅哥

which are represented with PUA characters at: http://zh.wikipedia.org/wiki/%E5%85%83%E7%B4%A0%E5%91%A8%E6%9C%9F%E8%A1%A8 and as components at: http://zh.wikipedia.org/wiki/%E6%89%A9%E5%B1%95%E5%85%83%E7%B4%A0%E5%91%A8%E6%9C%9F%E8%A1%A8

(110 Ds is already encoded in CJK-D as U+2B7FC 𫟼)

Seem like candidates for urgent encoding to me.

Andrew
Re: [OT] Flerovium and livermorium get names on the periodic table of elements
Can't they be represented by fusion of other elements? ;-)
RE: [OT] Flerovium and livermorium get names on the periodic table of elements
You mean like--if we considered characters such as 0321 or FE73 as character analogues of sub-atomic particles--bombarding other characters with the likes of 0321, FE73, etc.? P.
Re: [OT] Flerovium and livermorium get names on the periodic table of elements
On 06/01/2012 07:09 PM, texte...@xencraft.com wrote: Can't they be represented by fusion of other elements? ;-) Sent from my Verizon Wireless BlackBerry Sure. Just like two Hafnium nuclei make a Holmium. (Meanwhile, Fl for Flerovium, I think it is? Like people aren't already confused as to whether Fluorine is F or Fl? Should have gone with Fv.) (Meanwhile meanwhile: Who's with me for pushing for a moseleium for http://en.wikipedia.com/wiki/Henry_Moseley ?) ~mark
Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))
The principles used in ISO 3166, and those used for the extension of language tags (with their locale extension subtags), could work as well.

If the first need is to represent current country flags simply (ignoring the dated versions), and the first level of subdivisions in those countries, then ISO 3166 already provides the basic codes (we just need the convention that any code that consists of two letters, or starts with two letters and a hyphen, must obey ISO 3166-1 or ISO 3166-2). Further extensions will wait for the development of a more complete registry, which will allow defining codes using other prefixes acting like namespaces.

ISO 3166 also already has codes for private use, notably any code starting with X, so that the registry can preserve the use of the prefix X-, while keeping for itself some other prefix starting with X and another letter.

These mechanisms are not really new, and they are easy to understand as they work in other standards. We don't need to reinvent the wheel.
[OT] Flag coding (was: Re: Tags and future new technologies [...])
Philippe Verdy wrote: If the first need is to represent current country flags simply (ignoring the dated versions), and the first level of subdivisions in those countries, then ISO 3166 already provides the basic codes (we just need the convention that any codes that consists in two letters, or start by two letters, and hyphen must obey to ISO 3166-1 or ISO 3166-2. Further extensions will wait the development of a more complete registry, which will allow defining codes using other prefixes acting like namespaces. For flags belonging to nations and subnational entities, of course one would expect a flags code to use widely recognized standards, starting with ISO 3166. For my four examples, it might have: 1. the United States → US 2. the state of Colorado → US-CO 3. Adams County, Colorado → US-CO-001 (using FIPS 6-4; although that standard has been withdrawn, I can’t find what replaced it; other standards would be needed for second-level subdivisions of other countries) 4. the city of Thornton → US-THT (using UN/LOCODE) There are other possibilities. But this only tells part of the story; one would probably want the flags code to cover current or historical entities without standard code elements, such as the Holy Roman Empire or NATO, or other types of domains, such as maritime and military and auto racing and the Olympic Games and classical pirates (and maybe modern ones too). There would have to be a coding mechanism for this—not necessarily all the code elements, not right away, but a way to expand to include them. I think this is getting off-topic for Unicode, though I know Philippe thinks of it as the basis for a great addition to Unicode. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
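Doug's four example codes can be fed through a small Python sketch to show why an agreed convention (or namespace mechanism) is needed before such codes can be interpreted. Everything here is hypothetical: the function names, the syntax check, and the labels are assumptions made for illustration, not part of any proposal in this thread. The sketch relies only on the facts that an ISO 3166-2 subdivision part is one to three alphanumeric characters and that a UN/LOCODE location part is three characters.

```python
import re

_COUNTRY = re.compile(r"[A-Z]{2}")
_PART = re.compile(r"[A-Z0-9]{1,3}")


def parse_flag_code(code):
    """Split a hyphenated flag code and check its (assumed) basic syntax."""
    parts = code.upper().split("-")
    if not _COUNTRY.fullmatch(parts[0]):
        raise ValueError("first element must be two letters")
    for part in parts[1:]:
        if not _PART.fullmatch(part):
            raise ValueError(f"unexpected element: {part!r}")
    return parts


def candidate_readings(code):
    """List which standards the code could syntactically belong to."""
    parts = parse_flag_code(code)
    if len(parts) == 1:
        return ["ISO 3166-1 country"]
    if len(parts) == 2:
        readings = ["ISO 3166-2 subdivision"]
        if len(parts[1]) == 3:
            readings.append("UN/LOCODE location")  # same shape, different registry
        return readings
    return ["deeper subdivision (no single standard)"]
```

With this sketch, `candidate_readings("US-THT")` returns two readings, because syntax alone cannot tell a three-character subdivision code from a UN/LOCODE location; a registry or prefix convention would have to settle that.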
A question about the default grapheme cluster boundaries with U+0020 as the grapheme base
It seems like there is an inconsistency between what the default grapheme clusters specification says and what the test results are expected to be.

UAX#29 says: Another key feature (of default Unicode grapheme clusters) is that **default Unicode grapheme clusters are atomic units with respect to the process of determining the Unicode default line, word, and sentence boundaries**.

This is also mentioned in UAX#14: Example 6. Some implementations may wish to tailor the line breaking algorithm to resolve grapheme clusters according to Unicode Standard Annex #29, “Unicode Text Segmentation” [UAX29], as a first stage. **Generally, the line breaking algorithm does not create line break opportunities within default grapheme clusters**; therefore such a tailoring would be expected to produce results that are close to those defined by the default algorithm. However, if such a tailoring is chosen, characters that are members of line break class CM but not part of the definition of default grapheme clusters must still be handled by rules LB9 and LB10, or by some additional tailoring.

However, the sequence U+0020 (SP), U+0308 (CM) is handled in the line breaking algorithm by rules LB10+LB18 and produces a break opportunity, while GB9 prohibits a break between U+0020 (Other) and U+0308 (Extend).

Section 9.2, Legacy Support for Space Character as Base for Combining Marks, in UAX#29 clarifies why a line break occurs there, but the fact remains that the statements above are then false and introduce some ambiguity. If the space character is no longer a grapheme base, the grapheme cluster breaking rules need to be updated.

Kind regards, Konstantin
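The mismatch described above can be made concrete with a deliberately tiny Python sketch. It is not an implementation of UAX #29 or UAX #14: it models only rule GB9 on one side and rules LB9, LB10 and LB18 on the other, it approximates both the Extend property and line break class CM by the general categories Mn and Me (an assumption, since the standard library does not expose those properties), and it ignores the other bases that LB9 excludes (BK, CR, LF, NL, ZW). The function names are invented for the example.

```python
import unicodedata


def is_mark_approx(ch):
    """Rough stand-in for Grapheme_Extend / line break class CM."""
    return unicodedata.category(ch) in ("Mn", "Me")


def grapheme_break_allowed(prev, nxt):
    """GB9 only: never break before an Extend character, whatever precedes it."""
    return not is_mark_approx(nxt)


def line_break_before_mark(prev):
    """Line break opportunity between prev and a following combining mark.

    After a space, LB9 does not attach the mark to the space, so LB10 treats
    the mark as AL and LB18 then allows a break after the space. After an
    ordinary base, LB9 keeps the mark with its base.
    """
    return prev == " "
```

For the pair U+0020, U+0308 the two functions disagree: `grapheme_break_allowed(" ", "\u0308")` is `False` while `line_break_before_mark(" ")` is `True`, which is the line break opportunity inside a default grapheme cluster that the message points out.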