Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread William_J_G Overington
On Thursday 31 May 2012, Doug Ewell d...@ewellic.org wrote:

 William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote:

  Further to that point of order, is there any rule that absolutely prevents 
  the deprecated status of a character or collection of characters being 
  removed?

 UTC has not ever shown the slightest inclination to do so, if that answers 
 your question.

Thank you for replying.

What I was wondering was whether, if someone proposes U+E0002 for encoding 
for a future new technology, the fact that tags are currently deprecated 
would automatically stop that proposal from being accepted, perhaps because 
of some guarantee in the rules never to reverse deprecation or something 
like that.

  I feel that by hybridizing the suggestions of Doug and Philippe, an 
  elegant solution using tags and an advanced format font could be designed.

Thinking about this after posting, and thinking of the vast coding space that 
could be opened up for flag encoding just by adding U+E0002 to regular 
Unicode, I began to think of proposing the addition of U+E0007 as well, so as 
to open up another encoding space. Each item in that encoding space could be 
displayed as a sequence of tag glyphs using an ordinary font, as a single 
glyph using glyph substitution technology with an advanced format font, or in 
localized form using a database technology, with the item used as a key to 
the database.

I was thinking that the above would involve visible glyphs for the tag 
characters.

I was thinking of the possibilities, then I noticed something.

In a later post Philippe Verdy wrote as follows.

  (or in Plane 14, but that plane is not intended for visible symbols).

Ah!

There is a font that has visible glyphs for the tag characters, together with a 
visible glyph for a Private Use Area tag-style character at U+2 available 
as a free download from the following forum post.

http://forum.high-logic.com/viewtopic.php?p=10587#p10587

William Overington

1 June 2012








Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Philippe Verdy
Note that I absolutely do not advocate the reuse of language tags for
something else. They are deprecated and should remain deprecated. They
were not intended to be visible symbols.

I much prefer a solution that generates **true** symbols that can be
combined, and **optionally** (but safely) rendered as ligatures (by
design of the encoding itself) to render the true flags instead of
showing their codes as a sequence of glyphs (the default rendering in
the absence of recognized ligatures).

The ligature-based solution can still be disabled to show the symbols
using a single ZWNJ format control in the middle of the sequence, but
this is for limited use. It is expected that these sequences of
symbols **should** be rendered as ligatures by default each time these
ligatures are recognized, i.e. when they match a flag code that has
been registered somewhere (in a separate registry which is not
immediately necessary for the encoding of this subset).

This new small subset should be treated as a new, separate script,
which is definitely NOT Latin: it will not support most of the other
assumptions and features of the Latin script, and it must not be
treated at the same level as the surrounding Latin letters. Encoded
sequences are not breakable in the middle for word-breaking purposes.

In a limited plain-text environment, these codes could be rendered or
converted in a lossy way by remapping these symbols to the Basic Latin
block and surrounding them with punctuation, as in [US], but this would
only be a last-chance fallback.

This last-chance fallback conversion may be specified with an NFKC
decomposition mapping, for example with these font compatibility
mappings:

 XXX00 ; FLAG SYMBOL INITIAL HYPHEN ; ... ; So ; ... ; <font> 005B 002D ;
 XXX01 ; FLAG SYMBOL INITIAL A ; ... ; So ; ... ; <font> 005B 0041 ;
 XXX1A ; FLAG SYMBOL INITIAL Z ; ... ; So ; ... ; <font> 005B 005A ;
 XXX20 ; FLAG SYMBOL INITIAL ZERO ; ... ; So ; ... ; <font> 005B 0030 ;
 XXX29 ; FLAG SYMBOL INITIAL NINE ; ... ; So ; ... ; <font> 005B 0039 ;
 ...
 XXX30 ; FLAG SYMBOL MEDIAL HYPHEN ; ... ; So ; ... ; <font> 002D ;
 XXX31 ; FLAG SYMBOL MEDIAL A ; ... ; So ; ... ; <font> 0041 ;
 XXX4A ; FLAG SYMBOL MEDIAL Z ; ... ; So ; ... ; <font> 005A ;
 XXX50 ; FLAG SYMBOL MEDIAL ZERO ; ... ; So ; ... ; <font> 0030 ;
 XXX59 ; FLAG SYMBOL MEDIAL NINE ; ... ; So ; ... ; <font> 0039 ;
 ...
 XXX60 ; FLAG SYMBOL FINAL HYPHEN ; ... ; So ; ... ; <font> 002D ;
 XXX61 ; FLAG SYMBOL FINAL A ; ... ; So ; ... ; <font> 0041 005D ;
 XXX7A ; FLAG SYMBOL FINAL Z ; ... ; So ; ... ; <font> 005A 005D ;
 XXX80 ; FLAG SYMBOL FINAL ZERO ; ... ; So ; ... ; <font> 0030 005D ;
 XXX89 ; FLAG SYMBOL FINAL NINE ; ... ; So ; ... ; <font> 0039 005D ;

(This also gives a hint for how to collate these symbols, and for the
minimum size of the block to encode: 3 columns for each of the 3
subsets, including some code points reserved in each subset for
additional punctuation-like symbols that may be needed to implement
namespaces in the registry of flags.)
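Because these characters are not encoded, `unicodedata.normalize` cannot apply such mappings. The following is a minimal sketch that simulates the table above. `BASE` is a hypothetical stand-in for the unassigned "XXX" prefix (a Plane 15 private-use value is used only so the code points exist), and `encode`/`fallback` are illustrative names. The layout follows the table: hyphen at 00, A to Z at 01 to 1A, 0 to 9 at 20 to 29, with each subset 0x30 (3 columns) apart.

```python
BASE = 0xF0000                   # hypothetical stand-in for the "XXX" prefix
SUBSET_SIZE = 0x30               # each subset spans 3 columns of 16 code points
INITIAL, MEDIAL, FINAL = 0, 1, 2 # subset index; offset = index * SUBSET_SIZE
BLOCK_SIZE = 3 * SUBSET_SIZE     # 0x90 = 144 code points (9 columns)


def _offset(ch: str) -> int:
    """Position of a code character inside one subset."""
    if ch == "-":
        return 0x00
    if "A" <= ch <= "Z":
        return 0x01 + ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return 0x20 + ord(ch) - ord("0")
    raise ValueError(f"not a flag-code character: {ch!r}")


def _char(pos: int):
    """Inverse of _offset; None for the reserved positions."""
    if pos == 0x00:
        return "-"
    if 0x01 <= pos <= 0x1A:
        return chr(ord("A") + pos - 0x01)
    if 0x20 <= pos <= 0x29:
        return chr(ord("0") + pos - 0x20)
    return None


def encode(code: str) -> str:
    """Spell a flag code such as "US" as INITIAL, MEDIAL..., FINAL symbols."""
    if len(code) < 2 or code.endswith("-"):
        raise ValueError("need at least two characters, not ending in '-'")
    subsets = [INITIAL] + [MEDIAL] * (len(code) - 2) + [FINAL]
    return "".join(chr(BASE + s * SUBSET_SIZE + _offset(c))
                   for s, c in zip(subsets, code))


def fallback(text: str) -> str:
    """Apply the <font> compatibility mappings listed above."""
    out = []
    for ch in text:
        off = ord(ch) - BASE
        c = _char(off % SUBSET_SIZE) if 0 <= off < BLOCK_SIZE else None
        if c is None:
            out.append(ch)            # ordinary character or reserved slot
            continue
        subset = off // SUBSET_SIZE
        if subset == INITIAL:
            out.append("[" + c)       # <font> 005B xxxx
        elif subset == FINAL and c != "-":
            out.append(c + "]")       # <font> xxxx 005D
        else:
            out.append(c)             # MEDIAL, and FINAL HYPHEN as listed
    return "".join(out)


print(fallback("x " + encode("US-CO") + " y"))  # x [US-CO] y
```

The "[" and "]" come only from the INITIAL and FINAL subsets, so the bracketing is recovered without any parsing of the sequence as a whole.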


Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Doug Ewell
William_J_G Overington wjgo underscore 10009 at btinternet dot com
wrote:

 What I was wondering was whether, if someone proposes U+E0002 for
 encoding for a future new technology, the fact that tags are
 currently deprecated would automatically stop that proposal from being
 accepted, perhaps because of some guarantee in the rules never to
 reverse deprecation or something like that.

These are my personal opinions. Please keep in mind I am not a UTC or
WG2 member, and have often been taken to task for trying to predict or
advise people what UTC or WG2 will or will not do.

1. There is probably no formal provision for automatic rejection of a
proposed new Plane 14 tag character. It would probably be at least
considered, not thrown away at the receptionist's desk.

2. Both the act of formally deprecating the Plane 14 tag mechanism, and
the comments I've seen on this list from UTC participants over the
years, suggest to me that a proposal for a new Plane 14 tag character
would be very unlikely to be approved.

3. Stating in a proposal that either this new tag character, or any
character, is being proposed for a future new technology may reduce
the likelihood that the proposal will be approved.

But the only way to find out for sure is to submit a proposal.

 Thinking about this after posting, and thinking of the vast coding
 space that could be opened up for flag encoding just by adding U+E0002
 to regular Unicode, I began to think of proposing the addition of
 U+E0007 as well, so as to open up another encoding space. Each item in
 that encoding space could be displayed as a sequence of tag glyphs
 using an ordinary font, as a single glyph using glyph substitution
 technology with an advanced format font, or in localized form using a
 database technology, with the item used as a key to the database.

My opinion is that nothing about the Unicode code space, including Plane
14 tags, is intended to serve as an indexing mechanism into another
standard.

 I was thinking that the above would involve visible glyphs for the tag
 characters.

My opinion is that, while a font may include glyphs for tag characters,
that is not the normal use case for tag characters.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell






Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Philippe Verdy
2012/6/1 Doug Ewell d...@ewellic.org:
 My opinion is that, while a font may include glyphs for tag characters,
 that is not the normal use case for tag characters.

I take exactly the same position on glyphs found in fonts for any
format control. They are not intended to be rendered, except in very
specific technical contexts, or through some fallback mechanism when
their intended function is not supported or implemented and one still
wants to be able to edit texts containing them (using a
visible-controls editing mode).



RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Note that I absolutely do not advocate the reuse of language tags for
 something else. They are deprecated and should remain deprecated. They
 were not intended to be visible symbols.

Just as a matter of terminology, the deprecated Plane 14 block is for
tags and not just for language tags. The idea for such a block did
come from the proposal to support inline language tagging, and the only
defined type of tag is U+E0001 LANGUAGE TAG, but other tags could have
been introduced later for other purposes. By deprecating the entire
block and not just U+E0001, UTC essentially deprecated the whole tag
concept.
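For reference, a minimal sketch of how the one defined tag type spelled a language tag: U+E0001 LANGUAGE TAG followed by tag characters that mirror printable ASCII at an offset of U+E0000, with U+E007F CANCEL TAG ending its scope. The helper names are illustrative only, and since the block is deprecated, this shows the mechanism rather than recommended practice.

```python
TAG_BASE = 0xE0000           # tag characters mirror ASCII at this offset
LANGUAGE_TAG = chr(0xE0001)  # U+E0001 LANGUAGE TAG
CANCEL_TAG = chr(0xE007F)    # U+E007F CANCEL TAG


def language_tag(tag: str) -> str:
    """Spell e.g. "fr" as U+E0001 followed by U+E0066 U+E0072."""
    if not tag or not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("tag characters only mirror printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in tag)


def strip_tags(text: str) -> str:
    """Drop everything in the tag block, as a process ignoring tags would."""
    return "".join(c for c in text if not TAG_BASE <= ord(c) <= 0xE007F)


tagged = language_tag("fr") + "bonjour" + CANCEL_TAG
print(strip_tags(tagged))  # bonjour
```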

 I much prefer a solution that generates **true** symbols that can be
 combined, and **optionally** (but safely) rendered as ligatures (by
 design of the encoding itself) to render the true flags instead of
 showing their code in the list of glyphs (the default rendering in
 absence of recognized ligatures).

I wish we would use some other term for these than ligatures. They are
definitely not ligatures in the sense that any typographer, sign
painter, or reader would think of them. A picture of a French flag has
no imaginable visual relationship to the letter F or the letter R.







RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

  Just as a matter of terminology, the deprecated Plane 14 block is for
  tags and not just for language tags. The idea for such a block
  did come from the proposal to support inline language tagging, and
  the only defined type of tag is U+E0001 LANGUAGE TAG, but other tags
  could have been introduced later for other purposes. By deprecating
  the entire block and not just U+E0001, UTC essentially deprecated the
  whole tag concept.

 Fine. But Plane 14 as a whole was not deprecated at the same time.

 Anyway, given that I propose symbols, they are NOT tags. I have no
 opinion, however, about which plane should be used to allocate them.
 Plane 14 is fine with me, like any other plane (except the BMP and
 the SIP), even if they are not tags.

 You seem to think that the whole plane is for tags. I don't think so.
 Only the **existing** blocks assigned in Plane 14 are deprecated.

No, I said the block was deprecated, not the plane. The deprecated
Plane 14 block meant the deprecated block which is in Plane 14.
Indeed, there are 240 variation selectors in Plane 14 which are not
deprecated.

  They are definitely not ligatures in the sense that any typographer,
  sign painter, or reader would think of them.

 You're right, in terms of typography. But all the technologies used
 for producing the ligatures are perfectly usable here to give the
 desired effect, with the same usage policies : they will remain
 optional, even if they are desirable (and should be enabled by
 default, just like the LAM-ALEF ligature in the Arabic script).

I accept that the technology for making a font and rendering engine
perform this visual transformation is the same as that used to combine
letters into typographical ligatures. Font guys can look at it that way.
I think if Unicode does embark on something like this—not to say they
should—or to the extent they already have with the Regional Indicator
Symbols, they should avoid the word ligature, and in fact the passage
on page 534 of TUS 6.1 simply talks about how those symbols could be
rendered.
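For comparison, the Regional Indicator Symbols (U+1F1E6 to U+1F1FF, letters A to Z) spell an ISO 3166-1 alpha-2 code directly, and whether a pair is shown as a flag is left to the font and rendering engine. A minimal sketch; the function name is illustrative.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicators(alpha2: str) -> str:
    """Spell an ISO 3166-1 alpha-2 code with Regional Indicator Symbols."""
    if len(alpha2) != 2 or not (alpha2.isascii() and alpha2.isalpha()):
        raise ValueError(f"not a two-letter code: {alpha2!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A"))
                   for c in alpha2.upper())


print([f"U+{ord(c):04X}" for c in regional_indicators("fr")])  # ['U+1F1EB', 'U+1F1F7']
```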







Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Asmus Freytag
Coding solutions that require substantial support across implementations 
are successful, if (and I argue, only if) you can't successfully sell 
your implementation in a given market without support for that feature.


Mathematical layout is not needed by the majority of users, but those 
users that do need it, can't be accommodated with a substitute. Hence, 
anyone trying to sell into that market has to make a decent job of it. 
Looks like there are enough people in that market that even general 
purpose software, like Word, has a decent (nay, excellent) equation editor.


Arabic shaping is so essential to the script that you either support it, 
or you don't support Arabic.


Placing accents on Latin characters is widely needed, but the most 
widely needed cases are covered by precomposed legacy characters. Hence, 
the support for this feature is spotty. Curiously, this remains the 
case even though, taken together, the diverse users of particular 
combinations of letters and accents in the Latin, Greek, and Cyrillic 
scripts probably reach substantial numbers, and a common solution would 
seem to support all of them.


Support for Ideographic Variation Sequences is needed for all sorts of 
high-end CJK work. It can be expected to be supported in those market 
areas, but probably not necessarily in mainstream implementations. Time 
will tell.


And so on.

The chances that any form of meta encoding for symbols (including 
ligation) will ever reach critical mass in support are lower than for 
Latin/Greek/Cyrillic accents, because - as of today - there's no 
established use for any of these schemes.


All of these things remain solutions in search of a problem.

The interesting thing I note is the level of enthusiasm with which these 
are discussed here, when, at the same time, a lowly single character 
currency symbol, with no special meta-coding, layout support, algorithm 
changes, etc. was so roundly dismissed - despite all the evidence that 
not supporting it in face of user demands would impact the ability of 
implementers to sell into a not insubstantial market.


Sometimes I wonder what's going on ...

A./



Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Philippe Verdy
In addition, I am firmly convinced that the renderers used in browsers
will be able to synthesize the flags themselves from their well-known
ISO 3166-1 codes in the absence of font support: this would just
require them to ship a small collection of SVG graphics (something
that is already widely available). This would be a valid substitution
immediately, in the absence of a more general technology based on an
external registry, and of support in fonts.

The technical effort needed to develop this in renderer software is
very small. It will also be easy to test, as there are no complications.
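A minimal sketch of the fallback chain this implies, assuming hypothetical inputs: the set of codes the active font can ligate into a flag, and a mapping from ISO 3166-1 codes to SVG files shipped with the renderer. Nothing here reflects any actual browser's internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlagRenderer:
    # Hypothetical inputs: codes the active font ligates, and bundled SVG assets.
    font_flags: set[str] = field(default_factory=set)
    bundled_svgs: dict[str, str] = field(default_factory=dict)

    def render(self, code: str) -> str:
        """Pick the best available presentation for an ISO 3166-1 code."""
        if code in self.font_flags:
            return f"font-glyph:{code}"              # 1. the font provides the flag
        if code in self.bundled_svgs:
            return f"svg:{self.bundled_svgs[code]}"  # 2. renderer's own SVG
        return f"[{code}]"                           # 3. last-chance text fallback


r = FlagRenderer({"FR"}, {"US": "flags/us.svg"})
print(r.render("US"))  # svg:flags/us.svg
```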



Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Asmus Freytag

On 6/1/2012 12:01 PM, Philippe Verdy wrote:

 2012/6/1 Asmus Freytag asm...@ix.netcom.com:

  The chances that any form of meta encoding for symbols (including ligation)
  will ever reach critical mass in support are lower than for
  Latin/Greek/Cyrillic accents, because - as of today - there's no established
  use for any of these schemes.

  All of these things remain solutions in search of a problem.

 No, my proposal gives something that is immediately usable, and does
 not create any ambiguity. It is simple to implement even without the
 presence of a technical ligaturing solution.


It's still a solution in search of a problem.

There's no demand out there for this feature.

A./





Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Philippe Verdy
There's at least the demand coming from their use as Emoji, attested
as well in many books and many applications (not always in color).

Maybe the UTC did not receive a formal request before, but the demand
REALLY exists for the encoding of flags in plain text (not just rich
text). They are semantically significant and are not just a question
of presentation.

2012/6/1 Asmus Freytag asm...@ix.netcom.com:
 On 6/1/2012 12:01 PM, Philippe Verdy wrote:
 There's no demand out there for this feature.



Shift-JIS encoded text (was: RE: Tags and future new technologies [...])

2012-06-01 Thread Doug Ewell
Peter Constable petercon at microsoft dot com wrote:

 The only requirement of Unicode was to provide a way to map Shift-JIS
 encoded text involving emoji to Unicode / 10646 in a way that could be
 round-tripped,

This is the part that has always confused me. At what point does text
encoded in a vendor's private-use extension to Shift-JIS become
Shift-JIS encoded text? Because I know for sure that I'm not supposed
to refer to characters assigned to the Unicode PUA, my own or anyone
else's, as being encoded in Unicode.







Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])

2012-06-01 Thread Philippe Verdy
2012/6/1 Doug Ewell d...@ewellic.org:
 Peter Constable petercon at microsoft dot com wrote:

 The only requirement of Unicode was to provide a way to map Shift-JIS
 encoded text involving emoji to Unicode / 10646 in a way that could be
 round-tripped,

 This is the part that has always confused me. At what point does text
 encoded in a vendor's private-use extension to Shift-JIS become
 Shift-JIS encoded text? Because I know for sure that I'm not supposed
 to refer to characters assigned to the Unicode PUA, my own or anyone
 else's, as being encoded in Unicode.

Maybe because, without admitting it publicly, those symbols really
have a much wider use than in these private Shift-JIS extensions.

In which case, the need for round-trip compatibility is definitely not
the main reason for their encoding, and these symbols should be
considered more globally (as they are certainly needed in other
countries or for other private implementations, but without the
interoperability one could expect between these implementations
when the symbols obviously mean the same thing and play the same role
in the texts that include them).

The private extension is just a sign that they were needed. The
pressure to include them in standard Shift-JIS is another sign, and so
is the need to map them into the UCS as well, whether or not their
standardization in Shift-JIS succeeds.

Of course, encoding flags visually in an international standard is
much more difficult if one wants to encode some flags and not others,
not least because of political issues. That's why I propose another
way to represent them. This won't affect the private-use Shift-JIS
encoding, which can keep round-trip compatibility with its existing
symbols, even if standard Shift-JIS now prefers the more generic
symbols instead of integrating the private-use extension.



Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])

2012-06-01 Thread Ken Whistler

On 6/1/2012 1:51 PM, Doug Ewell wrote:

 At what point does text
 encoded in a vendor's private-use extension to Shift-JIS become
 Shift-JIS encoded text?


A possibly less confusing way to put this is:

At what point does text encoded in a vendor's private-use extension
to *JIS X 0208* become Shift-JIS encoded text?

The reason for putting it that way is that JIS X 0208 is a character
encoding standard. It defines the repertoire of characters and
assigns numbers to them.

But 2022-JP, EUC-JP, and Shift-JIS are then 3 different ways of
turning JIS X 0208 character codes (and possibly vendor or other
extensions) into streams of bytes. Think of them as character encoding
schemes (in the Unicode character encoding model sense).

One of the reasons why there are many Shift-JIS's is not that the
principle of how to shift JIS X 0208 code values into bytes changes,
but because there are many different private extensions, all making
use of the same general principle for how to move the byte values
into a particular scheme for processing.

In summary, Shift-JIS is not a character encoding standard -- it is
a scheme for turning JIS (and various extensions) into a particular
format for processing.
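To make the distinction concrete, here is a minimal sketch that takes one JIS X 0208 code (two bytes, each in 0x21..0x7E) and serializes it under two of the schemes Ken mentions. EUC-JP sets the high bit of each byte, while Shift-JIS folds two 94-cell rows into one lead byte. It covers only the JIS X 0208 area and none of the vendor extensions. Python's built-in `euc_jp` and `shift_jis` codecs are used only to recover the JIS code and to cross-check the arithmetic.

```python
from __future__ import annotations


def jis_to_euc_jp(j1: int, j2: int) -> bytes:
    """EUC-JP code set 1: set the high bit of both JIS X 0208 bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_shift_jis(j1: int, j2: int) -> bytes:
    """Shift-JIS: pack two JIS rows into each lead byte."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code")
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)  # 0x81-0x9F, 0xE0-0xEF
    if j1 % 2:                        # odd row: trail byte 0x40-0x9E, skipping 0x7F
        s2 = j2 + (0x1F if j2 < 0x60 else 0x20)
    else:                             # even row: trail byte 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def jis_code(ch: str) -> tuple[int, int]:
    """Recover the JIS X 0208 code of a character (via Python's euc_jp codec)."""
    b = ch.encode("euc_jp")
    if len(b) != 2 or b[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return b[0] & 0x7F, b[1] & 0x7F


j1, j2 = jis_code("あ")                   # same character code: 0x24 0x22
print(jis_to_euc_jp(j1, j2).hex())       # a4a2
print(jis_to_shift_jis(j1, j2).hex())    # 82a0
```

The character code (row 0x24, cell 0x22) is the same in both cases; only the byte serialization differs, which is the sense in which Shift-JIS is a scheme rather than a character set.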

--Ken




[OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread Peter Constable
FYI – I know at least some folk here will find this of interest:

http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements



Peter


RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Doug Ewell
Philippe Verdy wrote:

 No, my proposal gives something that is immediately usable, and does
 not create any ambiguity. It is simple to implement even without the
 presence of a technical ligaturing solution. Those flags will be
 immediately usable, without any of the political complications created
 by the case of flags. It will avoid a proliferation of proposals, and
 endless debates over whether to encode some flags, or over changing the
 representative glyphs.

Again, not saying Unicode should do this, but:

Doesn't there at least have to be a well-defined convention for
representing flags before any of this works? How do I represent:

1. the flag of the United States
2. the flag of the state of Colorado
3. the flag of Adams County, Colorado
4. the flag of the city of Thornton

Not all of these might be defined right away, but an extensible
structure within which to define them would have to be in place.







RE: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])

2012-06-01 Thread Doug Ewell
I hadn't thought that Peter was talking about text encoded according to
the Shift-JIS model, without specifying the encoding. I'm not sure that
changes my question.


 
 Original Message 
Subject: Re: Shift-JIS encoded text (was: RE: Tags and future new
technologies [...])
From: Ken Whistler k...@sybase.com
Date: Fri, June 01, 2012 3:17 pm
To: unicode@unicode.org






Re: [OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread Philippe Verdy
2012/6/2 Peter Constable peter...@microsoft.com:
 FYI – I know at least some folk here will find this of interest:

 http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements

Well, they are already in the tables shown in Wikipedia (the English
and French pages at least). Time for inclusion in Wiktionary (unless
this has already been done for these names; some languages will need
transliterations)...




Re: [OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread Andrew West
On 1 June 2012 23:02, Peter Constable peter...@microsoft.com wrote:

 http://www.theverge.com/2012/6/1/3057261/flerovium-livermorium-periodic-table-of-elements

There don't appear to have been any Chinese characters assigned to
these two elements yet, but it is interesting to note that there are
no simplified forms for eight of the elements with highest atomic
numbers:

104 Rf 鑪 钅卢
105 Db 𨧀 钅杜
106 Sg 𨭎 钅喜
107 Bh 𨨏 钅波
108 Hs 𨭆 钅黑
109 Mt 䥑 钅麦
111 Rg 錀 钅仑
112 Cn 鎶 钅哥

which are represented with PUA characters at:

http://zh.wikipedia.org/wiki/%E5%85%83%E7%B4%A0%E5%91%A8%E6%9C%9F%E8%A1%A8

and as components at:

http://zh.wikipedia.org/wiki/%E6%89%A9%E5%B1%95%E5%85%83%E7%B4%A0%E5%91%A8%E6%9C%9F%E8%A1%A8

(110 Ds is already encoded in CJK-D as U+2B7FC 𫟼)

Seems like candidates for urgent encoding to me.

Andrew




Re: [OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread textexin
Can't they be represented by fusion of other elements?

;-)
Sent from my Verizon Wireless BlackBerry

-Original Message-
From: Andrew West andrewcw...@gmail.com
Sender: unicode-bou...@unicode.org
Date: Fri, 1 Jun 2012 23:50:42 
To: Peter Constablepeter...@microsoft.com
Cc: unicode@unicode.orgunicode@unicode.org
Subject: Re: [OT] Flerovium and livermorium get names on the periodic table of 
elements







RE: [OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread Peter Constable
You mean like--if we considered characters such as 0321 or FE73 as character 
analogues of sub-atomic particles--bombarding other characters with the likes 
of 0321, FE73, etc.?


P.

-Original Message-
From: texte...@xencraft.com [mailto:texte...@xencraft.com] 
Sent: June-01-12 4:09 PM
To: Andrew West; unicode-bou...@unicode.org; Peter Constable
Cc: unicode@unicode.org
Subject: Re: [OT] Flerovium and livermorium get names on the periodic table of 
elements

Can't they be represented by fusion of other elements?

;-)






Re: [OT] Flerovium and livermorium get names on the periodic table of elements

2012-06-01 Thread Mark E. Shoulson

On 06/01/2012 07:09 PM, texte...@xencraft.com wrote:

Can't they be represented by fusion of other elements?

;-)


Sure.  Just like two Hafnium nuclei make a Holmium.

(Meanwhile, Fl for Flerovium, I think it is?  Like people aren't already 
confused as to whether Fluorine is F or Fl?  Should have gone with Fv.)


(Meanwhile meanwhile: Who's with me for pushing for a moseleium for 
http://en.wikipedia.com/wiki/Henry_Moseley ?)


~mark



Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

2012-06-01 Thread Philippe Verdy
The principales used in ISO 3166, and those used for the extension of
language tags (with its locale extension subtags) could work as well.

If the first need is simply to represent current country flags
(ignoring the dated versions) and the flags of the first level of
subdivisions in those countries, then ISO 3166 already provides the
basic codes. We just need the convention that any code consisting of
two letters, or starting with two letters and a hyphen, must conform
to ISO 3166-1 or ISO 3166-2. Further extensions will have to wait for
the development of a more complete registry, which will allow defining
codes using other prefixes acting as namespaces.
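A minimal Python sketch of that convention, assuming uppercase ASCII codes and the usual ISO 3166-2 shape (country code, hyphen, one to three letters or digits). The function name `classify_flag_code` and its result labels are hypothetical, and treating deeper codes such as US-CO-001 as invalid until a registry defines them is one reading of the paragraph, not part of any standard.

```python
import re

# Hypothetical classifier for the proposed convention: a code that is exactly
# two letters, or that starts with two letters and a hyphen, must be an
# ISO 3166-1 or ISO 3166-2 code; any other shape is left to a future registry.
ALPHA2 = re.compile(r"[A-Z]{2}")
SUBDIVISION = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")  # ISO 3166-2 shape
ISO_PREFIX = re.compile(r"[A-Z]{2}-")


def classify_flag_code(code: str) -> str:
    if ALPHA2.fullmatch(code):
        return "ISO 3166-1"
    if SUBDIVISION.fullmatch(code):
        return "ISO 3166-2"
    if ISO_PREFIX.match(code):
        # Starts like an ISO 3166 code but is not a valid ISO 3166-2 shape,
        # so (by assumption) it is rejected until a registry defines it.
        return "invalid"
    # Any other prefix would be a namespace defined by the future registry.
    return "registry namespace"
```

For example, `classify_flag_code("US-CO")` gives `"ISO 3166-2"`, while `classify_flag_code("NATO")` falls through to `"registry namespace"`.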

ISO 3166 also has codes for private use, notably any code starting
with X, so the registry can keep the prefix X- available for private
use while keeping for itself some other prefix made of X followed by
another letter.
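ISO 3166-1 leaves the alpha-2 elements AA, QM to QZ, XA to XZ and ZZ for user assignment. The sketch below checks that range and then applies the split described in the paragraph: an `X-` prefix stays private use, while a two-letter X prefix is kept by the registry. Both function names and the labels they return are hypothetical.

```python
from typing import Optional


def is_user_assigned_alpha2(code: str) -> bool:
    """True for the ISO 3166-1 alpha-2 elements left for user assignment:
    AA, QM-QZ, XA-XZ and ZZ."""
    if len(code) != 2 or not (code.isascii() and code.isalpha() and code.isupper()):
        return False
    first, second = code
    return code in ("AA", "ZZ") or (first == "Q" and second >= "M") or first == "X"


def x_prefix_owner(code: str) -> Optional[str]:
    """Hypothetical split of the X space: "X-" stays private use, while a
    two-letter X prefix (XA-XZ) is kept by the flag registry itself."""
    if code.startswith("X-"):
        return "private use"
    prefix = code[:2]
    if (
        prefix.startswith("X")
        and is_user_assigned_alpha2(prefix)
        and (len(code) == 2 or code[2] == "-")
    ):
        return "registry"
    return None  # not in the X space at all
```

So `x_prefix_owner("X-PIRATE")` is `"private use"`, `x_prefix_owner("XF-NAVY")` is `"registry"`, and an ordinary code such as `US-CO` gets `None`.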

These mechanisms are not new, and they are easy to understand because
they already work in other standards. We don't need to reinvent the wheel.

2012/6/2 Doug Ewell d...@ewellic.org:
 Philippe Verdy wrote:

 No, my proposal gives something that is immediately usable, and does
 not create any ambiguity. It is simple to implement even without a
 technical ligaturing solution. Those flags will be immediately usable,
 without any of the political complications created by the case of
 flags. It will avoid a proliferation of proposals, and infinite
 debates over whether to encode some flags or to change their
 representative glyphs.

 Again, not saying Unicode should do this, but:

 Doesn't there at least have to be a well-defined convention for
 representing flags before any of this works? How do I represent:

 1. the flag of the United States
 2. the flag of the state of Colorado
 3. the flag of Adams County, Colorado
 4. the flag of the city of Thornton

 Not all of these might be defined right away, but an extensible
 structure within which to define them would have to be in place.

 --
 Doug Ewell | Thornton, Colorado, USA
 http://www.ewellic.org | @DougEwell






[OT] Flag coding (was: Re: Tags and future new technologies [...])

2012-06-01 Thread Doug Ewell

Philippe Verdy wrote:


 If the first need is simply to represent current country flags
 (ignoring the dated versions) and the flags of the first level of
 subdivisions in those countries, then ISO 3166 already provides the
 basic codes. We just need the convention that any code consisting of
 two letters, or starting with two letters and a hyphen, must conform
 to ISO 3166-1 or ISO 3166-2. Further extensions will have to wait for
 the development of a more complete registry, which will allow defining
 codes using other prefixes acting as namespaces.


For flags belonging to nations and subnational entities, of course one 
would expect a flags code to use widely recognized standards, starting 
with ISO 3166. For my four examples, it might have:


1. the United States → US
2. the state of Colorado → US-CO
3. Adams County, Colorado → US-CO-001 (using FIPS 6-4; although that
standard has been withdrawn, I can’t find what replaced it; other
standards would be needed for second-level subdivisions of other
countries)
4. the city of Thornton → US-THT (using UN/LOCODE)
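Read as a hierarchy, codes like these can be expanded into their chain of enclosing entities by taking each hyphen-delimited prefix. This is a sketch under the assumption (not taken from any standard) that every prefix names a parent; the `SOURCE` table only restates the four examples above, and `ancestor_chain` is a hypothetical name. Note that US-THT goes straight from US to the city, because a UN/LOCODE location code is assigned within a country rather than within a state or county.

```python
from typing import List

# The four example codes and the standard each is drawn from, as listed above.
SOURCE = {
    "US": "ISO 3166-1",
    "US-CO": "ISO 3166-2",
    "US-CO-001": "FIPS 6-4",
    "US-THT": "UN/LOCODE",
}


def ancestor_chain(code: str) -> List[str]:
    """Return the code and each hyphen-delimited prefix, outermost first."""
    parts = code.split("-")
    return ["-".join(parts[: i + 1]) for i in range(len(parts))]


for code in SOURCE:
    # Print each level with the standard it comes from ("?" if unknown).
    print(" > ".join(f"{c} ({SOURCE.get(c, '?')})" for c in ancestor_chain(code)))
```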

There are other possibilities. But this only tells part of the story; 
one would probably want the flags code to cover current or historical 
entities without standard code elements, such as the Holy Roman Empire 
or NATO, or other types of domains, such as maritime and military and 
auto racing and the Olympic Games and classical pirates (and maybe 
modern ones too). There would have to be a coding mechanism for this—not 
necessarily all the code elements, not right away, but a way to expand 
to include them.


I think this is getting off-topic for Unicode, though I know Philippe 
thinks of it as the basis for a great addition to Unicode.


--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell




A question about the default grapheme cluster boundaries with U+0020 as the grapheme base

2012-06-01 Thread Konstantin Ritt
It seems like there is an inconsistency between what the default
grapheme clusters specification says and what the test results are
expected to be:

UAX #29 says:
 Another key feature (of default Unicode grapheme clusters) is that default
 Unicode grapheme clusters are atomic units with respect to the process of
 determining the Unicode default line, word, and sentence boundaries.
This is also mentioned in UAX #14:
 Example 6. Some implementations may wish to tailor the line breaking
 algorithm to resolve grapheme clusters according to Unicode Standard Annex
 #29, “Unicode Text Segmentation” [UAX29], as a first stage. Generally, the
 line breaking algorithm does not create line break opportunities within
 default grapheme clusters; therefore such a tailoring would be expected
 to produce results that are close to those defined by the default algorithm.
 However, if such a tailoring is chosen, characters that are members of line
 break class CM but not part of the definition of default grapheme clusters
 must still be handled by rules LB9 and LB10, or by some additional tailoring.
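The Example 6 tailoring can be sketched as intersecting the two sets of boundaries: keep a line break opportunity only where a default grapheme cluster boundary also falls. The offsets below for U+0020 U+0308 restate what this message describes rather than output from a real segmenter, the function name is hypothetical, and the sketch ignores the caveat about CM characters outside grapheme clusters.

```python
from typing import Iterable, List


def tailor_line_breaks(line_breaks: Iterable[int],
                       grapheme_boundaries: Iterable[int]) -> List[int]:
    """Keep only line break opportunities that are also grapheme cluster
    boundaries, so no break ever falls inside a default grapheme cluster.

    Offsets are code point indices: offset i means "between text[i-1] and text[i]".
    """
    return sorted(set(line_breaks) & set(grapheme_boundaries))


# U+0020 U+0308: default line breaking allows a break at 1 (after the space)
# and at 2 (end of text); grapheme segmentation has boundaries only at 0 and 2.
print(tailor_line_breaks([1, 2], [0, 2]))  # [2]
```

With the tailoring, the break between the space and the diaeresis disappears, which is exactly where the two default algorithms disagree.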

However, the sequence U+0020 (SP), U+0308 (CM) is handled in the line
breaking algorithm by rules LB10 and LB18, which produce a break
opportunity, while GB9 prohibits a break between U+0020 (Other) and
U+0308 (Extend).
Section 9.2, Legacy Support for Space Character as Base for Combining
Marks, in UAX #29 clarifies why a line break occurs there, but the fact
remains that the statements above are false and introduce some
ambiguity.
If the space character is no longer a grapheme base, the grapheme
cluster breaking rules need to be updated.
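To make the disagreement concrete, here is a pairwise Python sketch of just the rules named above, with property values hard-coded for the two characters (a real implementation would read them from the UCD data files). It ignores context-sensitive rules such as LB14, which is safe only for the isolated pair U+0020 U+0308. The function names are hypothetical.

```python
SPACE = 0x0020
COMBINING_DIAERESIS = 0x0308

# Tiny property tables covering only these two characters; a real
# implementation would load GraphemeBreakProperty.txt and LineBreak.txt.
GRAPHEME_BREAK = {SPACE: "Other", COMBINING_DIAERESIS: "Extend"}
LINE_BREAK = {SPACE: "SP", COMBINING_DIAERESIS: "CM"}

# LB9 does not attach a combining mark to a base of these classes.
LB9_EXCLUDED = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def grapheme_boundary(before: int, after: int) -> bool:
    """Subset of UAX #29: GB9 plus the catch-all 'otherwise break' rule."""
    if GRAPHEME_BREAK[after] == "Extend":
        return False  # GB9: do not break before Extend
    return True  # catch-all: Any ÷ Any


def line_break_opportunity(before: int, after: int) -> bool:
    """Subset of UAX #14, enough for pairs of SP and CM."""
    before_cls, after_cls = LINE_BREAK[before], LINE_BREAK[after]
    if after_cls == "SP":
        return False  # LB7: do not break before spaces
    if after_cls == "CM" and before_cls not in LB9_EXCLUDED:
        return False  # LB9: keep a combining mark with its base
    # LB10: a remaining CM is treated as AL; for an isolated SP + AL pair
    # none of LB11-LB17 applies.
    if before_cls == "SP":
        return True  # LB18: break after spaces
    raise NotImplementedError("only SP/CM pairs are modeled here")
```

Here `grapheme_boundary(SPACE, COMBINING_DIAERESIS)` is False (GB9) while `line_break_opportunity(SPACE, COMBINING_DIAERESIS)` is True (LB10, then LB18), which is the inconsistency being reported.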

Kind regards,
Konstantin