date:20120531

2012/5/30 Martin J. Dürst due...@it.aoyama.ac.jp:
 On 2012/05/30 4:42, Roozbeh Pournader wrote:

 Just look what happened when the Japanese did their own font/character set
 hack. The backslash/yen problem is still with us, to this day...


 To be fair, the Japanese Yen at 0x5C was there long before Unicode, in the
 Japanese version of ISO 646. That it has remained as a font hack is very
 unfortunate, but for that, not only the Japanese, but also major
 international vendors are to blame.

As long as it was part of the Japanese version of ISO 646 (which
itself was only the first page of the SJIS encoding), there was
absolutely NO problem at all. This was not different from the
situation of all other national versions of ISO 646, which were all
distinct encodings.

The situation became a problem when the Japanese ISO 646 started to be
mapped to Unicode/ISO/IEC 10646 within fonts using incorrect mappings.
This occurred in the early stages of ISO/IEC 10646 development.

And unfortunately several OSes for Japan used those incorrect
mappings, assuming that it was still safe to blindly convert texts
containing backslashes by showing yen symbols instead, just as the
same systems blindly converted US-ASCII (the American version of ISO 646)
into SJIS with broken algorithms, simply because that software could
not really work with Unicode but still worked only with SJIS, and did
not correctly track which source encoding was used.

This would probably not have occurred if Japan had defined and
standardized an ISO 8859 version that moved the yen sign out of the ASCII
range (along with basic Kana letters and Asian punctuation); but they
preferred to develop only SJIS to support Kanji (and later remapped the
emerging UCS onto it). Such an ISO 8859 version would also have offered
an easier migration.

They were ambitious at the beginning, but the ambition was premature
when the surrounding technologies needed to support a large character set
were still very incomplete (forcing a lot of software to use unsafe/lossy
remappings to smaller character sets). So for several decades,
there have been many interoperability problems caused by the
various implementations of SJIS, many of them incompatible with each
other in their limitations or in the way the simplifications were
applied to support different parts of it.

The backslash character, though common in many programming
languages and OSes, was then replaced there by the yen
symbol, and people were trained with it (for example when typing
pathnames in DOS/Windows filesystems, or when using the yen symbol as
the escape prefix when programming in C/C++); the backslash then came
to be perceived as a variant form of their yen symbol that they did
not need (SJIS was later adapted to map the backslash somewhere else,
but SJIS users did not immediately fix it).

As a result, the mapping of 0x5C in SJIS has always been ambiguous,
depending on the implementation, but it has never been ambiguous in
the Japanese version of ISO 646, which did not include the backslash.

So don't criticize ISO 646, there was no problem there. The problem is
fully within the early versions of SJIS which allowed such variation
of glyphs, when it should have considered the yen symbol and the
backslash as distinct abstract characters requiring separate mappings.

But who uses the Japanese version of ISO 646 in Japan now? Only SJIS
seems to survive, with all its intrinsic ambiguities and its many
incompatible implementations (whose exact versions are most often not
correctly identified in most software).

The Japanese NB should have stopped this nightmare by setting a rule to
strongly deprecate the other versions (and withdraw all past
recommendations), so that only one version of SJIS would survive, and
old data encoded with an ambiguous SJIS version would be left in its
black box:

It would have been simpler and more effective for the Japanese NB to
rename the SJIS standard for the only remaining version, e.g.
UJIS (U for Universal, meaning that it has full round-trip
compatibility with the UCS and no longer allows any ambiguity) and
then freeze it completely in that state (all other developments being
made in the UCS), with a strong recommendation NOT to perform any
blind conversion to UJIS, or interpretation as UJIS, of any past data
encoded in an unversioned SJIS: all ambiguous characters in such
old data should be detected as ambiguous, meaning that the
document/data was not convertible without proper versioning.
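The detection rule proposed above can be sketched as follows. This is a minimal illustration only, assuming two hypothetical vendor tables that differ at 0x5C (backslash vs. yen sign) and 0x7E (tilde vs. overline, as in JIS X 0201 Roman), and restricted to single-byte data; real SJIS variants differ in many more positions. A byte is flagged when the candidate tables disagree on its Unicode mapping, and conversion is refused rather than guessed.

```python
from __future__ import annotations

# Hypothetical excerpts of two "unversioned SJIS" vendor tables for the
# single-byte range; only the disputed positions are listed. Bytes not in a
# table are assumed to map to the same code point as in ASCII.
VENDOR_TABLES: dict[str, dict[int, str]] = {
    "vendor-a": {0x5C: "\\", 0x7E: "~"},            # REVERSE SOLIDUS, TILDE
    "vendor-b": {0x5C: "\u00A5", 0x7E: "\u203E"},   # YEN SIGN, OVERLINE
}


def ambiguous_offsets(data: bytes, tables: dict[str, dict[int, str]]) -> list[int]:
    """Return the offsets of bytes whose Unicode mapping differs between tables."""
    offsets = []
    for i, b in enumerate(data):
        candidates = {table.get(b, chr(b)) for table in tables.values()}
        if len(candidates) > 1:
            offsets.append(i)
    return offsets


def decode_unversioned(data: bytes, tables: dict[str, dict[int, str]]) -> str:
    """Decode only if every table agrees; otherwise refuse instead of guessing."""
    if any(b > 0x7F for b in data):
        raise ValueError("this sketch handles single-byte (0x00-0x7F) data only")
    bad = ambiguous_offsets(data, tables)
    if bad:
        raise ValueError(f"ambiguous without an encoding version label: offsets {bad}")
    any_table = next(iter(tables.values()))  # all tables agree on these bytes
    return "".join(any_table.get(b, chr(b)) for b in data)
```

For example, `ambiguous_offsets(b"C:\\dir", VENDOR_TABLES)` reports only the offset of the 0x5C byte, so a path like this could not be converted until its source encoding version is known.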

This would also have forced the various private software makers and
manufacturers that had used their own version of SJIS to register
again with the Japanese NB a SINGLE (and unique) string recommended to
identify their implementation of SJIS, removing all past known aliases
that were also ambiguous between each other, so that the effective
encoding of old data could be uniquely identified and would then become
uniquely convertible first to the national standard UJIS, then to the
UCS by its

Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

2012/5/31 Doug Ewell d...@ewellic.org:
 A seemingly straightforward solution to the “unambiguous mapping” problem
 would be to use the existing Plane 14 tag letters along with a new FLAG TAG,
 say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the
 current Swiss flag. No need for separate lead and trail. Simple.

 ... What’s that? Oh, sorry, never mind. Deprecated.

Not necessarily: you could very well have sets of characters with
unambiguous glyphs showing the ASCII capital letter partly enclosed:
- in a first set, it encloses the letter on the left/top/bottom sides
with the strokes that start displaying the flag (this glyph could also
include the pole)
- in a second set, it encloses the letter only on the top/bottom sides
- in the third set, it encloses the letter only on the top/bottom/right sides.

Let's not forget that even if countries do not change, and keep their
ISO 3166-1 code, their flag may change over time. So a flag encoded
with such characters should contain the year of its first official use.
This would require encoding the colon and the digits in the second set
(to specify the year), and the digits in the last set as well. The colon
and digits are a priori not needed in the first set.

So to represent the flag of Japan, you could encode:

FLAG INITIAL SYMBOL J
FLAG FINAL SYMBOL P

But if you want to explicitly use the post-1945 flag (and not the
imperial flag with sun rays), you would encode:

FLAG INITIAL SYMBOL J
FLAG MEDIAL SYMBOL P
FLAG MEDIAL SYMBOL COLON
FLAG MEDIAL SYMBOL ONE
FLAG MEDIAL SYMBOL NINE
FLAG MEDIAL SYMBOL FOUR
FLAG FINAL SYMBOL SIX

Which would render mostly like this, if there's no ligature defined
(several lines used here to approximate the glyphs) :

 +–––\
  | J P : 1 9 4 6   
 +–––/
  |

Here again, a font-defined ligature (if available) could remap it to
the actual flag.
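As a rough sketch of this scheme, the following function turns a code and an optional year into the sequence of proposed character names: the first symbol comes from the INITIAL set, the last from the FINAL set, and everything in between from the MEDIAL set. The character names are the hypothetical ones used in this message; none of them exist in Unicode.

```python
from __future__ import annotations

# Hypothetical names for the enclosed digits in the proposed MEDIAL/FINAL sets.
DIGIT_NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]


def _symbol(ch: str) -> str:
    """Name of the enclosed symbol for one ASCII character."""
    if "A" <= ch <= "Z":
        return ch
    if "0" <= ch <= "9":
        return DIGIT_NAMES[int(ch)]
    if ch == ":":
        return "COLON"
    raise ValueError(f"character not in the proposed sets: {ch!r}")


def encode_flag(code: str, year: int | None = None) -> list[str]:
    """Return the proposed INITIAL/MEDIAL/FINAL names for a flag code."""
    text = code if year is None else f"{code}:{year}"
    # The INITIAL set holds only letters, so the sequence must start with one.
    if len(text) < 2 or not ("A" <= text[0] <= "Z"):
        raise ValueError("a flag code must start with a letter and have two symbols or more")
    names = []
    for i, ch in enumerate(text):
        if i == 0:
            form = "INITIAL"
        elif i == len(text) - 1:
            form = "FINAL"  # closes the enclosure on the right
        else:
            form = "MEDIAL"
        names.append(f"FLAG {form} SYMBOL {_symbol(ch)}")
    return names
```

`encode_flag("JP")` yields the two-character sequence for Japan, and `encode_flag("JP", 1946)` yields seven names ending in `FLAG FINAL SYMBOL SIX`.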

A font can then easily be made, with the only constraint that the
glyphs in it should join these enclosures. If needed, such fonts
can then create ligatures for well-known flags, showing their apparent
geometry. The pole could also be removed, and colors added if
supported by the font technology, or replaced by hatching in a basic
monochrome font technology.

All these would remain standard symbols (they are superficially
letter-like, except that the standard can say that the letters shown
in the enclosing glyphs are only used as a default fallback, and that
ligatures can SAFELY replace them with the actual flag, including
its true colors). The renderer can use the color capabilities of fonts,
if the font format supports them, or a set of icons (e.g. encoded in a
zipped archive containing SVG files and a small mapping file
associating the flag codes with the names of SVG files, or within a
single SVG file containing this mapping internally and mapping each
code to an internal XML anchor ID using standard XML hrefs).
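The icon-archive option could look roughly like this. The archive layout (a `mapping.json` file next to one SVG file per flag code, with the colon replaced in file names) is an assumption made for illustration, not an existing format; only Python's standard `zipfile` and `json` modules are used.

```python
from __future__ import annotations

import io
import json
import zipfile


def build_icon_archive(icons: dict[str, str]) -> bytes:
    """Pack {flag_code: svg_text} into a zip archive with a mapping file."""
    buf = io.BytesIO()
    mapping = {}
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for code, svg in icons.items():
            # Colons are awkward in file names, so "JP:1946" becomes "jp_1946.svg".
            name = f"{code.lower().replace(':', '_')}.svg"
            zf.writestr(name, svg)
            mapping[code] = name
        zf.writestr("mapping.json", json.dumps(mapping))
    return buf.getvalue()


def lookup_icon(archive: bytes, code: str) -> str | None:
    """Return the SVG text for a flag code, or None so the font fallback is used."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        mapping = json.loads(zf.read("mapping.json"))
        name = mapping.get(code)
        return zf.read(name).decode("utf-8") if name else None
```

A renderer would try `lookup_icon` first and fall back to the enclosed-letter glyphs when it returns `None`.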

Note that OpenType currently does not contain any standard way to
map true colors used in glyphs, but there's nothing in OpenType that
prevents a font from exposing several glyph variants for the same
characters (or their defined ligatures): a monochrome version like
today, and, with a new OpenType feature, a colorful version, with an
extra table in the font that exposes the color mapping either
as an sRGBA color, or as a hatched fill pattern, also exposed
by the font as a rectangle glyph with metrics (and possibly an angle
relative to the baseline).

I am still surprised that OpenType does not yet include such a
standard. Note that hatching patterns would be defined using the
em-square of the glyphs assigned to characters and ligatures, so they would
scale the same way, and would be grid-fitted and hinted the same way.
A separate definition of patterns would simplify the design of colored
fonts, as the same glyph geometries would be used. But a separate
monochrome glyph could also be defined in the
same font, in such a way that the glyph is defined with the pattern
integrated into its geometry.

CSS, for example, could then specify a way to indicate that the
rendered characters should not use an sRGBA color (with the hatching
pattern defined in the font) but the natural colors defined by the
glyphs themselves: this would require only a new value, color:
natural. If the font does not define any natural color for its
mapped glyphs, or the glyphs do not map any hatching patterns, then
this CSS value would be interpreted as if it were color: inherit. An
extended version could also be color: natural #rrggbb: the #rrggbb
would still allow specifying the color to use if there's no natural
color in the font, or if the colors defined in the font (those that
are marked as being important) are incompatible with or not easily
distinguished from the current background (according to the user's
preferences), or not accessible to the user (also according to his
preferences): in which case the renderer would use

Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

Also there should exist somewhere a registry of known flag codes.
There are well-known vexillological sites that list large collections of
flags, but so far they have not developed a standard (ASCII-based)
codification.

In my opinion, this codification should need just BASIC LATIN CAPITAL
LETTERs, Arabo-European digits, the ASCII HYPHEN as a separator for
country/region subcodes, and the colon and dot for versioning/dating,
and it should be based on ISO 3166-1 (using extension/private codes
for historic countries or regions that are not encoded in ISO 3166).
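One possible reading of this codification as a grammar is sketched below. The exact rules are assumptions for illustration, not part of any registry: two capital letters for the ISO 3166-1 alpha-2 code (user-assigned codes included), hyphen-separated subcodes of one to three letters or digits, and an optional `:YYYY` version refined by up to two `.NN` parts (month, day).

```python
from __future__ import annotations

import re

# Hypothetical grammar: CC(-SUB)*(:YYYY(.NN){0,2})?
FLAG_CODE = re.compile(
    r"(?P<country>[A-Z]{2})"
    r"(?P<subdivisions>(?:-[A-Z0-9]{1,3})*)"
    r"(?::(?P<version>[0-9]{4}(?:\.[0-9]{2}){0,2}))?"
)


def parse_flag_code(code: str) -> dict | None:
    """Split a flag code into its parts, or return None if it is malformed."""
    m = FLAG_CODE.fullmatch(code)
    if m is None:
        return None
    subs = m.group("subdivisions")  # e.g. "-WLS", or "" when absent
    return {
        "country": m.group("country"),
        "subdivisions": subs.split("-")[1:] if subs else [],
        "version": m.group("version"),
    }
```

With these rules, `GB-WLS`, `JP:1946` and `US:1960.07.04` are accepted, while lowercase codes or two-digit years are rejected.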

Such a registry should contain a search form for codes, showing the
designs, the preferred aspect ratio, the color mappings, and whether
the flag itself is protected by copyright restrictions (this
won't limit the use of fallback glyphs, showing just the code in an
enclosing blank flag, in free fonts that do not want to violate these
copyright restrictions, while those fonts still define ligatures for
flag designs that are free from those restrictions).
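A registry entry with the fields listed above, and the choice a free font would make from it, might be modeled like this. The field names, the `search` helper and the `fallback:`/`ligature:` labels are hypothetical; `XZ` (an ISO 3166-1 user-assigned code) stands in for a design under copyright restrictions, and colors are given as plain names where a real registry would give exact values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FlagEntry:
    code: str                       # flag code, e.g. "JP" or "JP:1946"
    aspect_ratio: tuple[int, int]   # width : height
    colors: tuple[str, ...]         # color names (illustrative only)
    restricted: bool = False        # True if the design is under copyright restrictions


# Illustrative entries only.
REGISTRY = [
    FlagEntry("CH", (1, 1), ("red", "white")),
    FlagEntry("JP", (3, 2), ("white", "red")),
    FlagEntry("XZ", (3, 2), ("black",), restricted=True),
]


def search(registry: list[FlagEntry], prefix: str) -> list[FlagEntry]:
    """The registry's search form, reduced to a code-prefix match."""
    return [e for e in registry if e.code.startswith(prefix)]


def glyph_choice(entry: FlagEntry) -> str:
    """What a free font would draw for this code."""
    if entry.restricted:
        return f"fallback:{entry.code}"  # code letters in a blank flag outline
    return f"ligature:{entry.code}"      # the actual flag design
```

The point of the split is that the fallback glyph only shows the code, so it stays usable even where the real design cannot be shipped.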

But this registry does not have to be defined and maintained by the
Unicode Consortium or by ISO, unless they want to develop
it. In any case, it does not need to be part of the Unicode
and ISO/IEC 10646 standards themselves (but there could be an
informative reference to the registry, to help font developers).

Re: Flag tags

2012/5/31 Asmus Freytag asm...@ix.netcom.com:
 On 5/30/2012 7:19 PM, Philippe Verdy wrote:

 2012/5/31 Michael Everson ever...@evertype.com:

 On 31 May 2012, at 00:24, Mark Davis ☕ wrote:
 Members of ISO National Bodies quite properly thought that it is
 inappropriate for an International Standard to encode the flags of some
 countries and not the flags of others. You can stuff your condescension,
 Mark.

 I fully agree. Either all of them or none of them (or just a generic
 white flag).

 No at least the black pirate flag, and the checkered flag (for car racing).

There are two black pirate flags. One is all black (the most generic
one), another has a skull and bones. OK, these ones are generic
enough not to convey country/territory-specific information.

There are also conventional sky-blue flags used in Europe (maybe
elsewhere) for water quality. There may be others used for
signaling (including beach surveillance and swimming dangers:
red, orange, green): these may be unified with the all-black flag (if
color is not really encoded but assignable by external styles).

If you add the flag for car racing, then why not the flags used
in other transportation areas?

Also add the flags used as maritime alphabets (they are a true script
in themselves, whose mapping to actual letters depends on the locale's
script, so they are not really a visual variant of any script, just
as the Braille script is not tied to Latin), or other ideographic
flags displayed much like the pirate flag (e.g. signaling diseases on
board)...

Re: Flag tags

On 31 May 2012, at 04:49, Asmus Freytag wrote:

 On 5/30/2012 7:19 PM, Philippe Verdy wrote:
 2012/5/31 Michael Everson ever...@evertype.com:
 Members of ISO National Bodies quite properly thought that it is
 inappropriate for an International Standard to encode the flags of some
 countries and not the flags of others. [...]
 
 I fully agree. Either all of them or none of them (or just a generic white 
 flag).
 
 No at least the black pirate flag, and the checkered flag (for car racing).
 
 Those would constitute the minimum useful set.

U+2690 WHITE FLAG
U+2691 BLACK FLAG
U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE
U+1F38C CROSSED FLAGS
U+1F3C1 CHEQUERED FLAG

We are missing the JOLLY ROGER.

Michael Everson * http://www.evertype.com/

Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

2012-05-31 Thread Andrew West

On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote:

 There is definitely a problem.

Is it really such a problem?  Why can't implementations simply use
ZWSP to demarcate the 2-character units in a sequence of more than two
regional indicator symbols (and maybe always emit 2-character codes
wrapped between ZWSP on either side to be safe), so that, for example,
US<ZWSP>ES<ZWSP>GE would be parsed as the regional indicator symbols
for the USA, Spain and Georgia, whereas U<ZWSP>SE<ZWSP>SG<ZWSP>E would be
parsed as the regional indicator symbols for U (invalid), Sweden,
Singapore and E (invalid).  Algorithms such as line-breaking would not
break between two regional indicator symbols, but only at a ZWSP.

And if implementations wanted to support two- and three-letter
regional codes, they might parse
<ZWSP>GB<ZWSP>CYM<ZWSP>ENG<ZWSP>NIR<ZWSP>SCO<ZWSP> as the codes for
the United Kingdom, Wales, England, Northern Ireland, and Scotland, and
represent them visually with the appropriate flag icons.
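The ZWSP convention can be sketched as a small parser: split at ZWSP and read each unit's regional indicator symbols back as ASCII letters. The `allowed_lengths` parameter and the `(letters, is_valid)` result shape are assumptions of this sketch; checking whether a well-formed unit is an actually assigned code is left to the caller.

```python
from __future__ import annotations

RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RIS_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def to_ris(letters: str) -> str:
    """Map ASCII capitals to regional indicator symbols."""
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in letters)


def parse_ris_units(text: str,
                    allowed_lengths: tuple[int, ...] = (2,)) -> list[tuple[str, bool]]:
    """Split RIS text at ZWSP and return (letters, is_valid) for each unit."""
    units = []
    for segment in text.split(ZWSP):
        if not segment:  # leading, trailing or doubled ZWSP
            continue
        if not all(RIS_A <= ord(ch) <= RIS_Z for ch in segment):
            raise ValueError(f"unexpected non-RIS character in {segment!r}")
        letters = "".join(chr(ord(ch) - RIS_A + ord("A")) for ch in segment)
        units.append((letters, len(letters) in allowed_lengths))
    return units
```

With the default `(2,)`, the U/SE/SG/E example yields two invalid one-letter units; passing `(2, 3)` accepts the GB/CYM/ENG/NIR/SCO sequence.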

Andrew

Re: Flag tags

2012-05-31 Thread Andrew West

On 31 May 2012 10:20, Michael Everson ever...@evertype.com wrote:

 No at least the black pirate flag, and the checkered flag (for car racing).

 U+2690 WHITE FLAG
 U+2691 BLACK FLAG
 U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE
 U+1F38C CROSSED FLAGS
 U+1F3C1 CHEQUERED FLAG

 We are missing the JOLLY ROGER.

I propose U+20F1 COMBINING ENCLOSING FLAG, and a named sequence
U+2620 U+20F1 = JOLLY ROGER.

Andrew

Re: Flag tags

2012/5/31, Michael Everson ever...@evertype.com wrote:
 U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE

What does this mean? Is it really useful for anything?

Re: Flag tags

Philippe Verdy wrote:

 So to represent the flag of Japan, you could encode: 

 FLAG INITIAL SYMBOL J 
 FLAG FINAL SYMBOL P 
 [...]

For me, the existing Plane 14 mechanism would have worked just as well,
without requiring three more duplicate sets of printable Basic Latin.
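For comparison, a sketch of the Plane 14 sequence from my earlier message: a FLAG TAG followed by tag letters. U+E0002 is the hypothetical FLAG TAG suggested there (it is not an assigned character); the tag letters U+E0041 to U+E005A are existing Plane 14 characters, each offset from its ASCII capital by 0xE0000, so no new letter sets are needed.

```python
from __future__ import annotations

FLAG_TAG = 0xE0002    # hypothetical FLAG TAG suggested in the thread; unassigned
TAG_OFFSET = 0xE0000  # TAG LATIN CAPITAL LETTER A (U+E0041) = 0xE0000 + ord("A")


def flag_tag_sequence(code: str) -> str:
    """Encode an alpha-2 code as FLAG TAG followed by tag letters."""
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII capital letters, e.g. 'CH'")
    return chr(FLAG_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def read_flag_tag(seq: str) -> str | None:
    """Recover the code from such a sequence, or None if it is not one."""
    if not seq or ord(seq[0]) != FLAG_TAG:
        return None
    letters = []
    for ch in seq[1:]:
        cp = ord(ch)
        if not (TAG_OFFSET + ord("A") <= cp <= TAG_OFFSET + ord("Z")):
            break  # the tag run ends at the first non-tag-letter
        letters.append(chr(cp - TAG_OFFSET))
    return "".join(letters) or None
```

`flag_tag_sequence("CH")` produces exactly U+E0002 U+E0043 U+E0048, and the explicit lead character leaves no doubt where each code starts.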

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

[OT] Re: Flag tags

Philippe Verdy wrote:

 Also there should exist somewhere a registry of known flag codes.
 There are well-known vexillological sites that list large collections of
 flags, but so far they have not developed a standard (ASCII-based)
 codification.

 [...]

 But this registry does not have to be defined and maintained by the
 Unicode Consortium or by ISO, unless they have the desire to develop
 it.

This doesn't seem at all within the scope of Unicode, though perhaps
CLDR would want it.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

RE: Flag tags

 We are missing the JOLLY ROGER.

At least one, there're lots :)

http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery

Re: Flag tags

On 31 May 2012, at 16:04, Shawn Steele wrote:

 We are missing the JOLLY ROGER.
 
 At least one, there're lots :)
 
 http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery

A, glyph variants. 

Yo ho ho,
Michael Everson * http://www.evertype.com/

RE: Flag tags

 We are missing the JOLLY ROGER.
 
 At least one, there're lots :)
 
 http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery

 A, glyph variants. 

Ar, you're right, missed that :)

-Shawn

Re: Flag tags


On 5/31/2012 2:06 AM, Philippe Verdy wrote:

2012/5/31 Asmus Freytag asm...@ix.netcom.com:

On 5/30/2012 7:19 PM, Philippe Verdy wrote:

2012/5/31 Michael Everson ever...@evertype.com:

On 31 May 2012, at 00:24, Mark Davis ☕ wrote:
Members of ISO National Bodies quite properly thought that it is
inappropriate for an International Standard to encode the flags of some
countries and not the flags of others. You can stuff your condescension,
Mark.

I fully agree. Either all of them or none of them (or just a generic
white flag).

No at least the black pirate flag, and the checkered flag (for car racing).

There are two black pirate flags. One is all black (the most generic
one), another has a skull and bones. OK, these ones are generic
enough not to convey country/territory-specific information.

There are also conventional sky-blue flags used in Europe (maybe
elsewhere) for water quality. There may be others used for
signaling (including beach surveillance and swimming dangers:
red, orange, green): these may be unified with the all-black flag (if
color is not really encoded but assignable by external styles).

If you add the flag for car racing, then why not the flags used
in other transportation areas?

You are right! I missed these:


Also add the flags used as maritime alphabets (they are a true script
in themselves, whose mapping to actual letters depends on the locale's
script, so they are not really a visual variant of any script, just
as the Braille script is not tied to Latin), or other ideographic
flags displayed much like the pirate flag (e.g. signaling diseases on
board)...

Re: Flag tags


On 5/31/2012 8:56 AM, Shawn Steele wrote:

We are missing the JOLLY ROGER.

At least one, there're lots :)
http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery

A, glyph variants.

Ar, you're right, missed that :)




No, that's a misunderstanding of glyph variants.

Some of them can be substituted and will be recognized by all as jolly 
roger, others will not.


The former set may be glyph variants - that is, if there's no 
contrastive usage, the latter cannot be.


Why? Because for symbols, you don't have a word context that gives you a
definite, secondary clue to the identity of a shape, so the shape alone
has to be recognized. Hence, designs that cannot be recognized as each
other are not glyph variants.


In this case, on top of that, many represent symbols identifying 
particular bands, captains or ships (or nowadays, movie cycles). As such 
they resemble the distinguishing function of national flags.


A./

Re: Flag tags

On 31 May 2012, at 17:19, Asmus Freytag wrote:

 On 5/31/2012 8:56 AM, Shawn Steele wrote:
 We are missing the JOLLY ROGER.
 At least one, there're lots :)
 http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery
 A, glyph variants.
 Ar, you're right, missed that :)
 
 No, that's a misunderstanding of glyph variants.

Lordy. It was FUNNY, Asmus. 

Michael Everson * http://www.evertype.com/

Re: Flag tags

On 31 May 2012, at 17:19, Asmus Freytag wrote:

 Some of them can be substituted and will be recognized by all as jolly 
 roger, others will not.
 
 The former set may be glyph variants - that is, if there's no contrastive 
 usage, the latter cannot be.

They are logos for the actual dead pirate captains. They are glyph variants of 
pirate flag otherwise. Some are just obscure glyph variants. 

 In this case, on top of that, many represent symbols identifying particular 
 bands, captains or ships (or nowadays, movie cycles). As such they resemble 
 the distinguishing function of national flags.

Then, yes, but now we do have a notion of pirate flag which is basically 
black with a skull and crossbones on it. 

Michael Everson * http://www.evertype.com/

Re: Flag tags

On 31 May 2012, at 17:26, Asmus Freytag wrote:

 you put your finger on it. Any form of combining scheme is doomed to fail. 

That's why http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3680.pdf was the right 
solution.

Michael Everson * http://www.evertype.com/

RE: Flag tags

Which ones are used in print?  Isn't that the criterion?  Personally, I'd like
to see the maritime flags encoded, because I've always been interested in them,
but I can see a case for them not being encoded.  (Though a couple of weeks ago on
a cruise ship I did see them used in several places in print, as it were,
though I'd have to concede that the reason they were in print was primarily
decorative, though they were readable.  E.g.: Signals bar spelled out in flags).

Seems like swimming flags or shark flags or dive flags wouldn't be used much in 
print?

-Shawn

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf 
Of Asmus Freytag
Sent: Thursday, May 31, 2012 9:00 AM
To: verd...@wanadoo.fr
Cc: Michael Everson; unicode Unicode Discussion
Subject: Re: Flag tags


Re: Flag tags


On 5/31/2012 9:40 AM, Shawn Steele wrote:

Which ones are used in print?  Isn't that the criterion?  Personally, I'd like to see the maritime flags
encoded, because I've always been interested in them, but I can see a case for them not being encoded.
(Though a couple of weeks ago on a cruise ship I did see them used in several places in print, as it
were, though I'd have to concede that the reason they were in print was primarily decorative,
though they were readable.  E.g.: Signals bar spelled out in flags).


The decorative use of those is in fact not uncommon, and when they are 
used that way, in print, they do form strings.


They do, by definition, require colors for their representation,
although the design is such that colors and shapes work together in a
redundant way, to improve their recognition under poor visibility.


They are also not glyph variants of ordinary letters and digits, even 
where there is a 1:1 correspondence.
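The 1:1 correspondence itself is trivial to compute, which is exactly why it says nothing about recognizability. A sketch, assuming the letter-flag names of the International Code of Signals (which reuse the phonetic alphabet) and handling letters only (numeral pennants and substitutes are omitted): the output is a sequence of distinct flag symbols, not Latin letters in another style.

```python
from __future__ import annotations

# Letter flags of the International Code of Signals, named after the
# phonetic alphabet. Numeral pennants and substitutes are omitted here.
ICS_LETTER_FLAGS = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    ["Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
     "Hotel", "India", "Juliett", "Kilo", "Lima", "Mike", "November",
     "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
     "Victor", "Whiskey", "X-ray", "Yankee", "Zulu"],
))


def spell_in_flags(text: str) -> list[list[str]]:
    """Spell each word of text as a hoist of letter flags (one list per word)."""
    hoists = []
    for word in text.upper().split():
        try:
            hoists.append([ICS_LETTER_FLAGS[c] for c in word])
        except KeyError as err:
            raise ValueError(f"no letter flag for {err.args[0]!r}") from None
    return hoists
```

Spelling "Signals bar" gives two hoists (seven flags, then Bravo, Alfa, Romeo); a reader who does not know the flags recovers nothing from them, which is the recognizability test at work.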


First, reprinting Shakespeare's works using flags would make it 
immediately and utterly illegible to most speakers of English. So they 
would fail the test of being recognizably the same letter.


Second, one place where the flags are still used today is sailboat 
races. Replacing the flag by a placard showing the letter would also not 
be acceptable in that context.


So, seeing that Unicode nowadays includes SMS-specific symbols
in its scope, who would like to be able to communicate with flags?


Another alphabet with a 1:1 correspondence to Latin, but, again, not
recognizable as such, is that of the dancing men. They at least can
be demonstrated to have appeared in print.


A./



Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

2012-05-31 Thread William_J_G Overington

Doug Ewell d...@ewellic.org wrote:
 
 A seemingly straightforward solution to the “unambiguous mapping” problem 
 would be to use the existing Plane 14 tag letters along with a new FLAG TAG, 
 say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the 
 current Swiss flag. No need for separate lead and trail. Simple.
 
 ... What’s that? Oh, sorry, never mind. Deprecated.
 
On a point of order, is deprecation of a character or collection of characters 
carried out by only the Unicode Technical Committee or by both of the Unicode 
Technical Committee and the ISO/IEC 10646 Committee?
 
Further to that point of order, is there any rule that absolutely prevents the 
deprecated status of a character or collection of characters being removed?
 
I feel that by hybridizing the suggestions of Doug and Philippe, an elegant
solution using tags and an advanced format font could be designed.
 
William Overington
 
31 May 2012

Re: Flag tags


On 5/31/2012 9:30 AM, Michael Everson wrote:

On 31 May 2012, at 17:19, Asmus Freytag wrote:


Some of them can be substituted and will be recognized by all as jolly roger, 
others will not.

The former set may be glyph variants - that is, if there's no contrastive 
usage, the latter cannot be.

They are logos for the actual dead pirate captains.


That's so. Do their heirs claim rights to them? That would exclude them
from encoding forever.


But wait, aren't national flags logos for their respective countries?

A./

PS: This is the part I can't find funny:


They are glyph variants of pirate flag otherwise. Some are just obscure glyph 
variants.


In this case, on top of that, many represent symbols identifying particular 
bands, captains or ships (or nowadays, movie cycles). As such they resemble the 
distinguishing function of national flags.

Then, yes, but now we do have a notion of pirate flag which is basically 
black with a skull and crossbones on it.


Pirate flag is a generic concept. Encoding a generic concept as such in
Unicode is a problematic notion, especially if the mistaken
conclusion is then drawn that all concrete realizations of symbols that
somehow pertain to the same general concept are mere glyph variants.


What you would encode is not the concept of pirate flag but the 
archetypical representation of a (generic) pirate flag. That means 
that minor variations in the skull and crossbones are indeed glyph 
variants (representing different artists' attempt to depict the same 
thing), but that other types of flags, used as pirate flags, do not 
constitute mere variants, but represent their own symbols (of related, 
but not identical semantics).


The distinction between these concepts has been sorely lacking in much 
of the recent and not so recent discussion of encoding symbols, and 
that's why I can't find it funny...

Re: Flag tags


On 5/31/2012 9:34 AM, Michael Everson wrote:

On 31 May 2012, at 17:26, Asmus Freytag wrote:


you put your finger on it. Any form of combining scheme is doomed to fail.

That's why http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3680.pdf was the right 
solution.




No Michael.

While I've come to the conclusion that encoding some form of 
combinatorial tags is indeed doomed, I don't believe that encoding 
images for codes (or if you will, ASCII strings) is the answer - that's 
meta encoding of a different sort.


The right answer would have been to encode the 10 flags and then agree 
to *study* the needs for and best solutions available to address a more 
comprehensive system at a future date. The main problem I see in that 
regard is impatience.


It's like with currency symbols - you code things when there's 
demonstrated demand, you don't put place holders in, and you don't give 
codes to all the three letter currency codes (like USD CND etc.).


A./

Re: Flag tags

On 31 May 2012, at 18:51, Asmus Freytag wrote:

 The right answer would have been to encode the 10 flags and then agree to 
 *study* the needs for and best solutions available to address a more 
 comprehensive system at a future date. The main problem I see in that 
 regard is impatience.

ISO NBs were, correctly, uncomfortable with the idea of encoding the flags of 
some countries and not of others. As representative of one of those NBs, I have 
no regrets about having made our proposal, which is still better than the 
current solution. 

 It's like with currency symbols - you code things when there's demonstrated 
 demand, you don't put place holders in, and you don't give codes to all the 
 three letter currency codes (like USD CND etc.).

When you encode a flag for Germany and the US, you automatically get a demand 
for the encoding of a flag for Ireland and Iceland. That's the way it is. And 
no, waiting for some vendor to put more flags in the phone is not going to 
solve it. If you don't understand the politics of this matter, well, I can't 
help you to do it. 

Michael Everson * http://www.evertype.com/

Re: Flag tags

2012-05-31 Thread John H. Jenkins


On 31 May 2012, at 11:57 AM, Michael Everson ever...@evertype.com wrote:

 When you encode a flag for Germany and the US, you automatically get a demand 
 for the encoding of a flag for Ireland and Iceland. That's the way it is. 

tongue-in-cheek
Oh, c'mon, Michael, next you'll be saying that because some countries have 
currency symbols with dedicated code points, other countries will make *new* 
currency symbols and demand that *they* get dedicated code points, too. We all 
know how unrealistic a scenario *that* is.
/tongue-in-cheek

=
John H. Jenkins
jenk...@apple.com

RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

William_J_G Overington wjgo underscore 10009 at btinternet dot com
wrote:

 Further to that point of order, is there any rule that absolutely
 prevents the deprecated status of a character or collection of
 characters being removed?

UTC has not ever shown the slightest inclination to do so, if that
answers your question.

 I feel that, by hybridizing the suggestions of Doug and Philippe, an
 elegant solution using tags and an advanced format font could be
 designed.

I had forgotten that the Regional Indicator Symbols from U+1F1E6 through
U+1F1FF had already been encoded. You can create such a font today if
you like, mapping pairs of these symbols to a flag representing the
country with that ISO 3166-1 code element. See TUS 6.1, Section 5.10,
next-to-last subsection (page 534) for details.
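Doug's font suggestion rests on simple arithmetic: the 26 Regional Indicator Symbols U+1F1E6 through U+1F1FF stand for A through Z in alphabetical order, so an ISO 3166-1 alpha-2 code maps to a pair of them by a fixed offset. A minimal Python sketch follows; the function names are illustrative, and pairing a longer run strictly from its start is an assumption, which is exactly where the ambiguity discussed later in this thread comes from.

```python
RIS_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RIS_Z = 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER Z


def to_regional_indicators(code: str) -> str:
    """Map a two-letter code such as "JP" to a pair of Regional Indicator Symbols."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"not a two-letter A-Z code: {code!r}")
    code = code.upper()
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


def from_regional_indicators(run: str) -> list[str]:
    """Split a run of Regional Indicator Symbols into two-letter codes.

    Pairs are taken from the start of the run (an assumption); an odd-length
    run is rejected because it cannot be paired unambiguously.
    """
    if any(not RIS_A <= ord(ch) <= RIS_Z for ch in run):
        raise ValueError("run contains characters outside U+1F1E6..U+1F1FF")
    if len(run) % 2:
        raise ValueError("odd number of regional indicator symbols")
    letters = "".join(chr(ord(ch) - RIS_A + ord("A")) for ch in run)
    return [letters[i:i + 2] for i in range(0, len(letters), 2)]


print(to_regional_indicators("jp") == "\U0001F1EF\U0001F1F5")  # True
print(from_regional_indicators(to_regional_indicators("US") + to_regional_indicators("ES")))
# ['US', 'ES']
```

Note that nothing in the characters themselves marks where one pair ends: drop the first symbol of "USES" and the same pairing rule reads "SE" (Sweden) instead.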

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Re: Preliminary proposal to encode Unifon in the UCS.

2012-05-31 Thread Jean-François Colson

Hello

I wrote: “1st possibility: a separate script. There’ll be no problem.”
You wrote: “There would, because the bulk of the script would look just
like Latin, and the encoding committees consider this to be a security
issue for internet spoofing for instance.”

I don’t understand.
Internet spoofing is possible, for instance, by mixing Latin and
Cyrillic letters in internationalized domain names. For example, instead
of paypal.com, you could take advantage of the fact that the first five
letters all have look-alike Cyrillic letters and register one of the
31 (2⁵-1) DIFFERENT domain names paypаl.com, payрal.com, payраl.com,
paуpal.com, paуpаl.com, paурal.com, paураl.com, pаypal.com, pаypаl.com,
pаyрal.com, pаyраl.com, pауpal.com, pауpаl.com, pаурal.com, pаураl.com,
рaypal.com, рaypаl.com, рayрal.com, рayраl.com, рaуpal.com, рaуpаl.com,
рaурal.com, рaураl.com, раypal.com, раypаl.com, раyрal.com, раyраl.com,
рауpal.com, рауpаl.com, раурal.com or раураl.com to ask their paypal
e-mail address and password of your “customers”. That could only work if
the customer is very distracted, or has previously typed “about:config”
in the address bar and set network.IDN_show_punycode to false. (That
works with Firefox; the procedure may differ in other browsers.)
But, as far as I know, domain names are commonly written in
lowercase. When I type a nonexistent domain name in capitals,
such as CUYOPUIESVRDKRSIXTVESVRDSHKSE.com, it is automatically converted
to lowercase (http://www.cuyopuiesvrdkrsixtvesvrdshkse.com/) before the
“not found” message is displayed.
In Unifon, only the capital letters would look alike. The lowercase
letters would be different. There could be a problem with the letter o,
but that would be a drop in the ocean, not more problematic than the
letter ᴏ (small capital o), ο (Greek omicron), о (Cyrillic o), ⲟ (Coptic
o), Ь (Deseret o), ჿ (Georgian labial sign), ੦ (Gurmukhi zero), all the
zeros, most of which look like circles, etc.
What exactly is the real security issue with Unifon as a separate
script? Someone who wants to spoof will find a way to do it without that.
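The figure of 31 comes from treating each of the five look-alike positions as an independent Latin/Cyrillic choice: 2^5 = 32 spellings, minus the all-Latin original. The Python sketch below enumerates them and adds a crude mixed-script check; taking the script from the first word of each character's name is a simplification assumed here for brevity, not the Unicode Script property that real IDN checks rely on.

```python
import unicodedata
from itertools import product

# Cyrillic look-alikes for the letters p, a, y used in "paypal".
LOOKALIKES = {
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}


def spoof_variants(label: str, positions: int) -> list[str]:
    """Every spelling that swaps a nonempty subset of the first `positions` letters."""
    head, tail = label[:positions], label[positions:]
    choices = [(c, LOOKALIKES[c]) for c in head]
    spellings = ("".join(combo) + tail for combo in product(*choices))
    return [s for s in spellings if s != label]  # drop the all-Latin original


def scripts_used(label: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


variants = spoof_variants("paypal", 5)
print(len(variants))  # 31
print(all(scripts_used(v) == {"LATIN", "CYRILLIC"} for v in variants))  # True
```

Every variant keeps the Latin final "l", so each one mixes scripts, which is the kind of signal a registry or browser can flag.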

NOW, a few comments about the Unifon proposal.

You didn’t correct “for several the Hupa, Yurok, Tolowa, and Karok
languages”.

There’s also the word “Karok”. Below, you write “Karuk”.

In the Unifon letters unified with existing characters, you forgot the
letter I.

You propose a Latin capital letter small capital i to be paired with ɪ
(Latin letter small capital i). Would ɪ have wider serifs when displayed
in small caps?

For the Latin capital beta, you wrote: “The unique Latin capital form
meets one of the major criteria for disunification.”
Could I use the same formula for Unifon? The unique Unifon small forms
meet one of the major criteria for disunification…

In the previous proposal, you also included a letter which looked a
little like a ƆC ligature or a rounded X. You called it zhay in n4195.
Have you forgotten it deliberately? That’s the last letter in figure 1,
although you wrote X in the caption.

You also used an X in Figure 7’s caption: it would be strange to have an
X pronounced /ʒ/ (zh) in a phonemic alphabet for English.

In the first three columns of the table at page 12, the two parts of
Latin letter oy are detached. In all samples of Unifon I’ve seen which
use that letter, the vertical line of the turned Ⱶ is tangent to the
right of the O.

In the same table, the Latin letter dhe should have a round shape.
That’s one of the two features which distinguish it from the
Latin letter the.
In all Unifon fonts I know except one, the left part of the letter dhe
is not really a T but something midway between a T and a Γ.

I think Latin letter the should have a small top bar.

In this table of the Tolowa Unifon alphabet,
http://unifon.org/images/TOLOWA.jpg , some letters have a different
value when followed by a small stroke which looks like an apostrophe.
Should it be an ASCII apostrophe, a ’ (U+2019), a ʼ (U+02BC), a Ꞌ
(saltillo) or something else?

On page 3, the capital ʃ looks like an enlarged form of the lowercase
letter, different from the Greek capital sigma-like Ʃ. Would this unique
Latin capital form meet one of the major criteria for disunification?
What about the capital U with a tail?

I wonder whether the 8th letter of the 42-letter “Indian Unifon
Single-Sound Alphabet” is a turned or a reversed C.

For the turned e-r, I think a new lower case is needed.

For the Latin letter reversed-e e, could the double ϵ, used for the same
sound in the Initial Teaching Alphabet, be used as a lower case letter?

Would a separate proposal be required for the Initial Teaching Alphabet
(http://en.wikipedia.org/wiki/Initial_Teaching_Alphabet)?

28 or 29 letters of this 44 letter alphabet are already supported:
b, c, d, f, ɡ, h, j, k, l, m, n are already supported.
ng ligature is different from ŋ.
p, r, s are already

RE: Flag tags

One possible problem with either (a) encoding flags or (b) encouraging
the display of Regional Indicator Symbols as flags is that some authors
would want to use them to indicate the language of the text that
follows. I'm not talking about inline, plain-text language tagging in
the sense that UTC frowns upon, but literally a visual display of a
flag. 

It's common, particularly in Europe, to see English-language text marked
with a Union Jack, French-language text marked with the flag of France,
and so forth. Of course, we all know the problems with using national
flags to indicate languages, but it's common practice nevertheless.
Having Unicode characters for flags, especially well-supported ones,
might encourage this practice.

Of course, the Japanese phone users might have been doing this all along
with the existing 10 emoji flags.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

RE: Flag tags

Asmus Freytag asmusf at ix dot netcom dot com wrote:

 First, reprinting Shakespeare's works using flags would make it
 immediately and utterly illegible to most speakers of English. So they
 would fail the test of being recognizably the same letter.

 Second, one place where the flags are still used today is sailboat
 races. Replacing the flag by a placard showing the letter would also
 not be acceptable in that context.

 So, seeing that Unicode nowadays has the support of SMS-specific
 symbols as part of its scope, who would like to be able to communicate
 with flags?

 Another alphabet, even that with 1:1 correspondence to Latin, but,
 again, not recognizable as such are the dancing men. They at least
 can be demonstrated to have appeared in print.

Are substitution ciphers candidates for encoding?

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Re: Flag tags

2012-05-31 Thread Karl Pentzlin

Am Donnerstag, 31. Mai 2012 um 20:09 schrieb John H. Jenkins:

JHJ tongue-in-cheek
JHJ ... that because some
JHJ countries have currency symbols with dedicated code points, other
JHJ countries will make *new* currency symbols and demand that *they*
JHJ get dedicated code points ...

Seriously speaking, flag symbols and currency signs are completely
different topics.

Every country has exactly one flag, right now. Thus, in fact an
encoding proposal covering only a few of them, based on an
arbitrary collection made by some telephone companies without any
evidence of scrutiny in its making, can never be acceptable to most national
bodies represented in ISO.

On the other hand, currencies may exist without a currency symbol
(as in fact most currencies do). In fact, all currency symbols
assigned to currencies valid today are included in Unicode now, with
only two exceptions after acceptance of the new Turkish Lira sign:

AZN Azerbaijan Manat (waiting for confirmation of its actual use),

ANG Netherlands Antillean guilder (used formerly mostly for NLG Dutch
 guilder which was valid until 2002; problematically unified with
 U+0192 LATIN SMALL LETTER F WITH HOOK; see
 http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3588.pdf )

On this basis, nobody will request the addition of other symbols
as a precondition for accepting any new currency sign on ballot.

- Karl

Re: Flag tags

2012-05-31 Thread David Starner

On Thu, May 31, 2012 at 12:03 PM, Doug Ewell d...@ewellic.org wrote:
 Asmus Freytag asmusf at ix dot netcom dot com wrote:

 First, reprinting Shakespeare's works using flags would make it
 immediately and utterly illegible to most speakers of English. So they
 would fail the test of being recognizably the same letter.
[...]
 Another alphabet, even that with 1:1 correspondence to Latin, but,
 again, not recognizable as such are the dancing men. They at least
 can be demonstrated to have appeared in print.

 Are substitution ciphers candidates for encoding?

Exactly. I've always thought that Cyrillicized Latin fonts (Яussiaи
with all Latin backing) and flag letters and various other weird
symbolic conversions are perfectly legal, if limited, Unicode fonts. The
Dancing Men are really a special font for Latin.

-- 
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)

RE: Flag tags

 First, reprinting Shakespeare's works using flags would make it immediately
 and utterly illegible to most speakers of English. So they would fail the test
 of being recognizably the same letter.

FWIW: The "Alpha" flag doesn't mean "A".  For example it also means "Diver
Down".  Most of the flags have other meanings beyond just a letter, like Quebec
(Quarantine).  So it's not just a substitution cipher.  Combinations can also
have special meanings.  Additionally, repeaters make it more complicated than a
simple substitution cipher, e.g.: November, Oscar, Repeat2, Repeat1 for "noon" ==
4 different flags for 2 letters.


[Images: ICS November, ICS Oscar, ICS Repeat Two and ICS Repeat One flags]
http://en.wikipedia.org/wiki/File:ICS_November.svg
http://en.wikipedia.org/wiki/File:ICS_Oscar.svg
http://en.wikipedia.org/wiki/File:ICS_Repeat_Two.svg
http://en.wikipedia.org/wiki/File:ICS_Repeat_One.svg

-Shawn



Re: Flag tags


On 5/31/2012 12:07 PM, Karl Pentzlin wrote:

Am Donnerstag, 31. Mai 2012 um 20:09 schrieb John H. Jenkins:

JHJ  tongue-in-cheek
JHJ  ... that because some
JHJ  countries have currency symbols with dedicated code points, other
JHJ  countries will make *new* currency symbols and demand that *they*
JHJ  get dedicated code points ...

Seriously speaking, flag symbols and currency signs are completely
different topics.

Every country has exactly one flag, right now.


But not all of these flags are used in writing - right now.

This is similar to not all currencies having a symbol.

There's nothing wrong with encoding a subset and leaving the door open 
for additions - there's no reason to jump to encoding hundreds of 
concrete cloth and thread symbols without any indication that they are 
used in text. Or is there?


Also, for those of you not residing in North America, a point of 
information: the state flags of the 50 states of the USA are flown 
widely - if not as widely as the federal flag, and the accompanying 
symbols and designs (including seals) are widely used in publications. 
So, there's not a simple 1 country : 1 flag principle here - if you look 
at actual usage, there's a wide variety of practices.


A./

  Thus, in fact an
encoding proposal covering only a few of them, based on an
arbitrary collection made by some telephone companies without any
evidence of scrutiny in its making, can never be acceptable to most national
bodies represented in ISO.

On the other hand, currencies may exist without a currency symbol
(as in fact most currencies do). In fact, all currency symbols
assigned to currencies valid today are included in Unicode now, with
only two exceptions after acceptance of the new Turkish Lira sign:

AZN Azerbaijan Manat (waiting for confirmation of its actual use),

ANG Netherlands Antillean guilder (used formerly mostly for NLG Dutch
  guilder which was valid until 2002; problematically unified with
  U+0192 LATIN SMALL LETTER F WITH HOOK; see
  http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3588.pdf )

On this basis, nobody will request the addition of other symbols
as a precondition for accepting any new currency sign on ballot.

- Karl

Re: Flag tags


  
  
On 5/31/2012 1:56 PM, Shawn Steele wrote:

 First, reprinting Shakespeare's works using flags would make it immediately
 and utterly illegible to most speakers of English. So they would fail the test
 of being recognizably the same letter.

 FWIW: The "Alpha" flag doesn't mean "A".  For example it also means "Diver
 Down".  Most of the flags have other meanings beyond just a letter, like Quebec
 (Quarantine).  So it's not just a substitution cipher.  Combinations can also
 have special meanings.  Additionally, repeaters make it more complicated than a
 simple substitution cipher, e.g.: November, Oscar, Repeat2, Repeat1 for "noon"
 == 4 different flags for 2 letters.

See, there you go.

A./

Re: Flag tags


On 5/31/2012 12:03 PM, Doug Ewell wrote:



Another alphabet, even that with 1:1 correspondence to Latin, but,
again, not recognizable as such are the dancing men. They at least
can be demonstrated to have appeared in print.
Are substitution ciphers candidates for encoding?



To the degree that the use of the substitution is style, no. Fraktur 
and Insular forms have been unified for Latin. But these styles are also 
recognizable (if not to all users, then a significant number). And, 
there's a benefit in identifying them primarily with the Latin alphabet, 
and only secondarily with the precise style.


The dancing men are more like Braille. There's one source where they 
have been given a particular mapping to the Latin alphabet, but that 
mapping is not the only one possible. The whole point of them is that 
the actual mapping has to be known or discovered each time.


So, yes, these would have to be encoded by shape, not by target.
A./

Re: Flag tags

On 31 May 2012, at 22:57, Asmus Freytag wrote:

 See, there you go.

What do you mean by this?

Michael Everson * http://www.evertype.com/

Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)

Here he probably meant that if we need to encode many flags, each flag
code may be arbitrarily long. A solution based on combining characters
will not work correctly, and it will be better to use leading and
trailing markers, or to use a codification that allows knowing where a
flag starts and where it finishes.

There are two solutions:

(1) use specific punctuation-like characters acting like brackets
(those brackets can also be given a visual glyph by themselves), and
encode the intermediate flag code using usual characters. This would
allow viable fallback representations of flags, even if they show the
codes (as letters will be enclosed, for readability the set should be
restricted, probably to uppercase only, so that letters can be reduced
easily within the enclosing symbol).

(2) restrict the subset of characters that are usable in flag
identification codes to a useful and productive subset of ASCII, then
reencode them as enclosed letters marking the start and end of the
code, as well as any medial codes. This eases the production of
fonts for a reasonable representation of these codes within a visual
band looking like a flag, and allows those sequences to be
easily converted into ligatures for showing the actual flags
(including with their colors if needed).

Your solution based on a ZWSP *separator* does not solve anything: it
does not clearly indicate that this is representing a flag, and will
not allow automated recognition and production of ligatures.

2012/5/31 Andrew West andrewcw...@gmail.com:
 On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote:

 There is definitely a problem.

 Is it really such a problem?  Why can't implementations simply use
 ZWSP to demarcate the 2-character units in a sequence of more than two
 regional indicator symbols (and maybe always emit 2-character codes
 wrapped between ZWSP on either side to be safe), so for example
 USZWSPESZWSPGE would be parsed as the regional indicator symbols
 for USA, SPAIN and Georgia, whereas UZWSPSEZWSPSGZWSPE would be
 parsed as the regional indicator symbols for U (invalid), Sweden,
 Singapore and E (invalid).  Algorithms such as line-breaking would not
 break between two regional indicator symbols, but only at a ZWSP.

 And if implementations wanted to support two- and three-letter
 regional codes, they might parse
 ZWSPGBZWSPCYMZWSPENGZWSPNIRZWSPSCOZWSP as the codes for
 United Kingdom, Wales, England, Northern Ireland, and Scotland, and
 represent them visually with the appropriate flag icons.

 Andrew
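Andrew's ZWSP convention, quoted above, amounts to splitting on U+200B ZERO WIDTH SPACE before decoding each run of Regional Indicator Symbols. A minimal Python sketch, assuming that any segment between separators that is not made purely of Regional Indicator Symbols is simply skipped; the small lookup table is a hypothetical stand-in for a real ISO 3166 list.

```python
ZWSP = "\u200B"
RIS_A, RIS_Z = 0x1F1E6, 0x1F1FF

# Hypothetical stand-in for a full table of region codes.
KNOWN = {"US": "USA", "ES": "Spain", "GE": "Georgia", "SE": "Sweden", "SG": "Singapore"}


def ris(letters: str) -> str:
    """Spell A-Z letters as Regional Indicator Symbols (helper for building input)."""
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in letters)


def parse_ris_runs(text: str) -> list[str]:
    """Split on ZWSP and decode every segment made only of Regional Indicator Symbols."""
    codes = []
    for segment in text.split(ZWSP):
        if segment and all(RIS_A <= ord(ch) <= RIS_Z for ch in segment):
            codes.append("".join(chr(ord(ch) - RIS_A + ord("A")) for ch in segment))
    return codes


def describe(text: str) -> list[str]:
    return [KNOWN.get(code, f"{code} (invalid)") for code in parse_ris_runs(text)]


print(describe(ZWSP.join([ris("US"), ris("ES"), ris("GE")])))
# ['USA', 'Spain', 'Georgia']
print(describe(ZWSP.join([ris("U"), ris("SE"), ris("SG"), ris("E")])))
# ['U (invalid)', 'Sweden', 'Singapore', 'E (invalid)']
```

Because a segment may be any length, the same parser would pass three-letter codes such as CYM through unchanged to a larger table; the weak point is that everything depends on the ZWSPs surviving conversion and editing.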

Re: Flag tags

2012/5/31 Asmus Freytag asm...@ix.netcom.com:
 On 5/30/2012 10:15 PM, Doug Ewell wrote:

 A seemingly straightforward solution to the “unambiguous mapping” problem
 would be to use the existing Plane 14 tag letters along with a new FLAG TAG,
 say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the
 current Swiss flag. No need for separate lead and trail. Simple.

 ... What’s that? Oh, sorry, never mind. Deprecated.


 Doug,

 you put your finger on it. Any form of combining scheme is doomed to fail.

 This includes the current approach of Regional indicators.

You're wrong. The Regional indicators failed because they were encoded
at the character level, so that their scope of effect was supposed to
extend to arbitrary lengths of text.

Here it's just about how to represent a glyph (even if it's colored)
locally representing a flag. The scope of the encoded substring will
not go outside of this flag indicator, so it will work the same way as
if this were encoded as ligatures.

You can perfectly well create a breaking rule that will avoid breaking the
sequence of encoded characters representing the flag with its code. It
can be handled exactly as if it were an unbreakable word, surrounded
by two punctuation marks (which will still be a valid fallback display
method, in case of absence of the glyphs in fonts for this type of
string).

You can perfectly well assign representative glyphs for the individual
characters (these glyphs don't have to represent any complete flag,
just a part of a flag showing internally its code).

In fact, all characters used will be treated as separate symbols
(independently of the fact that they *may* be ligatured to show the
actual flag design). The encoding will provide a clear indication that
substituting an actual flag for the default representative glyphs
will be valid (it won't break the character identities, as long
as there exists a registry describing the assigned flag codes,
reencoded with these symbols).

In other words, it avoids completely the need to encode directly any
flag of any political entity (or with a naming convention applied in
the vexillologist registry, for any other personal or organisational
flag). It avoids all copyright issues and the problem of legal
restriction of use of flags (including in some countries where some
flags are prohibited).
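Doug's Plane 14 variant, quoted above, can be made concrete: the tag characters U+E0020 through U+E007E mirror printable ASCII at an offset of 0xE0000, so "CH" becomes U+E0043 U+E0048. U+E0002 is not an assigned character; it appears below only as the hypothetical FLAG TAG from Doug's message, and ending the code at the first non-tag character is an assumption of this sketch.

```python
FLAG_TAG = 0xE0002    # hypothetical FLAG TAG from the quoted message; unassigned
TAG_OFFSET = 0xE0000  # U+E0020..U+E007E correspond to ASCII U+0020..U+007E
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def encode_flag(code: str) -> str:
    """Prefix the hypothetical FLAG TAG and spell `code` with tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("flag codes must be printable ASCII")
    return chr(FLAG_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def decode_flags(text: str) -> list[str]:
    """Return the ASCII code following each FLAG TAG in `text`."""
    codes = []
    i = 0
    while i < len(text):
        if ord(text[i]) == FLAG_TAG:
            j = i + 1
            while j < len(text) and TAG_FIRST <= ord(text[j]) <= TAG_LAST:
                j += 1
            codes.append("".join(chr(ord(c) - TAG_OFFSET) for c in text[i + 1:j]))
            i = j
        else:
            i += 1
    return codes


print([f"U+{ord(c):04X}" for c in encode_flag("CH")])  # ['U+E0002', 'U+E0043', 'U+E0048']
print(decode_flags("Swiss " + encode_flag("CH") + " flag"))  # ['CH']
```

Unlike the Regional Indicator pairs, the lead character fixes where each code starts, so no pairing ambiguity arises; the cost is reviving characters that were deprecated for language tagging.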

Re: Flag tags

2012/5/31 Asmus Freytag asm...@ix.netcom.com:
 On 5/31/2012 12:07 PM, Karl Pentzlin wrote:

 Am Donnerstag, 31. Mai 2012 um 20:09 schrieb John H. Jenkins:

 JHJ  tongue-in-cheek
 JHJ  ... that because some
 JHJ  countries have currency symbols with decidated code points, other
 JHJ  countries will make *new* currency symbols and demand that *they*
 JHJ  get dedicated code points ...

 Seriously speaking, flag symbols and currency signs are completely
 different topics.

 Every country has exactly one flag, right now.

This is wrong if you consider their dependencies. Some dependencies
legally have their own flag used *instead* of the flag for the
main/metropolitan part of the country. So countries can have several
flags.

Then consider that countries may also have several flags for different
usages (national flag, civil flag, naval flag...)

Also the same flag may be shared by different political entities (e.g.
The European Union reuses the flag of the Council of Europe, with
permission, and made it one of its official emblems). Some flags are
also shared without permission, because the original design was not
protected internationally or had fallen into the public domain (including in
the country of origin).

Flags have strong political issues that are out of scope for encoding
directly in the UCS. They are not stable across history, so they
should be versioned, but most frequent uses will omit the precise
versioning, so that flags will be instantly replaced at any time (e.g.
if you encode a flag for US, how many stars will there be on it?
Libya changed its flag recently, returning to an older flag; in many
cases it will not really matter, but if you have to deal with encoded
texts that are also versioned themselves, it will not be acceptable to
have flag designs freely interchanged, as it would cause confusion:
consider the case of countries that appeared in the history as part of
a split or merge, in an article speaking about their history, and
identifying the armies and generals with their respective flag...).

Re: Flag tags

2012/5/31 Doug Ewell d...@ewellic.org:
 Philippe Verdy wrote:

 So to represent the flag of Japan, you could encode:

 FLAG INITIAL SYMBOL J
 FLAG FINAL SYMBOL P
 [...]

 For me, the existing Plane 14 mechanism would have worked just as well,
 without requiring three more duplicate sets of printable Basic Latin.

You can perfectly well map this small set of symbols into Plane 14.

And no, they are NOT confusable and not a duplicate set of Basic Latin:
their representative glyphs will be clearly different. They will be
REAL symbols, even if they embed a letter in their default
representative glyph (this letter will disappear when the ligatures
are generated by renderers supporting a mapping from flag codes to
actual glyphs, either with fonts built specifically for some
recognized ligatures, or with the help of an external protocol to get
a flag from an external flags registry, which we don't need to specify
in Unicode).
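The self-delimiting property Philippe describes can be illustrated without assigning any code points: if each symbol carries a role (initial, medial, final) as well as a letter, a reader always knows where a code starts and ends. In this Python sketch the symbols are modelled as (role, letter) pairs; the roles follow his FLAG INITIAL / FLAG FINAL naming, while the handling of stray or unfinished symbols is an assumption.

```python
from enum import Enum


class Role(Enum):
    INITIAL = "initial"
    MEDIAL = "medial"
    FINAL = "final"


def parse_flag_codes(symbols: list[tuple[Role, str]]) -> list[str]:
    """Collect INITIAL [MEDIAL...] FINAL runs into codes such as "JP" or "CYM"."""
    codes: list[str] = []
    current = None
    for role, letter in symbols:
        if role is Role.INITIAL:
            current = letter  # starts a new code; an unfinished one is dropped
        elif current is None:
            continue  # medial or final with no initial: stray symbol, ignored
        elif role is Role.MEDIAL:
            current += letter
        else:  # Role.FINAL closes the code
            codes.append(current + letter)
            current = None
    return codes


I, M, F = Role.INITIAL, Role.MEDIAL, Role.FINAL
print(parse_flag_codes([(I, "J"), (F, "P")]))  # ['JP']
print(parse_flag_codes([(F, "X"), (I, "C"), (M, "Y"), (F, "M"), (I, "J"), (F, "P")]))
# ['CYM', 'JP']
```

Unlike a bare run of Regional Indicator Symbols, losing or inserting one symbol here corrupts at most the code it belongs to.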

Unicode sessions at Localization World Paris

2012-05-31 Thread Lisa Moore

On Monday, 4 June, noted experts Richard Ishida (W3C) and Addison Phillips
(Lab126) will team up to present a full day of sessions on Unicode.

In the morning, Richard Ishida will present “An Introduction to Writing
Systems and Unicode”, a tutorial that will introduce the basic functioning
of Unicode in dealing with non-Latin writing systems. It is an excellent
orientation for people new to these concepts, but it also offers content
for people at intermediate and advanced levels due to the breadth of
scripts discussed.

In the afternoon, Addison will present “Internationalization: An
Introduction”, a two-part tutorial covering:

• What is internationalization?
• What is Unicode? Implementing and using the standard.
• How do you prepare software for localization and translation?

Finally, Richard and Addison will present “Towards the Promised Land:
Globalization Developments in Web Standards”, which surveys current
developments at the W3C.

You may register for any or all of these sessions via
http://localizationworld.com/lwparis2012/registration.php where you will
see the sessions in the preconference day.

This is an opportunity to get a taste of the Unicode conference to be held
in California on the following October 22-24, and see how the people on
your staff can benefit from a deeper knowledge of Unicode and
internationalization.

Lisa Moore
--

Flag emoji

2012-05-31 Thread Mark Davis ☕

The UTC considered as one of the possible approaches to the problem. While
easier in terms of line breaking, there'd still be a requirement to change
grapheme cluster boundaries and word boundaries to join sequences
like , and people felt the approach didn't work well with encoding
conversion. About conversion, I think the discussion was something like the
following:

It is relatively simple to have a mapping like:

sjis bytes   ↔   [joiner]

If we used ZWSP, then we'd have:

sjis bytes ←  // but the code wouldn't know when to also absorb
adjacent ZWSPs.

sjis bytes →  // but the code would need context to know when to add
adjacent ZWSPs.

Both of those would be complicated for encoding converters to handle.
People also felt that [joiner] would be more consistent with treating
the sequence as a unit, both conceptually and in fonts.

I personally favored the ZWSP, but was convinced during the discussion that
ZWJ was a better approach.

--
Mark https://plus.google.com/114199149796022210033
Il meglio è l’inimico del bene (The best is the enemy of the good.)



On Thu, May 31, 2012 at 2:47 AM, Andrew West andrewcw...@gmail.com wrote:

 On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote:
 
  There is definitely a problem.

 Is it really such a problem?  Why can't implementations simply use
 ZWSP to demarcate the 2-character units in a sequence of more than two
 regional indicator symbols (and maybe always emit 2-character codes
 wrapped between ZWSP on either side to be safe), so for example
 USZWSPESZWSPGE would be parsed as the regional indicator symbols
 for USA, SPAIN and Georgia, whereas UZWSPSEZWSPSGZWSPE would be
 parsed as the regional indicator symbols for U (invalid), Sweden,
 Singapore and E (invalid).  Algorithms such as line-breaking would not
 break between two regional indicator symbols, but only at a ZWSP.

 And if implementations wanted to support two- and three-letter
 regional codes, they might parse
 ZWSPGBZWSPCYMZWSPENGZWSPNIRZWSPSCOZWSP as the codes for
 United Kingdom, Wales, England, Northern Ireland, and Scotland, and
 represent them visually with the appropriate flag icons.

 Andrew

Re: Flag tags


So I could propose, say, the Pigpen cipher?

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell 

-Original Message- 
From: Asmus Freytag

Sent: Thursday, May 31, 2012 16:03
To: Doug Ewell
Cc: Shawn Steele ; verd...@wanadoo.fr ; Michael Everson ; unicode 
Unicode Discussion

Subject: Re: Flag tags

On 5/31/2012 12:03 PM, Doug Ewell wrote:



Another alphabet, even that with 1:1 correspondence to Latin, but,
again, not recognizable as such are the dancing men. They at least
can be demonstrated to have appeared in print.
Are substitution ciphers candidates for encoding?



To the degree that the use of the substitution is style, no. Fraktur
and Insular forms have been unified for Latin. But these styles are also
recognizable (if not to all users, then a significant number). And,
there's a benefit in identifying them primarily with the Latin alphabet,
and only secondarily with the precise style.

The dancing men are more like Braille. There's one source where they
have been given a particular mapping to the Latin alphabet, but that
mapping is not the only one possible. The whole point of them is that
the actual mapping has to be known or discovered each time.

So, yes, these would have to be encoded by shape, not by target.
A./

Re: Flag emoji

2012-05-31 Thread Markus Scherer

On Thu, May 31, 2012 at 4:18 PM, Mark Davis ☕ m...@macchiato.com wrote:

 If we used ZWSP, then we'd have:

 sjis bytes ←  // but the code wouldn't know when to also absorb
 adjacent ZWSPs.

 sjis bytes →  // but the code would need context to know when to add
 adjacent ZWSPs.


I think we could do this reasonably well by providing two mappings for the
same sjis bytes:

sjis - A+A+ZWSP
sjis - A+A

A longest-match conversion would get the desired results.
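Markus's two-mapping idea can be sketched as a greedy longest-match encoder in the Unicode-to-SJIS direction, the one where the converter otherwise "wouldn't know when to also absorb adjacent ZWSPs". In this Python sketch the table covers a single flag, the byte value is a made-up placeholder rather than any carrier's real emoji code, and every unmapped character is assumed to be ASCII.

```python
ZWSP = "\u200B"
RIS_J, RIS_P = "\U0001F1EF", "\U0001F1F5"  # REGIONAL INDICATOR SYMBOL LETTER J, P

# Hypothetical Unicode-to-SJIS table: both sequences map to the same bytes.
# b"\xF9\x01" is a made-up placeholder, not a real carrier mapping.
FROM_UNICODE = {
    RIS_J + RIS_P + ZWSP: b"\xF9\x01",
    RIS_J + RIS_P: b"\xF9\x01",
}
MAX_KEY = max(len(key) for key in FROM_UNICODE)


def encode_longest_match(text: str) -> bytes:
    """Greedy longest-match conversion; unmapped characters are assumed to be ASCII."""
    out = bytearray()
    i = 0
    while i < len(text):
        for length in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in FROM_UNICODE:
                out += FROM_UNICODE[chunk]
                i += length
                break
        else:  # no table entry starts here: fall back to plain ASCII
            out += text[i].encode("ascii")
            i += 1
    return bytes(out)


print(encode_longest_match(RIS_J + RIS_P + ZWSP + "!"))  # b'\xf9\x01!'
print(encode_longest_match(RIS_J + RIS_P + "!"))         # b'\xf9\x01!'
```

Because the longer key is tried first, a trailing ZWSP is absorbed when present and not required when absent; the SJIS-to-Unicode direction, which needs a single preferred round-trip mapping, is not shown.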

I believe there were more objections to the ZWSP approach though. I think
one was about losing the ZWSP in editing and copy-paste. (I didn't write
down details.)

markus

Re: Flag tags

On 1 Jun 2012, at 00:59, Doug Ewell wrote:

 So I could propose, say, the Pigpen cipher?

I would rather you help convince people about the Unifon proposal.

Michael Everson * http://www.evertype.com/

Re: Flag tags

Let's not forget the largest collection of flags on the web:
Flags of the World, maintained for many years (initially via
Usenet, before the Internet as we know it today). All other references are
found there, including the International Federation of Vexillological
Associations (FIAV), which should be involved in the project of building
and maintaining a registry of flag codes.

The FOTW web site has always had several domains, some disappearing,
but mirrored together. This one is the most stable:

http://www.crwflags.com/fotw/flags/index.html

Re: Flag tags


On 5/31/2012 3:29 PM, Philippe Verdy wrote:

2012/5/31 Asmus Freytag asm...@ix.netcom.com:

On 5/31/2012 12:07 PM, Karl Pentzlin wrote:

Am Donnerstag, 31. Mai 2012 um 20:09 schrieb John H. Jenkins:

JHJ tongue-in-cheek
JHJ ... that because some
JHJ countries have currency symbols with dedicated code points, other
JHJ countries will make *new* currency symbols and demand that *they*
JHJ get dedicated code points ...

Seriously speaking, flag symbols and currency signs are completely
different topics.

Every country has exactly one flag, right now.

This is wrong if you consider their dependencies. Some dependencies
legally have their own flag used *instead* of the flag for the
main/metropolitan part of the country. So countries can have several
flags.
And some have well-established flags for their constituent parts -
because they arose out of a federation of entities.


Then consider that countries may also have several flags for different
usages (national flag, civil flag, naval flag...)


Good point.


Also, the same flag may be shared by different political entities (e.g.
the European Union reuses the flag of the Council of Europe, with
permission, and made it one of its official emblems). Some flags are
also shared without permission, because the original design was not
protected internationally or had fallen into the public domain (including
in the country of origin).

Examples?



Flags raise strong political issues that are out of scope for direct
encoding in the UCS. They are not stable across history, so they
should be versioned, but most common uses will omit the precise
version, so the flag displayed may be replaced at any time (e.g.
if you encode a flag for the US, how many stars will there be on it?

Obviously these are all glyph variants?

A./


Libya changed its flag recently, returning to an older flag; in many
cases it will not really matter, but if you have to deal with encoded
texts that are themselves versioned, it will not be acceptable to
have flag designs freely interchanged, as this would cause confusion:
consider the case of countries that appeared in history through
a split or merger, in an article about their history that
identifies the armies and generals by their respective flags...).

Re: Flag tags

2012/6/1 Asmus Freytag asm...@ix.netcom.com:
 They are not stable across history, so they
 should be versioned, but most frequent uses will omit the precise
 versioning, so that flags will be instantly replaced at any time (e.g.
 if you encode a flag for US, how many stars will there be on it ?

 Obviously these are all glyph variants?

If you are talking about the flag of Libya, the differences are significant
when there are opposing parties. During the last Libyan revolution, those
flags were used very distinctly. They were not free variants of each
other.

Yes, you may have a generic flag code that maps to the latest version of
the flag, but versioned flags should be encoded separately.

This is similar to language tags: you may have en, or en-US
vs. en-GB, and several subtags for variants...
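The distinction between a generic code and separately encoded versioned flags can be sketched as a small lookup. This is a minimal Python sketch under stated assumptions: the registry, the user-assigned code XA, and the -V1/-V2/-V3 version subtags are all hypothetical and not part of any existing standard. A bare code resolves to the latest version; a versioned code is returned unchanged and is never swapped for another version.

```python
# Hypothetical registry: each country code lists its flag versions, oldest first.
# "XA" is an ISO 3166-1 user-assigned code, used here only as a placeholder.
FLAG_VERSIONS: dict[str, list[str]] = {
    "XA": ["XA-V1", "XA-V2", "XA-V3"],
}


def resolve_flag(code: str) -> str:
    """Return the specific versioned flag that a (hypothetical) flag code denotes.

    A bare country code is generic and maps to the latest registered version.
    A versioned code is kept as-is: it is never replaced by another version.
    """
    code = code.upper()
    country, _, version = code.partition("-")
    versions = FLAG_VERSIONS.get(country)
    if versions is None:
        raise KeyError(f"unknown country code: {country}")
    if not version:
        # Generic code: follow the registry to the most recent design.
        return versions[-1]
    if code not in versions:
        raise KeyError(f"unknown flag version: {code}")
    return code


print(resolve_flag("XA"))     # XA-V3
print(resolve_flag("xa-v1"))  # XA-V1
```

Unlike BCP 47 lookup matching (RFC 4647), which truncates subtags until a match is found, this sketch deliberately raises on an unknown version instead of falling back to the generic code, since the versions are argued above not to be free variants of each other.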

Re: Flag tags

This would be a great resource for developing a flags code, as Philippe 
suggested earlier, an idea I actually think has quite a bit of merit. 
However, I'm not sure it has much relevance to character encoding. It's 
not that hard to imagine encoding 220 or so current national flags or 
placeholders, but you wouldn't want to expand this to, say, tens of 
thousands.


--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell 


-----Original Message-----
From: Philippe Verdy
Sent: Thursday, May 31, 2012 18:06
To: Doug Ewell
Cc: Asmus Freytag; Shawn Steele; Michael Everson; unicode Unicode Discussion
Subject: Re: Flag tags

Let's not forget the largest collection of flags on the web: Flags of the
World, maintained for many years (initially on Usenet, before the Internet
as we know it today). All other references can be found there, including
the International Association of Vexillological Associations (IAVA),
which should be involved in the project of building and maintaining a
registry of flag codes.

The FOTW web site has always had several domains, some of which have
disappeared, but they are mirrored together. This one is the most stable:

http://www.crwflags.com/fotw/flags/index.html

Re: Flag tags


Michael Everson wrote:


So I could propose, say, the Pigpen cipher?


I would rather you help convince people about the Unifon proposal.


I actually wasn't planning to propose Pigpen. I was just surprised the 
idea would even be considered.


--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Re: Flag tags

That's why I just propose an external registry rather than a direct
encoding of individual flags.

A naming convention (using namespace prefixes) could be used to make
sure that the common codes from ISO 3166-1 remain usable.

I'm not sure that the CLDR TC is currently competent to develop such a
registry, but it may work with the IAVA to develop the naming
convention for use in the registry (which could be hosted by IAVA or
by Unicode; to be decided later).

The CLDR TC would be involved in the development of the registry
rules, for its stability.

2012/6/1 Doug Ewell d...@ewellic.org:
 This would be a great resource for developing a flags code, as Philippe
 suggested earlier, an idea I actually think has quite a bit of merit.
 However, I'm not sure it has much relevance to character encoding. It's not
 that hard to imagine encoding 220 or so current national flags or
 placeholders, but you wouldn't want to expand this to, say, tens of
 thousands.

Re: Flag tags

E.g., the empty namespace could be reserved for country codes.
Namespace separation could use the hyphen (as in language codes).

So the generic US flag would be coded simply as -US (with the leading hyphen).

When rendered with the default glyphs, that hyphen will be visible. The
alternative would be to use a space separator, so that the standard code
would be rendered showing only the country code with the default
glyphs.

Other namespace extensions would use a non-empty prefix per category.
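The proposed namespace convention can be sketched as a parser. This is a minimal Python sketch, assuming that the namespace is everything before the first separator, that the empty namespace accepts only ISO 3166-1 alpha-2 codes, and that a category prefix such as "civil" is purely hypothetical; none of these rules come from an existing registry.

```python
from typing import NamedTuple


class FlagCode(NamedTuple):
    namespace: str  # "" is reserved for ISO 3166-1 country codes (assumption)
    code: str


def parse_flag_code(text: str, sep: str = "-") -> FlagCode:
    """Split a flag code into (namespace, code) at the first separator.

    "-US"      -> namespace "", code "US" (the generic country flag).
    "civil-US" -> namespace "civil" (hypothetical category), code "US".
    """
    namespace, found, code = text.partition(sep)
    if not found or not code:
        raise ValueError(f"missing separator or code in {text!r}")
    if namespace == "":
        # Empty namespace: accept only a two-letter (alpha-2) country code.
        if not (len(code) == 2 and code.isascii() and code.isalpha()):
            raise ValueError(f"not an alpha-2 country code: {code!r}")
        code = code.upper()
    return FlagCode(namespace, code)


def format_flag_code(flag: FlagCode, sep: str = "-") -> str:
    """Inverse of parse_flag_code: join namespace and code with the separator."""
    return f"{flag.namespace}{sep}{flag.code}"


print(parse_flag_code("-US"))                # FlagCode(namespace='', code='US')
print(parse_flag_code(" US", sep=" "))       # space-separator alternative
print(format_flag_code(FlagCode("", "US")))  # -US
```

The space-separator alternative only changes `sep`; with it, rendering the stored code with default glyphs shows just the country code after a leading space instead of "-US".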

Re: Flag tags


On 5/31/2012 5:06 PM, Michael Everson wrote:

On 1 Jun 2012, at 00:59, Doug Ewell wrote:


So I could propose, say, the Pigpen cipher?

I would rather you help convince people about the Unifon proposal.


hehe.

A./

PS: what's Unifon, and what's it got to do with it?

Re: Flag tags