Re: Unicode 6.2 to Support the Turkish Lira Sign
2012/5/30 Martin J. Dürst due...@it.aoyama.ac.jp:
On 2012/05/30 4:42, Roozbeh Pournader wrote:
Just look what happened when the Japanese did their own font/character set hack. The backslash/yen problem is still with us, to this day...
To be fair, the Japanese Yen at 0x5C was there long before Unicode, in the Japanese version of ISO 646. That it has remained as a font hack is very unfortunate, but for that, not only the Japanese, but also major international vendors are to blame.
As long as it was part of the Japanese version of ISO 646 (which itself was only the first page of the SJIS encoding), there was absolutely NO problem at all. This was no different from the situation of all other national versions of ISO 646, which were all distinct encodings. The situation became a problem when the Japanese ISO 646 started to be mapped to Unicode/ISO/IEC 10646 within fonts using incorrect mappings. This occurred in the early stages of ISO/IEC 10646 development. Unfortunately, several OSes for Japan used those incorrect mappings, assuming that it was still safe to blindly convert texts containing backslashes by showing yen symbols instead, just as the same systems blindly converted US-ASCII (the American version of ISO 646) into SJIS with broken algorithms, simply because that software could not really work with Unicode, still worked only with SJIS, and did not correctly track which source encoding was used.
This would probably not have occurred if Japan had defined and standardized an ISO 8859 version mapping the yen outside ASCII (along with basic kana letters and Asian punctuation); but they preferred to develop only SJIS to support kanji (and later the emerging UCS remapped on it). It would also have offered an easier migration.
They were ambitious at the beginning, but the ambition was premature when the surrounding technologies to support a large character set were still very incomplete (forcing a lot of software to use unsafe/lossy remappings to smaller character sets). So for several decades, there have been a lot of interoperability problems caused by the various implementations of SJIS, many of them not compatible with each other in their limitations or in the way the simplifications were applied to support different parts of it.
The backslash character, though it was common in many programming languages and OSes, then appeared to be replaced there by the yen symbol, and people were trained with it (for example when using pathnames in DOS/Windows filesystems, or when using the yen symbol as the escaping prefix when programming in C/C++); and it was then perceived that the backslash was for them a variant form (of their yen symbol) that they did not need (SJIS was later adapted to map the backslash somewhere else, but the SJIS users did not immediately fix it).
As a result, the mapping of 0x5C in SJIS has always been ambiguous, depending on the implementation, but it has never been ambiguous in the Japanese version of ISO 646, which did not include the backslash. So don't criticize ISO 646; there was no problem there. The problem lies fully within the early versions of SJIS, which allowed such variation of glyphs when they should have considered the yen symbol and the backslash as distinct abstract characters requiring separate mappings. But who uses the Japanese version of ISO 646 now in Japan? Only SJIS seems to survive now, with all its intrinsic ambiguities and its many incompatible implementations (whose exact versions are most often not identified correctly in most software).
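The point that 0x5C is unambiguous as long as the source encoding is tracked can be sketched in a few lines of Python. The variant names, the table and the function `decode_7bit` are illustrative assumptions, not a real codec; the sketch only relies on the Japanese ISO 646 variant placing the yen sign at 0x5C (and the overline at 0x7E) where US-ASCII has the backslash (and the tilde).

```python
# Illustrative sketch: per-variant overrides for 7-bit ISO 646 data.
# The variant names and this function are hypothetical, not a real codec API.
ISO646_OVERRIDES = {
    "us": {},                                # US-ASCII: 0x5C is the backslash
    "jp": {0x5C: "\u00A5", 0x7E: "\u203E"},  # Japanese variant: yen sign, overline
}

def decode_7bit(data: bytes, variant: str) -> str:
    """Decode 7-bit bytes using the declared ISO 646 variant."""
    overrides = ISO646_OVERRIDES[variant]    # KeyError if the variant is unknown
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"not 7-bit data: 0x{byte:02X}")
        chars.append(overrides.get(byte, chr(byte)))
    return "".join(chars)

path = b"C:\\DOS"                            # the bytes 43 3A 5C 44 4F 53
print(ascii(decode_7bit(path, "us")))        # 0x5C decoded as U+005C
print(ascii(decode_7bit(path, "jp")))        # same byte decoded as U+00A5
```

The same byte string yields two different texts, so a converter that is not told the variant has no safe default; that is the blind conversion described above.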
The Japanese NB should have stopped this nightmare by fixing a rule to strongly deprecate the variants (and remove all past recommendations), so that only one version of SJIS would survive, with old data encoded in an ambiguous SJIS version being left in its black box. It would have been simpler and more effective for the Japanese NB to rename the SJIS standard for the only remaining version, such as UJIS (U for Universal, meaning that it has full roundtrip compatibility with the UCS and no longer allows any ambiguity), and then freeze it completely in this state (all other developments being made in the UCS), with a strong recommendation to NOT perform any blind conversion to UJIS, or interpretation as UJIS, of any past data encoded for an unversioned SJIS: all ambiguous characters in these old data should be detected as ambiguous, meaning that the document/data was not convertible without proper versioning. This would also have forced the various private software makers and manufacturers that had used their own version of SJIS to register again with the Japanese NB a SINGLE (and unique) string recommended to identify their implementation of SJIS, removing all past known aliases that were also ambiguous between each other, so that the effective encoding of old data could be uniquely identified and would then become uniquely convertible first to the national standard UJIS, then to the UCS by its
Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
2012/5/31 Doug Ewell d...@ewellic.org:
A seemingly straightforward solution to the “unambiguous mapping” problem would be to use the existing Plane 14 tag letters along with a new FLAG TAG, say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the current Swiss flag. No need for separate lead and trail. Simple. ... What’s that? Oh, sorry, never mind. Deprecated.
Not necessarily: you could very well have sets of characters with unambiguous glyphs showing the ASCII capital letter partly enclosed:
- in a first set, it encloses the letter on the left/top/bottom sides with the strokes that start displaying the flag (this glyph could also include the pole)
- in a second set, it encloses the letter only on the top/bottom sides
- in the third set, it encloses the letter only on the top/bottom/right sides.
Let's not forget that even if countries do not change, and keep their ISO 3166-1 code, their flag may change over time. So a flag encoded with such characters should contain the year of its first official use: this would require mapping the colon and the digits in the second set for specifying the year, and mapping the digits in the last set as well. The colon and digits are a priori not needed in the first set.
So to represent the flag of Japan, you could encode:
FLAG INITIAL SYMBOL J
FLAG FINAL SYMBOL P
But if you want to use explicitly the post-1945 flag (and not the imperial flag with sun rays), you would encode:
FLAG INITIAL SYMBOL J
FLAG MEDIAL SYMBOL P
FLAG MEDIAL SYMBOL COLON
FLAG MEDIAL SYMBOL ONE
FLAG MEDIAL SYMBOL NINE
FLAG MEDIAL SYMBOL FOUR
FLAG FINAL SYMBOL SIX
Which would render mostly like this, if there's no ligature defined (several lines used here to approximate the glyphs):
+–––\
| J P : 1 9 4 6
+–––/
|
Here again, a font-defined ligature (if available) could remap it to the actual flag. A font can then easily be made, with the only constraint that the glyphs in it should join these enclosures.
If needed, those fonts can then create ligatures for well-known flags, showing their apparent geometry. The pole could also be removed, and colors added if supported by the font technology, or replaced by hatches in a basic monochromatic font technology. All these would remain standard symbols (they are superficially letter-like, except that the standard can say that the letters shown in the enclosing glyphs are only used as a default fallback, and ligatures can SAFELY replace them by the actual flag, including its true colors). The renderer can use the color capabilities of fonts, if the font format supports them, or a set of icons (e.g. encoded in a zipped archive containing SVG files and a small mapping file associating the flag codes with the name of an SVG file, or within a single SVG file containing this mapping internally and mapping each code to an internal XML anchor ID using standard XML hrefs).
Note that OpenType currently does not contain any standard allowing true colors to be mapped in glyphs, but there's nothing in OpenType that prevents a font from exposing several glyph variants for the same characters (or their defined ligatures): a monochromatic version like today, and, with a new OpenType feature, a colorful version, with an extra table in the font that exposes the color mapping either to an sRGBA color, or to a hatched filling pattern also exposed by the font as a rectangle glyph with metrics (and possibly an angle relative to the baseline). I am still surprised to see that OpenType still does not include such a standard. Note that hatching patterns would be defined using the em-square of glyphs assigned to characters and ligatures, so they would scale the same way, and would be grid-fitted and hinted the same way. A separate definition of patterns would simplify the design of colored fonts, as the same glyph geometries would be used.
But there could also be a separate monochromatic glyph to be defined as well in the same font, in such a way that the glyph is defined with the pattern integrated to its geometry. And that CSS for example could specify a way to indicate that the rendered characters should not use an sRGBA color (with the hatching pattern defined in the font) but the natural colors defined by the glyphs themselves: this would require only a new value for color: natural. If the font does not define any natural color for its mapped glyphs, or the glyphs do not map any hatching patterns, then this CSS value would be interpreted as if it was color:inherit. An extended version could be also color: natural #rrggbb : the #rrggbb would still allow to specify the color to use if there's no natural color in the font, or if the colors defined in the font (those that are marked as being important) are incompatible or not easily distinguished with the current background (according to user's preferences), or not accessible to the user (also according to his preferences) : in which case the renderer would use
Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
Also there should exist somewhere a registry of known flag codes. There are well-known vexillological sites that list large collections of flags, but so far they have not developed a standard (ASCII-based) codification. In my opinion, this codification should need just BASIC LATIN CAPITAL LETTERs, Arabo-European digits, the ASCII HYPHEN as a separator for country/region subcodes, and the colon and dot for versioning/dating, and it should be based on ISO 3166-1 (using extension/private codes for historic countries or regions that are not encoded in ISO 3166). Such a registry should contain a search form for codes, showing the designs, the preferred aspect ratio, the color mappings, and whether the flag itself is protected by some copyright restrictions (this won't limit the usage of fallback glyphs, showing just the code as letters in an enclosing blank flag, in free fonts that do not want to violate these copyright restrictions, while they can still define some ligatures for flag designs that are free from those restrictions). But this registry does not have to be defined and maintained by the Unicode Consortium or by ISO, unless they have the desire to develop it. In any case, it is not necessary to make it part of the Unicode and ISO/IEC 10646 standards themselves (but there could be an informative reference to the registry, to help font developers).
Re: Flag tags
2012/5/31 Asmus Freytag asm...@ix.netcom.com: On 5/30/2012 7:19 PM, Philippe Verdy wrote: 2012/5/31 Michael Everson ever...@evertype.com: On 31 May 2012, at 00:24, Mark Davis ☕ wrote:
Members of ISO National Bodies quite properly thought that it is inappropriate for an International Standard to encode the flags of some countries and not the flags of others. You can stuff your condescension, Mark.
I fully agree. Either all of them or none of them (or just a generic white flag).
No at least the black pirate flag, and the checkered flag (for car racing).
There are two black pirate flags. One is all black (the most generic one), another has bones and a skull. OK, these ones are generic enough not to convey country/territory-specific information. There are also conventional sky-blue flags used in Europe (maybe elsewhere) for the quality of waters. There may be others used for signaling (including surveillance of beaches and dangers for swimming: red, orange, green); they may be unified with the all-black flag (if color is not really encoded but assignable by external styles). If you add the flag for car racing, then why wouldn't there be flags used in other transportation areas? Add also flags used as maritime alphabets (they are a true script by themselves, whose mapping to actual letters depends on the locale's script, so they are not really a visual variant of any script, just as the Braille script is not tied to Latin), or other ideographic flags displayed much like the pirate flag (e.g. signaling diseases on board)...
Re: Flag tags
On 31 May 2012, at 04:49, Asmus Freytag wrote: On 5/30/2012 7:19 PM, Philippe Verdy wrote: 2012/5/31 Michael Everson ever...@evertype.com:
Members of ISO National Bodies quite properly thought that it is inappropriate for an International Standard to encode the flags of some countries and not the flags of others. [...]
I fully agree. Either all of them or none of them (or just a generic white flag).
No at least the black pirate flag, and the checkered flag (for car racing).
Those would constitute the minimum useful set.
U+2690 WHITE FLAG
U+2691 BLACK FLAG
U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE
U+1F38C CROSSED FLAGS
U+1F3C1 CHEQUERED FLAG
We are missing the JOLLY ROGER.
Michael Everson * http://www.evertype.com/
Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote:
There is definitely a problem.
Is it really such a problem? Why can't implementations simply use ZWSP to demarcate the 2-character units in a sequence of more than two regional indicator symbols (and maybe always emit 2-character codes wrapped between ZWSP on either side to be safe), so for example US<ZWSP>ES<ZWSP>GE would be parsed as the regional indicator symbols for USA, Spain and Georgia, whereas U<ZWSP>SE<ZWSP>SG<ZWSP>E would be parsed as the regional indicator symbols for U (invalid), Sweden, Singapore and E (invalid). Algorithms such as line-breaking would not break between two regional indicator symbols, but only at a ZWSP. And if implementations wanted to support two- and three-letter regional codes, they might parse <ZWSP>GB<ZWSP>CYM<ZWSP>ENG<ZWSP>NIR<ZWSP>SCO<ZWSP> as the codes for United Kingdom, Wales, England, Northern Ireland, and Scotland, and represent them visually with the appropriate flag icons.
Andrew
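A minimal Python sketch of the ZWSP-delimited parsing suggested above. The function names and the `allowed_lengths` parameter are assumptions made for illustration; "valid" here only means that a unit has an accepted length, not that it is an assigned ISO 3166 code.

```python
ZWSP = "\u200B"        # ZERO WIDTH SPACE
RI_BASE = 0x1F1E6      # REGIONAL INDICATOR SYMBOL LETTER A

def to_ri(letters: str) -> str:
    """Map ASCII capitals A-Z to regional indicator symbols."""
    return "".join(chr(RI_BASE + ord(c) - ord("A")) for c in letters)

def from_ri(symbols: str) -> str:
    """Map regional indicator symbols back to ASCII capitals."""
    letters = []
    for ch in symbols:
        offset = ord(ch) - RI_BASE
        if not 0 <= offset < 26:
            raise ValueError(f"not a regional indicator: U+{ord(ch):04X}")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)

def parse_units(text: str, allowed_lengths=(2,)):
    """Split on ZWSP and report each unit with a length-based validity flag."""
    units = []
    for chunk in text.split(ZWSP):
        if not chunk:                 # leading, trailing or doubled ZWSP
            continue
        letters = from_ri(chunk)
        units.append((letters, len(letters) in allowed_lengths))
    return units

good = ZWSP.join(to_ri(code) for code in ["US", "ES", "GE"])
bad = ZWSP.join(to_ri(code) for code in ["U", "SE", "SG", "E"])
print(parse_units(good))
print(parse_units(bad))
```

Passing `allowed_lengths=(2, 3)` would accept the three-letter units of the second example in the message.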
Re: Flag tags
On 31 May 2012 10:20, Michael Everson ever...@evertype.com wrote:
No at least the black pirate flag, and the checkered flag (for car racing).
U+2690 WHITE FLAG
U+2691 BLACK FLAG
U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE
U+1F38C CROSSED FLAGS
U+1F3C1 CHEQUERED FLAG
We are missing the JOLLY ROGER.
I propose U+20F1 COMBINING ENCLOSING FLAG, and a named sequence U+2620 U+20F1 = JOLLY ROGER.
Andrew
Re: Flag tags
2012/5/31, Michael Everson ever...@evertype.com wrote: U+26FF WHITE FLAG WITH HORIZONTAL MIDDLE BLACK STRIPE What does this mean ? Is it really useful for something ?
Re: Flag tags
Philippe Verdy wrote: So to represent the flag of Japan, you could encode: FLAG INITIAL SYMBOL J FLAG FINAL SYMBOL P [...] For me, the existing Plane 14 mechanism would have worked just as well, without requiring three more duplicate sets of printable Basic Latin. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
[OT] Re: Flag tags
Philippe Verdy wrote: Also there should exist somewhere a registry of known flag codes. There are wellknown vexillologic sites that list large collections of flags, but for now they still did not develop a standard (ASCII-based) codification. [...] But this registry does not have to be defined and maintained by the Unicode Consortium or by ISO, unless they have the desire to develop it. This doesn't seem at all within the scope of Unicode, though perhaps CLDR would want it. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
RE: Flag tags
We are missing the JOLLY ROGER. At least one, there're lots :) http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery
Re: Flag tags
On 31 May 2012, at 16:04, Shawn Steele wrote: We are missing the JOLLY ROGER. At least one, there're lots :) http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery A, glyph variants. Yo ho ho, Michael Everson * http://www.evertype.com/
RE: Flag tags
We are missing the JOLLY ROGER. At least one, there're lots :) http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery A, glyph variants. Ar, you're right, missed that :) -Shawn
Re: Flag tags
On 5/31/2012 2:06 AM, Philippe Verdy wrote: 2012/5/31 Asmus Freytag asm...@ix.netcom.com: On 5/30/2012 7:19 PM, Philippe Verdy wrote: 2012/5/31 Michael Everson ever...@evertype.com: On 31 May 2012, at 00:24, Mark Davis ☕ wrote:
Members of ISO National Bodies quite properly thought that it is inappropriate for an International Standard to encode the flags of some countries and not the flags of others. You can stuff your condescension, Mark.
I fully agree. Either all of them or none of them (or just a generic white flag).
No at least the black pirate flag, and the checkered flag (for car racing).
There are two black pirate flags. One is all black (the most generic one), another has bones and a skull. OK, these ones are generic enough not to convey country/territory-specific information. There are also conventional sky-blue flags used in Europe (maybe elsewhere) for the quality of waters. There may be others used for signaling (including surveillance of beaches and dangers for swimming: red, orange, green); they may be unified with the all-black flag (if color is not really encoded but assignable by external styles). If you add the flag for car racing, then why wouldn't there be flags used in other transportation areas?
You are right! I missed these:
Add also flags used as maritime alphabets (they are a true script by themselves, whose mapping to actual letters depends on the locale's script, so they are not really a visual variant of any script, just as the Braille script is not tied to Latin), or other ideographic flags displayed much like the pirate flag (e.g. signaling diseases on board)...
Re: Flag tags
On 5/31/2012 8:56 AM, Shawn Steele wrote: We are missing the JOLLY ROGER. At least one, there're lots :) http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery A, glyph variants. Ar, you're right, missed that :) No, that's a misunderstanding of glyph variants. Some of them can be substituted and will be recognized by all as jolly roger, others will not. The former set may be glyph variants - that is, if there's no contrastive usage, the latter cannot be. Why? Because for symbols, you don't have a word-context that gives you a definite, secondary clue to the identity of a shape, so the shape alone has to be recognized. Hence, designs that cannot be recognized for each other are not glyph variants. In this case, on top of that, many represent symbols identifying particular bands, captains or ships (or nowadays, movie cycles). As such they resemble the distinguishing function of national flags. A./
Re: Flag tags
On 31 May 2012, at 17:19, Asmus Freytag wrote: On 5/31/2012 8:56 AM, Shawn Steele wrote: We are missing the JOLLY ROGER. At least one, there're lots :) http://en.wikipedia.org/wiki/Pirate_flag#Jolly_Roger_gallery A, glyph variants. Ar, you're right, missed that :) No, that's a misunderstanding of glyph variants. Lordy. It was FUNNY, Asmus. Michael Everson * http://www.evertype.com/
Re: Flag tags
On 31 May 2012, at 17:19, Asmus Freytag wrote: Some of them can be substituted and will be recognized by all as jolly roger, others will not. The former set may be glyph variants - that is, if there's no contrastive usage, the latter cannot be. They are logos for the actual dead pirate captains. They are glyph variants of pirate flag otherwise. Some are just obscure glyph variants. In this case, on top of that, many represent symbols identifying particular bands, captains or ships (or nowadays, movie cycles). As such they resemble the distinguishing function of national flags. Then, yes, but now we do have a notion of pirate flag which is basically black with a skull and crossbones on it. Michael Everson * http://www.evertype.com/
Re: Flag tags
On 31 May 2012, at 17:26, Asmus Freytag wrote: you put your finger on it. Any form of combining scheme is doomed to fail. That's why http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3680.pdf was the right solution. Michael Everson * http://www.evertype.com/
RE: Flag tags
Which ones are used in print? Isn't that the criteria? Personally, I'd like to see the maritime flags encoded, because I've always been interested in them, but I can see a case for them not being encoded. (Though a couple weeks ago on a cruise ship I did see them used in several places in print as it were, though I'd have to concede that the reason they were in print was primarily decorative, though they were readable. Eg: Signals bar spelled out in flags). Seems like swimming flags or shark flags or dive flags wouldn't be used much in print? -Shawn
Re: Flag tags
On 5/31/2012 9:40 AM, Shawn Steele wrote: Which ones are used in print? Isn't that the criteria? Personally, I'd like to see the maritime flags encoded, because I've always been interested in them, but I can see a case for them not being encoded. (Though a couple weeks ago on a cruise ship I did see them used in several places in print as it were, though I'd have to concede that the reason they were in print was primarily decorative, though they were readable. Eg: Signals bar spelled out in flags). The decorative use of those is in fact not uncommon, and when they are used that way, in print, they do form strings. They do, by definition, require colors for their representation, although, the design is such that colors and shapes work together in a redundant way, to improve their recognition under poor visibility. They are also not glyph variants of ordinary letters and digits, even where there is a 1:1 correspondence. First, reprinting Shakespeare's works using flags would make it immediately and utterly illegible to most speakers of English. So they would fail the test of being recognizably the same letter. Second, one place where the flags are still used today is sailboat races. Replacing the flag by a placard showing the letter would also not be acceptable in that context. So, seeing that Unicode nowadays has the support of SMS-specific symbols as part of its scope, who would like to be able to communicate with flags? Another alphabet, even that with 1:1 correspondence to Latin, but, again, not recognizable as such are the dancing men. They at least can be demonstrated to have appeared in print. A./ Seems like swimming flags or shark flags or dive flags wouldn't be used much in print? 
Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
Doug Ewell d...@ewellic.org wrote: A seemingly straightforward solution to the “unambiguous mapping” problem would be to use the existing Plane 14 tag letters along with a new FLAG TAG, say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the current Swiss flag. No need for separate lead and trail. Simple. ... What’s that? Oh, sorry, never mind. Deprecated. On a point of order, is deprecation of a character or collection of characters carried out by only the Unicode Technical Committee or by both of the Unicode Technical Committee and the ISO/IEC 10646 Committee? Further to that point of order, is there any rule that absolutely prevents the deprecated status of a character or collection of characters being removed? I feel that by hybridizing the suggestions of Doug and Philippe that an elegant solution using tags and an advanced format font could be designed. William Overington 31 May 2012
Re: Flag tags
On 5/31/2012 9:30 AM, Michael Everson wrote: On 31 May 2012, at 17:19, Asmus Freytag wrote:
Some of them can be substituted and will be recognized by all as jolly roger, others will not. The former set may be glyph variants - that is, if there's no contrastive usage, the latter cannot be.
They are logos for the actual dead pirate captains.
That's so. Do their heirs claim rights to them? That would exclude them from encoding forever. But wait, aren't national flags logos for their respective countries? A./
PS: This is the part I can't find funny:
They are glyph variants of pirate flag otherwise. Some are just obscure glyph variants.
In this case, on top of that, many represent symbols identifying particular bands, captains or ships (or nowadays, movie cycles). As such they resemble the distinguishing function of national flags.
Then, yes, but now we do have a notion of pirate flag which is basically black with a skull and crossbones on it.
Pirate flag is a generic concept. Encoding a generic concept as such in Unicode is a problematic notion - especially if from that the mistaken conclusion is drawn that all concrete realizations of symbols that somehow pertain to the same general concept are mere glyph variants. What you would encode is not the concept of pirate flag but the archetypical representation of a (generic) pirate flag. That means that minor variations in the skull and crossbones are indeed glyph variants (representing different artists' attempts to depict the same thing), but that other types of flags, used as pirate flags, do not constitute mere variants, but represent their own symbols (of related, but not identical semantics). The distinction between these concepts has been sorely lacking in much of the recent and not so recent discussion of encoding symbols, and that's why I can't find it funny...
Re: Flag tags
On 5/31/2012 9:34 AM, Michael Everson wrote: On 31 May 2012, at 17:26, Asmus Freytag wrote: you put your finger on it. Any form of combining scheme is doomed to fail. That's why http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3680.pdf was the right solution. No Michael. While I've come to the conclusion that encoding some form of combinatorial tags is indeed doomed, I don't believe that encoding images for codes (or if you will, ASCII strings) is the answer - that's meta encoding of a different sort. The right answer would have been to encode the 10 flags and then agree to *study* the needs for and best solutions available to address a more comprehensive system at a future date. The main problem I see in that regard is impatience. It's like with currency symbols - you code things when there's demonstrated demand, you don't put place holders in, and you don't give codes to all the three letter currency codes (like USD CND etc.). A./
Re: Flag tags
On 31 May 2012, at 18:51, Asmus Freytag wrote: The right answer would have been to encode the 10 flags and then agree to *study* the needs for and best solutions available to address a more comprehensive system at a future date. The main problem I see in that regard is impatience. ISO NBs were, correctly, uncomfortable with the idea of encoding the flags of some countries and not of others. As representative of one of those NBs, I have no regrets about having made our proposal, which is still better than the current solution. It's like with currency symbols - you code things when there's demonstrated demand, you don't put place holders in, and you don't give codes to all the three letter currency codes (like USD CND etc.). When you encode a flag for Germany and the US, you automatically get a demand for the encoding of a flag for Ireland and Iceland. That's the way it is. And no, waiting for some vendor to put more flags in the phone is not going to solve it. If you don't understand the politics of this matter, well, I can't help you to do it. Michael Everson * http://www.evertype.com/
Re: Flag tags
Michael Everson ever...@evertype.com wrote on 31 May 2012 at 11:57 AM:
When you encode a flag for Germany and the US, you automatically get a demand for the encoding of a flag for Ireland and Iceland. That's the way it is.
<tongue-in-cheek> Oh, c'mon, Michael, next you'll be saying that because some countries have currency symbols with dedicated code points, other countries will make *new* currency symbols and demand that *they* get dedicated code points, too. We all know how unrealistic a scenario *that* is. </tongue-in-cheek>
John H. Jenkins jenk...@apple.com
RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Further to that point of order, is there any rule that absolutely prevents the deprecated status of a character or collection of characters being removed? UTC has not ever shown the slightest inclination to do so, if that answers your question. I feel that by hybridizing the suggestions of Doug and Philippe that an elegant solution using tags and an advanced format font could be designed. I had forgotten that the Regional Indicator Symbols from U+1F1E6 through U+1F1FF had already been encoded. You can create such a font today if you like, mapping pairs of these symbols to a flag representing the country with that ISO 3166-1 code element. See TUS 6.1, Section 5.10, next-to-last subsection (page 534) for details. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
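The pairing described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the only facts taken from the message are the code point range U+1F1E6 through U+1F1FF and the use of two-letter ISO 3166-1 code elements. Whether a given pair corresponds to an assigned country code is not checked here.

```python
# Hypothetical helper names; only the range U+1F1E6..U+1F1FF comes from the thread.
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def region_to_indicators(code: str) -> str:
    """Map a two-letter code such as 'CH' to a pair of Regional Indicator Symbols."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in code)


def indicators_to_region(pair: str) -> str:
    """Inverse mapping: recover the two ASCII letters from a pair of symbols."""
    if len(pair) != 2 or not all(RIS_BASE <= ord(c) <= RIS_BASE + 25 for c in pair):
        raise ValueError("expected exactly two Regional Indicator Symbols")
    return "".join(chr(ord(c) - RIS_BASE + ord("A")) for c in pair)
```

A font or renderer that wants to show a flag would then look up the recovered two-letter code in its own table of flag glyphs; that table is outside the scope of this sketch.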
Re: Preliminary proposal to encode Unifon in the UCS.
Hello, I wrote: “1st possibility: a separate script. There’ll be no problem.” You wrote: “There would, because the bulk of the script would look just like Latin, and the encoding committees consider this to be a security issue for internet spoofing for instance.” I don’t understand. Internet spoofing would be possible, for example, by mixing Latin and Cyrillic letters in internationalized domain names. For example, instead of paypal.com, you could take advantage of the fact that the first five letters all have look-alike Cyrillic letters and register one of the 31 (2⁵-1) DIFFERENT domain names paypаl.com, payрal.com, payраl.com, paуpal.com, paуpаl.com, paурal.com, paураl.com, pаypal.com, pаypаl.com, pаyрal.com, pаyраl.com, pауpal.com, pауpаl.com, pаурal.com, pаураl.com, рaypal.com, рaypаl.com, рayрal.com, рayраl.com, рaуpal.com, рaуpаl.com, рaурal.com, рaураl.com, раypal.com, раypаl.com, раyрal.com, раyраl.com, рауpal.com, рауpаl.com, раурal.com or раураl.com to ask your “customers” for their PayPal e-mail and password. That could only work if the said customer is very distracted or if he has previously typed “about:config” in the address bar and set network.IDN_show_punycode to false. (That works with Firefox. The way to do it could be different with other browsers.) But, as far as I know, domain names are commonly written in lowercase. When I type in capitals a domain name which doesn’t exist, such as CUYOPUIESVRDKRSIXTVESVRDSHKSE.com, it is automatically converted to lowercase (http://www.cuyopuiesvrdkrsixtvesvrdshkse.com/) before the “not found” message is displayed. In Unifon, only the capital letters would look alike. The lowercase letters would be different. There could be a problem with the letter o, but that would be a drop in the ocean, no more problematic than the letter ᴏ (small capital o), ο (Greek omicron), о (Cyrillic o), ⲟ (Coptic o), 𐐬 (Deseret o), ჿ (Georgian labial sign), ੦ (Gurmukhi zero), all the zeros, most of which look like circles, etc.
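The count of 31 look-alike spellings can be checked mechanically. The following Python sketch is an illustration only: the function names are invented for this example, and the look-alike table covers just the three Cyrillic letters used in the list above (а U+0430, р U+0440, у U+0443). It enumerates every spelling of "paypal" in which at least one of the five replaceable letters is swapped, and shows the Punycode form that a browser displaying raw IDN labels would present instead.

```python
from itertools import product

# Latin letter -> look-alike Cyrillic letter (only the three used in the list above).
CYRILLIC_LOOKALIKE = {"p": "\u0440", "a": "\u0430", "y": "\u0443"}


def mixed_script_variants(label: str) -> list[str]:
    """Every spelling of `label` with at least one letter swapped for its look-alike."""
    choices = [
        (ch, CYRILLIC_LOOKALIKE[ch]) if ch in CYRILLIC_LOOKALIKE else (ch,)
        for ch in label
    ]
    spellings = ["".join(combo) for combo in product(*choices)]
    return [s for s in spellings if s != label]  # drop the all-Latin original


def to_punycode_label(label: str) -> str:
    """ASCII-compatible form of a non-ASCII label, as a browser might display it."""
    return "xn--" + label.encode("punycode").decode("ascii")


variants = mixed_script_variants("paypal")
print(len(variants))                   # number of spoofed spellings
print(to_punycode_label(variants[0]))  # visibly different from "paypal"
```

The five replaceable letters give 2⁵ spellings, one of which is the genuine all-Latin name, which is where the 31 comes from. The Punycode form is what makes the substitution visible when the browser is configured to show it.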
What exactly is the real security issue with Unifon as a separate script? Someone who wants to spoof will find a way to do it without that. NOW, a few comments about the Unifon proposal. You didn’t correct “for several the Hupa, Yurok, Tolowa, and Karok languages”. There’s also the word “Karok”. Below, you write “Karuk”. In the Unifon letters unified with existing characters, you forgot the letter I. You propose a Latin capital letter small capital i to be paired with ɪ (Latin letter small capital i). Would ɪ have wider serifs when displayed in small caps? For the Latin capital beta, you wrote: “The unique Latin capital form meets one of the major criteria for disunification.” Could I use the same formula for Unifon? The unique Unifon small forms meet one of the major criteria for disunification… In the previous proposal, you also included a letter which looked a little like a ƆC ligature or a rounded X. You called it zhay in n4195. Have you left it out deliberately? That’s the last letter in figure 1, although you wrote X in the caption. You also used an X in Figure 7’s caption: it would be strange to have an X pronounced /ʒ/ (zh) in a phonemic alphabet for English. In the first three columns of the table on page 12, the two parts of Latin letter oy are detached. In all samples of Unifon I’ve seen which use that letter, the vertical line of the turned Ⱶ is tangent to the right of the O. In the same table, the Latin letter dhe should have a round shape. That’s one of the two features that make it possible to distinguish it from the Latin letter the. In all Unifon fonts I know except one, the left part of the letter dhe is not really a T but something midway between a T and a Γ. I think Latin letter the should have a small top bar. In this table of the Tolowa Unifon alphabet, http://unifon.org/images/TOLOWA.jpg , some letters have a different value when followed by a small stroke which looks like an apostrophe.
Should it be an ASCII apostrophe, a ’ (U+2019), a ʼ (U+02BC), a Ꞌ (saltillo) or something else? On page 3, the capital ʃ looks like an enlarged form of the lowercase letter, different from the Greek capital sigma-like Ʃ. Would the unique Latin capital form meet one of the major criteria for disunification? What about the capital U with a tail? I wonder whether the 8th letter of the 42-letter “Indian Unifon Single-Sound Alphabet” is a turned or a reversed C. For the turned e-r, I think a new lowercase letter is needed. For the Latin letter reversed-e e, could the double ϵ, used for the same sound in the Initial Teaching Alphabet, be used as a lowercase letter? Would a separate proposal be required for the Initial Teaching Alphabet (http://en.wikipedia.org/wiki/Initial_Teaching_Alphabet)? 28 or 29 letters of this 44-letter alphabet are already supported: b, c, d, f, ɡ, h, j, k, l, m, n are already supported. The ng ligature is different from ŋ. p, r, s are already
RE: Flag tags
One possible problem with either (a) encoding flags or (b) encouraging the display of Regional Indicator Symbols as flags is that some authors would want to use them to indicate the language of the text that follows. I'm not talking about inline, plain-text language tagging in the sense that UTC frowns upon, but literally a visual display of a flag. It's common, particularly in Europe, to see English-language text marked with a Union Jack, French-language text marked with the flag of France, and so forth. Of course, we all know the problems with using national flags to indicate languages, but it's common practice nevertheless. Having Unicode characters for flags, especially well-supported ones, might encourage this practice. Of course, the Japanese phone users might have been doing this all along with the existing 10 emoji flags. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
RE: Flag tags
Asmus Freytag asmusf at ix dot netcom dot com wrote: First, reprinting Shakespeare's works using flags would make it immediately and utterly illegible to most speakers of English. So they would fail the test of being recognizably the same letter. Second, one place where the flags are still used today is sailboat races. Replacing the flag by a placard showing the letter would also not be acceptable in that context. So, seeing that Unicode nowadays has the support of SMS-specific symbols as part of its scope, who would like to be able to communicate with flags? Another alphabet, even that with 1:1 correspondence to Latin, but, again, not recognizable as such are the dancing men. They at least can be demonstrated to have appeared in print. Are substitution ciphers candidates for encoding? -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: Flag tags
On Thursday, 31 May 2012 at 20:09, John H. Jenkins wrote: <tongue-in-cheek> ... that because some countries have currency symbols with dedicated code points, other countries will make *new* currency symbols and demand that *they* get dedicated code points ... Seriously speaking, flag symbols and currency signs are completely different topics. Every country has exactly one flag, right now. Thus an encoding proposal that covers only a few of them, based on an arbitrary collection made by some telephone companies with no evidence of any scrutiny in its making, can never be acceptable to most national bodies represented in ISO. On the other hand, currencies may exist without a currency symbol (as in fact most currencies do). In fact, all currency symbols assigned to currencies valid today are included in Unicode now, with only two exceptions after acceptance of the new Turkish Lira sign: AZN Azerbaijan Manat (waiting for confirmation of its actual use), ANG Netherlands Antillean guilder (its symbol was formerly used mostly for NLG, the Dutch guilder, which was valid until 2002; problematically unified with U+0192 LATIN SMALL LETTER F WITH HOOK; see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3588.pdf ). On this basis, nobody will request the addition of other symbols as a precondition for accepting any new currency sign on ballot. - Karl
Re: Flag tags
On Thu, May 31, 2012 at 12:03 PM, Doug Ewell d...@ewellic.org wrote: Asmus Freytag asmusf at ix dot netcom dot com wrote: First, reprinting Shakespeare's works using flags would make it immediately and utterly illegible to most speakers of English. So they would fail the test of being recognizably the same letter. [...] Another alphabet, even that with 1:1 correspondence to Latin, but, again, not recognizable as such are the dancing men. They at least can be demonstrated to have appeared in print. Are substitution ciphers candidates for encoding? Exactly. I've always thought that Cyrillicized Latin fonts (Яussiaи with all Latin backing) and flag letters and various other weird symbolic conversions are perfectly legal if limited Unicode fonts. The Dancing Men are really a special font for Latin. -- Kie ekzistas vivo, ekzistas espero.
RE: Flag tags
First, reprinting Shakespeare's works using flags would make it immediately and utterly illegible to most speakers of English. So they would fail the test of being recognizably the same letter. FWIW: The "Alpha" flag doesn't mean "A". For example it also means "Diver Down". Most of the flags have other meanings beyond just a letter, like Quebec "Quarantine". So it's not just a substitution cipher. Combinations can also have special meanings. Additionally, repeaters make it more complicated than a simple substitution cipher, eg: November, Oscar, Repeat2, Repeat1 for "noon" == 4 different flags for 2 letters. Flag images: http://en.wikipedia.org/wiki/File:ICS_November.svg http://en.wikipedia.org/wiki/File:ICS_Oscar.svg http://en.wikipedia.org/wiki/File:ICS_Repeat_Two.svg http://en.wikipedia.org/wiki/File:ICS_Repeat_One.svg -Shawn
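The repeater example above can be modelled with a short Python sketch. This is a deliberately simplified model, not a statement of the full International Code of Signals rules: it assumes one set of letter flags, a configurable number of repeater flags (three by default, an assumption of this example), and that repeater k stands for whatever letter occupies position k of the hoist. The function name and the "RepeatN" labels are illustrative.

```python
def hoist(word: str, repeaters: int = 3) -> list[str]:
    """Spell `word` with one set of letter flags plus repeater flags (simplified model)."""
    letters = word.upper()
    flags: list[str] = []
    used_letters: set[str] = set()
    used_repeaters: set[int] = set()
    for position, letter in enumerate(letters):
        if letter not in used_letters:
            # The letter flag itself is still available.
            used_letters.add(letter)
            flags.append(letter)
            continue
        # The letter flag is already flying: find a free repeater that points
        # at an earlier position holding the same letter.
        for k in range(1, repeaters + 1):
            if k not in used_repeaters and k <= position and letters[k - 1] == letter:
                used_repeaters.add(k)
                flags.append(f"Repeat{k}")
                break
        else:
            raise ValueError(f"cannot spell {word!r} with one set of flags")
    return flags


print(hoist("noon"))  # ['N', 'O', 'Repeat2', 'Repeat1']
```

Under this model the same letter can be represented by different flags depending on its position in the word, which is the point being made: the mapping is context-dependent, so it is not a one-to-one substitution cipher.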
Re: Flag tags
On 5/31/2012 12:07 PM, Karl Pentzlin wrote: On Thursday, 31 May 2012 at 20:09, John H. Jenkins wrote: <tongue-in-cheek> ... that because some countries have currency symbols with dedicated code points, other countries will make *new* currency symbols and demand that *they* get dedicated code points ... Seriously speaking, flag symbols and currency signs are completely different topics. Every country has exactly one flag, right now. But not all of these flags are used in writing - right now. This is similar to not all currencies having a symbol. There's nothing wrong with encoding a subset and leaving the door open for additions - there's no reason to jump to encoding hundreds of concrete cloth and thread symbols without any indication that they are used in text. Or is there? Also, for those of you not residing in North America, a point of information: the state flags of the 50 states of the USA are flown widely - if not as widely as the federal flag, and the accompanying symbols and designs (including seals) are widely used in publications. So, there's not a simple 1 country : 1 flag principle here - if you look at actual usage, there's a wide variety of practices. A./ Thus, in fact an encoding proposal proposing only a few of them based on an arbitrary collection made by some telephone companies without proving any scrutiny for its making never can be acceptable for most national bodies represented in ISO. On the other hand, currencies may exist without a currency symbol (as in fact most currencies do).
In fact, all currency symbols assigned to currencies valid today are included in Unicode now, with only two exceptions after acceptance for the new Turkish Lira sign: AZN Azerbaijan Manat (waiting for confirmation of its actual use), ANG Netherlands Antillean guilder (used formerly mostly for NLG Dutch guilder which was valid until 2002; problematically unified with U+0192 LATIN SMALL LETTER F WITH HOOK; see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3588.pdf ) On this base, nobody will request the addition of other symbols as precondition for acceptance for any new currency sign on ballot. - Karl
Re: Flag tags
On 5/31/2012 1:56 PM, Shawn Steele wrote: First, reprinting Shakespeare's works using flags would make it immediately and utterly illegible to most speakers of English. So they would fail the test of being recognizably the same letter. FWIW: The "Alpha" flag doesn't mean "A". For example it also means "Diver Down". Most of the flags have other meanings beyond just a letter, like Quebec Quarantine. So it's not just a substitution cipher. Combinations can also have special meanings. Additionally, repeaters make it more complicated than a simple substitution cipher, eg: November, Oscar, Repeat2, Repeat1 for noon == 4 different flags for 2 letters. See, there you go. A./
Re: Flag tags
On 5/31/2012 12:03 PM, Doug Ewell wrote: Another alphabet, even that with 1:1 correspondence to Latin, but, again, not recognizable as such are the dancing men. They at least can be demonstrated to have appeared in print. Are substitution ciphers candidates for encoding? To the degree that the use of the substitution is style, no. Fraktur and Insular forms have been unified for Latin. But these styles are also recognizable (if not to all users, then a significant number). And, there's a benefit in identifying them primarily with the Latin alphabet, and only secondarily with the precise style. The dancing men are more like Braille. There's one source where they have been given a particular mapping to the Latin alphabet, but that mapping is not the only one possible. The whole point of them is that the actual mapping has to be known or discovered each time. So, yes, these would have to be encoded by shape, not by target. A./
Re: Flag tags
On 31 May 2012, at 22:57, Asmus Freytag wrote: See, there you go. What do you mean by this? Michael Everson * http://www.evertype.com/
Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign)
Here he probably meant that if we need to encode many flags, each flag code may be arbitrarily long. A solution based on combining characters will not work correctly, and it will be better to use leading and trailing markers, or to use a codification that allows knowing where a flag starts and where it finishes. There are two solutions: (1) use specific punctuation-like characters acting like brackets (those brackets can also be given a visual glyph by themselves), and encode the intermediate flag code using ordinary characters. This would allow viable fallback representations of flags, even if they show the codes (as letters will be enclosed, for readability, the set should be restricted and probably only uppercase, so that letters can be reduced easily within the enclosing symbols); (2) restrict the subset of characters that are usable in flag identification codes to a useful and productive subset of ASCII, then reencode them as enclosed letters marking the start and end of the code, as well as possible medial codes. This eases the production of fonts for a reasonable representation of these codes within a visual band looking like a flag, and also allows those sequences to be easily converted into ligatures for showing the actual flags (including their colors if needed). Your solution based on a ZWSP *separator* does not solve anything: it does not clearly indicate that this is representing a flag, and will not allow automated recognition and production of ligatures. 2012/5/31 Andrew West andrewcw...@gmail.com: On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote: There is definitely a problem. Is it really such a problem?
Why can't implementations simply use ZWSP to demarcate the 2-character units in a sequence of more than two regional indicator symbols (and maybe always emit 2-character codes wrapped between ZWSP on either side to be safe), so for example USZWSPESZWSPGE would be parsed as the regional indicator symbols for USA, SPAIN and Georgia, whereas UZWSPSEZWSPSGZWSPE would be parsed as the regional indicator symbols for U (invalid), Sweden, Singapore and E (invalid). Algorithms such as line-breaking would not break between two regional indicator symbols, but only at a ZWSP. And if implementations wanted to support two- and three-letter regional codes, they might parse ZWSPGBZWSPCYMZWSPENGZWSPNIRZWSPSCOZWSP as the codes for United Kingdom, Wales, England, Northern Ireland, and Scotland, and represent them visually with the appropriate flag icons. Andrew
Re: Flag tags
2012/5/31 Asmus Freytag asm...@ix.netcom.com: On 5/30/2012 10:15 PM, Doug Ewell wrote: A seemingly straightforward solution to the “unambiguous mapping” problem would be to use the existing Plane 14 tag letters along with a new FLAG TAG, say at U+E0002. Then E0002, E0043, E0048 would unequivocally denote the current Swiss flag. No need for separate lead and trail. Simple. ... What’s that? Oh, sorry, never mind. Deprecated. Doug, you put your finger on it. Any form of combining scheme is doomed to fail. This includes the current approach of Regional indicators. You're wrong. The Regional indicators failed because they were encoded at the character level, so that their scope of effect was supposed to extend to arbitrary lengths of text. Here it's just about how to represent a glyph (even if it's colored) locally representing a flag. The scope of the encoded substring will not go outside of this flag indicator, so it will work the same way as if this were encoded as ligatures. You can perfectly well create a breaking rule that avoids breaking the sequence of encoded characters representing the flag with its code. It can be handled exactly as if it were an unbreakable word, surrounded by two punctuation marks (which will still be a valid fallback display method if the fonts have no glyphs for this type of string). You can perfectly well assign representative glyphs to the individual characters (these glyphs don't have to represent any complete flag, just a part of a flag showing its code internally). In fact, all characters used will be treated as separate symbols (independently of the fact that they *may* be ligatured to show the actual flag design). The encoding will provide a clear indication that replacing the sequence of default representative glyphs with an actual flag is valid (it won't break the character identities, as long as there exists a registry describing the assigned flag codes, reencoded with these symbols).
In other words, it avoids completely the need to encode directly any flag of any political entity (or with a naming convention applied in the vexillologist registry, for any other personal or organisational flag). It avoids all copyright issues and the problem of legal restriction of use of flags (including in some countries where some flags are prohibited).
Re: Flag tags
2012/5/31 Asmus Freytag asm...@ix.netcom.com: On 5/31/2012 12:07 PM, Karl Pentzlin wrote: On Thursday, 31 May 2012 at 20:09, John H. Jenkins wrote: <tongue-in-cheek> ... that because some countries have currency symbols with dedicated code points, other countries will make *new* currency symbols and demand that *they* get dedicated code points ... Seriously speaking, flag symbols and currency signs are completely different topics. Every country has exactly one flag, right now. This is wrong if you consider their dependencies. Some dependencies legally have their own flag used *instead* of the flag for the main/metropolitan part of the country. So countries can have several flags. Then consider that countries may also have several flags for different usages (national flag, civil flag, naval flag...). Also the same flag may be shared by different political entities (e.g. the European Union reuses the flag of the Council of Europe, with permission, and made it one of its official emblems). Some flags are also shared without permission, because the original design was not protected internationally or had fallen into the public domain (including in the country of origin). Flags have strong political issues that are out of scope for encoding directly in the UCS. They are not stable across history, so they should be versioned, but most frequent uses will omit the precise versioning, so that flags will be instantly replaced at any time (e.g. if you encode a flag for US, how many stars will there be on it?
Libya changed its flag recently, returning to an older flag ; in many cases it will not really matter, but if you have to deal with encoded texts that are also versioned themselves, it will not be acceptable to have flag designs freely interchanged as it would cause confusion : consider the case of countries that appeared in the history as part of a split or merge, in an article speaking about their history, and identifying the armies and generals with their respective flag...).
Re: Flag tags
2012/5/31 Doug Ewell d...@ewellic.org: Philippe Verdy wrote: So to represent the flag of Japan, you could encode: FLAG INITIAL SYMBOL J FLAG FINAL SYMBOL P [...] For me, the existing Plane 14 mechanism would have worked just as well, without requiring three more duplicate sets of printable Basic Latin. You can perfectly well map this small set of symbols in Plane 14. And no, they are NOT confusable and not a duplicate set of Basic Latin: their representative glyphs will be clearly different. They will be REAL symbols, even if they embed a letter in their default representative glyph (this letter will disappear when the ligatures are generated by renderers supporting a mapping from flag codes to actual glyphs, either with fonts built specifically for some recognized ligatures, or with the help of an external protocol to get a flag from an external flags registry, which we don't need to specify in Unicode).
Unicode sessions at Localization World Paris
On Monday, 4 June, noted experts Richard Ishida (W3C) and Addison Phillips (Lab126) will team up to present a full day of sessions on Unicode. In the morning, Richard Ishida will present “An Introduction to Writing Systems and Unicode”, a tutorial that will introduce the basic functioning of Unicode in dealing with non-Latin writing systems. It is an excellent orientation for people new to these concepts, but it also offers content for people at intermediate and advanced levels due to the breadth of scripts discussed. In the afternoon, Addison will present “Internationalization: An Introduction”, a two-part tutorial covering: • What is internationalization? • What is Unicode? Implementing and using the standard. • How do you prepare software localization and translation? Finally, Richard and Addison will present “Towards the Promised Land: Globalization Developments in Web Standards”, which surveys current developments at the W3C. You may register for any or all of these sessions via http://localizationworld.com/lwparis2012/registration.php where you will see the sessions in the preconference day. This is an opportunity to get a taste of the Unicode conference to be held in California on the following October 22-24, and see how the people on your staff can benefit from a deeper knowledge of Unicode and internationalization. Lisa Moore --
Flag emoji
The UTC considered as one of the possible approaches to the problem. While easier in terms of line breaking, there'd still be a requirement to change grapheme cluster boundaries and word boundaries to join sequences like , and people felt the approach didn't work well with encoding conversion. About conversion, I think the discussion was something like the following: It is relatively simple to have a mapping like: sjis bytes ↔ [joiner] If we used ZWSP, then we'd have: sjis bytes ← // but the code wouldn't know when to also absorb adjacent ZWSPs. sjis bytes → // but the code would need context to know when to add adjacent ZWSPs. Both of those would be complicated for encoding converters to handle. People also felt that [joiner] would be more consistent with treating the sequence as a unit, both conceptually and in fonts. I personally favored the ZWSP, but was convinced during the discussion that ZWJ was a better approach. -- Mark https://plus.google.com/114199149796022210033 * * *— Il meglio è l’inimico del bene —* ** On Thu, May 31, 2012 at 2:47 AM, Andrew West andrewcw...@gmail.com wrote: On 31 May 2012 00:24, Mark Davis ☕ m...@macchiato.com wrote: There is definitely a problem. Is it really such a problem? Why can't implementations simply use ZWSP to demarcate the 2-character units in a sequence of more than two regional indicator symbols (and maybe always emit 2-character codes wrapped between ZWSP on either side to be safe), so for example USZWSPESZWSPGE would be parsed as the regional indicator symbols for USA, SPAIN and Georgia, whereas UZWSPSEZWSPSGZWSPE would be parsed as the regional indicator symbols for U (invalid), Sweden, Singapore and E (invalid). Algorithms such as line-breaking would not break between two regional indicator symbols, but only at a ZWSP. 
And if implementations wanted to support two- and three-letter regional codes, they might parse ZWSPGBZWSPCYMZWSPENGZWSPNIRZWSPSCOZWSP as the codes for United Kingdom, Wales, England, Northern Ireland, and Scotland, and represent them visually with the appropriate flag icons. Andrew
Re: Flag tags
So I could propose, say, the Pigpen cipher? -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell -Original Message- From: Asmus Freytag Sent: Thursday, May 31, 2012 16:03 To: Doug Ewell Cc: Shawn Steele ; verd...@wanadoo.fr ; Michael Everson ; unicode Unicode Discussion Subject: Re: Flag tags On 5/31/2012 12:03 PM, Doug Ewell wrote: Another alphabet, even that with 1:1 correspondence to Latin, but, again, not recognizable as such are the dancing men. They at least can be demonstrated to have appeared in print. Are substitution ciphers candidates for encoding? To the degree that the use of the substitution is style, no. Fraktur and Insular forms have been unified for Latin. But these styles are also recognizable (if not to all users, then a significant number). And, there's a benefit in identifying them primarily with the Latin alphabet, and only secondarily with the precise style. The dancing men are more like Braille. There's one source where they have been given a particular mapping to the Latin alphabet, but that mapping is not the only one possible. The whole point of them is that the actual mapping has to be known or discovered each time. So, yes, these would have to be encoded by shape, not by target. A./
Re: Flag emoji
On Thu, May 31, 2012 at 4:18 PM, Mark Davis ☕ m...@macchiato.com wrote: If we used ZWSP, then we'd have: sjis bytes ← // but the code wouldn't know when to also absorb adjacent ZWSPs. sjis bytes → // but the code would need context to know when to add adjacent ZWSPs. I think we could do this reasonably well by providing two mappings for the same sjis bytes: sjis - A+A+ZWSP sjis - A+A A longest-match conversion would get the desired results. I believe there were more objections to the ZWSP approach though. I think one was about losing the ZWSP in editing and copy-paste. (I didn't write down details.) markus
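The two-mapping, longest-match idea above can be sketched like this. Everything specific here is an assumption made for illustration: the two regional indicator symbols stand in for the "A+A" of the message, `FLAG_BYTES` is a placeholder and not a real Shift_JIS or carrier emoji code, and the function name is invented. Only the technique (two Unicode sequences mapped to the same legacy bytes, with the longer one tried first) comes from the message.

```python
ZWSP = "\u200b"                    # ZERO WIDTH SPACE
PAIR = "\U0001F1EF\U0001F1F5"      # stand-in for "A+A": regional indicators J, P
FLAG_BYTES = b"\x01\x02"           # placeholder bytes, NOT a real legacy code

# Two Unicode sequences map to the same legacy bytes.
TO_LEGACY = {
    PAIR + ZWSP: FLAG_BYTES,
    PAIR: FLAG_BYTES,
}


def to_legacy(text: str) -> bytes:
    """Convert text, trying the longest table key first at every position."""
    keys = sorted(TO_LEGACY, key=len, reverse=True)
    out = bytearray()
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out += TO_LEGACY[key]
                i += len(key)
                break
        else:
            # No table entry starts here: fall back to the ordinary codec.
            out += text[i].encode("shift_jis")
            i += 1
    return bytes(out)
```

Because the longer key is tried first, a trailing ZWSP is absorbed together with the pair when it is present, and the pair alone still converts when it is not. The sketch covers only the direction from Unicode to the legacy bytes; the reverse direction still has to pick one of the two sequences, and the sketch does nothing about a ZWSP lost in editing or copy-paste.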
Re: Flag tags
On 1 Jun 2012, at 00:59, Doug Ewell wrote: So I could propose, say, the Pigpen cipher? I would rather you help convince people about the Unifon proposal. Michael Everson * http://www.evertype.com/
Re: Flag tags
Let's not forget the largest collection of flags on the web: Flags of the World, maintained for many years (initially via Usenet, before the Internet we know today). All other references are found there, including the International Association of Vexillological Associations (IAVA), which should be involved in the project of building and maintaining a registry of flag codes. The FOTW web site has always had several domains, some disappearing, but mirrored together. This one is the most stable: http://www.crwflags.com/fotw/flags/index.html
Re: Flag tags
On 5/31/2012 3:29 PM, Philippe Verdy wrote: 2012/5/31 Asmus Freytag asm...@ix.netcom.com: On 5/31/2012 12:07 PM, Karl Pentzlin wrote: On Thursday, 31 May 2012 at 20:09, John H. Jenkins wrote: <tongue-in-cheek> ... that because some countries have currency symbols with dedicated code points, other countries will make *new* currency symbols and demand that *they* get dedicated code points ... Seriously speaking, flag symbols and currency signs are completely different topics. Every country has exactly one flag, right now. This is wrong if you consider their dependencies. Some dependencies legally have their own flag used *instead* of the flag for the main/metropolitan part of the country. So countries can have several flags. And some have well established flags for their constituent parts - because they arose from a federation of entities. Then consider that countries may also have several flags for different usages (national flag, civil flag, naval flag...) Good point. Also the same flag may be shared by different political entities (e.g. The European Union reuses the flag of the Council of Europe, with permission, and made it one of its official emblems). Some flags are also shared without permission, because the original design was not protected internationally or had fallen in public domain (including in the country of origin). Examples? Flags have strong political issues that are out of scope for encoding directly in the UCS. They are not stable across history, so they should be versioned, but most frequent uses will omit the precise versioning, so that flags will be instantly replaced at any time (e.g. if you encode a flag for US, how many stars will there be on it ? Obviously these are all glyph variants?
A./ Libya changed its flag recently, returning to an older flag ; in many cases it will not really matter, but if you have to deal with encoded texts that are also versioned themselves, it will not be acceptable to have flag designs freely interchanged as it would cause confusion : consider the case of countries that appeared in the history as part of a split or merge, in an article speaking about their history, and identifying the armies and generals with their respective flag...).
Re: Flag tags
2012/6/1 Asmus Freytag asm...@ix.netcom.com: They are not stable across history, so they should be versioned, but most frequent uses will omit the precise versioning, so that flags will be instantly replaced at any time (e.g. if you encode a flag for US, how many stars will there be on it ? Obviously these are all glyph variants? If you are speaking about the flag of Libya, the differences are significant when there are opposed parties. During the last Libyan revolution, those flags were used very distinctly. They were not free variants of each other. Yes, you may have a generic flag code that maps to the latest version of the flag, but versioned flags should be encoded separately. This is similar to the encoding of languages: you may have en, or en-US vs. en-GB, and several subtags for variants...
Re: Flag tags
This would be a great resource for developing a flags code, as Philippe suggested earlier, an idea I actually think has quite a bit of merit. However, I'm not sure it has much relevance to character encoding. It's not that hard to imagine encoding 220 or so current national flags or placeholders, but you wouldn't want to expand this to, say, tens of thousands. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell -----Original Message----- From: Philippe Verdy Sent: Thursday, May 31, 2012 18:06 To: Doug Ewell Cc: Asmus Freytag; Shawn Steele; Michael Everson; unicode Unicode Discussion Subject: Re: Flag tags Let's not forget the largest collection of flags on the web: Flags of the World (FOTW), maintained for many years (initially via Usenet, before the Internet we know today). All other references are found there, including the International Federation of Vexillological Associations (FIAV), which should be involved in the project of building and maintaining a registry of flag codes. The FOTW web site has always had several domains, some disappearing, but mirrored together. This one is the most stable: http://www.crwflags.com/fotw/flags/index.html
Re: Flag tags
Michael Everson wrote: So I could propose, say, the Pigpen cipher? I would rather you help convince people about the Unifon proposal. I actually wasn't planning to propose Pigpen. I was just surprised the idea would even be considered. -- Doug Ewell | Thornton, Colorado, USA http://www.ewellic.org | @DougEwell
Re: Flag tags
That's why I just propose an external registry rather than a direct encoding of individual flags. A naming convention (using namespace prefixes) could be used to make sure that the common codes from ISO 3166-1 will be usable. I'm not sure that the CLDR TC is currently competent to develop such a registry, but it may work along with the FIAV to develop the naming convention for use in the registry (which could be hosted by the FIAV or by Unicode; to be decided later). The CLDR TC would be involved in the development of the registry rules, for its stability. 2012/6/1 Doug Ewell d...@ewellic.org: This would be a great resource for developing a flags code, as Philippe suggested earlier, an idea I actually think has quite a bit of merit. However, I'm not sure it has much relevance to character encoding. It's not that hard to imagine encoding 220 or so current national flags or placeholders, but you wouldn't want to expand this to, say, tens of thousands.
Re: Flag tags
E.g. the empty namespace could be reserved for country codes. Namespace separation could use the hyphen (as in language codes). So the generic US flag would be coded as simply as -US (with the leading hyphen). When rendering with the default glyphs, you'll see that hyphen. The alternative is to use a space separator, so that the standard code would be rendered showing only the country code with the default glyphs. Other namespace extensions will use a non-empty prefix per category. 2012/6/1 Philippe Verdy verd...@wanadoo.fr: That's why I just propose an external registry rather than a direct encoding of individual flags. A naming convention (using namespace prefixes) could be used to make sure that the common codes from ISO 3166-1 will be usable. I'm not sure that the CLDR TC is currently competent to develop such a registry, but it may work along with the FIAV to develop the naming convention for use in the registry (which could be hosted by the FIAV or by Unicode; to be decided later). The CLDR TC would be involved in the development of the registry rules, for its stability. 2012/6/1 Doug Ewell d...@ewellic.org: This would be a great resource for developing a flags code, as Philippe suggested earlier, an idea I actually think has quite a bit of merit. However, I'm not sure it has much relevance to character encoding. It's not that hard to imagine encoding 220 or so current national flags or placeholders, but you wouldn't want to expand this to, say, tens of thousands.
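The namespace convention proposed in this message (an empty namespace reserved for country codes, a hyphen as separator, a non-empty prefix for every other category) could be parsed roughly as follows. This is a minimal sketch, not a specification: the names `FlagCode`, `parse_flag_code` and `is_country_flag` are made up for illustration, the `org` namespace is a hypothetical example, and treating a country code as exactly two ASCII letters (ISO 3166-1 alpha-2) is an assumption.

```python
from typing import NamedTuple


class FlagCode(NamedTuple):
    namespace: str  # "" means the reserved country-code namespace
    name: str


def parse_flag_code(code: str) -> FlagCode:
    """Split 'NAMESPACE-NAME' at the first hyphen; '-US' has an empty namespace."""
    namespace, sep, name = code.partition("-")
    if not sep or not name:
        raise ValueError(f"expected NAMESPACE-NAME, got {code!r}")
    return FlagCode(namespace, name)


def is_country_flag(code: str) -> bool:
    """True for a code in the empty namespace whose name is two ASCII letters."""
    parsed = parse_flag_code(code)
    return (
        parsed.namespace == ""
        and len(parsed.name) == 2
        and parsed.name.isascii()
        and parsed.name.isalpha()
    )


print(parse_flag_code("-US"))     # empty namespace, name "US"
print(parse_flag_code("org-EU"))  # hypothetical non-empty namespace
```

Splitting at the first hyphen only leaves later hyphens available inside the name, for instance for subtags.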
Re: Flag tags
On 5/31/2012 5:06 PM, Michael Everson wrote: On 1 Jun 2012, at 00:59, Doug Ewell wrote: So I could propose, say, the Pigpen cipher? I would rather you help convince people about the Unifon proposal. hehe. A./ PS: what's Unifon and what's it got to do with it?
Re: Flag tags
Note that I gave a URL for the Flags Of The World site which is hosted by a commercial vendor of manufactured flags. But as the site is built from a collection of static HTML web pages, without any script, it is easily mirrored in various places. For now, Wikipedia prefers referencing a vendor-neutral website at this address: http://flagspot.net/flags/ The pages are identical; only the base URL changes, and all relative URLs are identical starting at the flags/ folder. It is fed by discussions and contributors on its old mailing list (the main place of discussion for the FOTW project), whose volume is huge (about half a million messages sent since 1993, about 2000 or 3000 mails per month), notably because it also carries photos and graphic designs. But the actual discussions are even larger within the local associations that are members of the FIAV. The FIAV itself (of which the FOTW site is just a small visible part, containing a summary of the huge collection of flags discussed and maintained by the various member associations and their contributors) has offices in Belgium (presidency), Texas (general secretariat), and the UK (conferences). I think it would be unrealistic to redo from scratch the huge work already done by the FIAV and partly exposed on the FOTW website. If you ever want to know how best the codification should be made (how many distinct characters you need to support the re-encoding into abstract symbols that will later be recombinable into ligatures showing the actual flags), I suggest that the UTC contact the general secretary. All contact details are on this page: http://flagspot.net/flags/vex-fiav.html (once again, ignore the base URL http://flagspot.net before /flags, which varies depending on the various website mirrors you'll easily find with web search engines). For now, you won't need anything more than a set of symbols to represent each letter of the code.
The registry can be developed later, only for standardizing the recognized ligatures. In the Unicode standard, there's no need to encode any subcollection of flags, even if we can explain how to combine these symbols into ligatures. The representative glyphs shown in TUS and ISO/IEC 10646 will just display the default symbols containing the associated ASCII letter used in the registry (and most probably this should be restricted to ASCII characters usable in common filesystems for naming graphic files in whatever format the rendering applications will recognize, or to name the glyphs of ligatures when developing fonts showing more than just the separate representative glyphs displaying the codes). For compatibility with filenames in various filesystems, I just suggest using a single letter case and avoiding characters like / or \, which could be incompatible with some OSes or with the syntax of hierarchical URLs. Characters currently allowed for language codes should all be usable: ASCII letters, digits, hyphen separators. The slash could be added later for precise versioning purposes. The slash in a standard code would be mapped to a symbol whose representative glyph shows a space rather than a letter or digit. I'm not sure that the colon should be used, as it may cause compatibility problems when deciphering a series of symbols into their associated ASCII characters as part of codes that would be remapped to filenames. And the dot should not be used if it breaks file extensions in local filesystems or in URLs for retrieving a known flag from a collection of prebuilt glyphs stored as graphic files (SVG, PNG...). If we encode each character of the flag code as symbols, we'll then need fewer than 50 characters in each subcollection: the start symbols, the medial symbols, and the final symbols. All would fit within 192 code points allocated in the SMP (or in Plane 14, but that plane is not intended for visible symbols).
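To make the arithmetic concrete, here is a rough Python sketch of the three subcollections (start, medial, final) described above. Everything specific in it is an assumption for illustration: the `ALPHABET` ordering, the 64-slot layout (3 x 64 = 192), the `encode`/`decode` names, and above all the base code point, which is deliberately placed in Supplementary Private Use Area-A (U+F0000) so that it cannot be mistaken for a real or proposed allocation.

```python
# Allowed characters: hyphen, one letter case, digits (37 in total, under 50).
ALPHABET = "-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = 0xF0000  # private-use placeholder, not a proposed allocation
SLOTS = 64      # 3 subcollections x 64 slots = 192 code points
START, MEDIAL, FINAL = 0, 1, 2


def encode(code: str) -> str:
    """Map each ASCII character of a flag code to a start, medial or final symbol."""
    code = code.upper()  # a single letter case, as suggested above
    if len(code) < 2:
        raise ValueError("need at least a start and a final symbol")
    last = len(code) - 1
    out = []
    for i, ch in enumerate(code):
        index = ALPHABET.find(ch)
        if index < 0:
            raise ValueError(f"character {ch!r} is not allowed in a flag code")
        position = START if i == 0 else FINAL if i == last else MEDIAL
        out.append(chr(BASE + position * SLOTS + index))
    return "".join(out)


def decode(symbols: str) -> str:
    """Recover the ASCII flag code, e.g. to build a glyph file name."""
    chars = []
    for s in symbols:
        offset = ord(s) - BASE
        if not 0 <= offset < 3 * SLOTS or offset % SLOTS >= len(ALPHABET):
            raise ValueError(f"U+{ord(s):04X} is not a flag-code symbol")
        chars.append(ALPHABET[offset % SLOTS])
    return "".join(chars)


symbols = encode("-US")
print(len(symbols), decode(symbols))
```

One plausible benefit of separate start and final symbols is that a run of symbols delimits itself without an extra terminator, so a renderer can tell where one flag code ends and the next begins.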
As long as a policy is documented that allows at least the generic country flags to be represented immediately with their ISO 3166 codes, in a viable namespace, using just 3 Unicode symbols, it will remain safe for immediate use. Versioned flags may be encoded later, once the registry is working. 2012/6/1 Asmus Freytag asm...@ix.netcom.com: On 5/31/2012 5:06 PM, Michael Everson wrote: On 1 Jun 2012, at 00:59, Doug Ewell wrote: So I could propose, say, the Pigpen cipher? I would rather you help convince people about the Unifon proposal. hehe. A./ PS: what's Unifon and what's it got to do with it?