Re: Unicode block for programming related symbols and codepoints?
- Indentation codepoint, with no fixed defined graphical representation. For indentation based programming languages. That wouldn’t be compliant with existing languages and future languages might use any existing character. Because: -- specific clients may want to show it differently (for example as arrows, lines etc., using another color): Can’t good editors display tabs in a different color when required? Lots of them already do, e.g. Emacs in various modes. - John Burger MITRE --- browsers could let the web page creator decide the visual representation (character and size) via CSS --- the same with editors, independent from the actual font --- in case of visual impairment, the user could even change the acoustic representation if the editor allows it -- unlike a space symbol, it wouldn't need more than one character per indentation -- unlike tabs or spaces, it wouldn't be whitespace -- unlike normal arrow characters, one could customize the length in an editor and wouldn't have to insert extra spaces for better visual imagery - A codepoint for string literal quotes, that would spare one the escaping. I rarely escape quotes. In a text, I use ’ (U+2019) as an apostrophe and «»“”‘’ as quotes, so I don’t need to escape them. When I use PHP to generate some HTML code, I try to alternate single and double quotes as much as possible. That way I rarely need to escape them. - A statement separator symbol. To replace the semicolon in C and the languages based on its syntax? - Other ideas? Aren’t you trying to reinvent APL? You may now think, this is highly specific and you are right. However, so are EMOJI signs, in particular those like PINE DECORATION. These days, there are a lot of tools to create small embedded scripting languages and DSLs, which are used in-program in special editors. And there are a lot of people using them. Exactly these could really profit from such a code block instead of using conventional ASCII subset characters.
Also, there is a lot of potential with really good text editors and IDEs where semantics may matter a lot. Excuse my English, I hope this was understandable. Best regards, A. Z. ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: About cultural/languages communities flags
On Tue, Feb 10, 2015 at 12:11 AM, Ken Whistler kenwhist...@att.net wrote: for the full context, and for the current 26x26 letter matrix which is the basis for the flag glyph implementations of regional indicator code pairs on smartphones. SC, SO, ST are already taken, but might I suggest putting in for registering AB for Alba? That one is currently unassigned. Yeah, yeah, what is the likelihood of BSI pushing for a Scots two-letter code?! But seriously, if folks are planning ahead for Scots independence or even some kind of greater autonomy, this is an issue that needs to be worked, anyway. In the meantime, let me reiterate that there is *no* formal relationship between TLDs and the regional indicator codes in Unicode (or the implementations built upon them). Well, yes, a bunch of registered TLDs do match the country codes, but there is no two-letter constraint on TLDs. This should already be apparent, as Scotland has registered .scot At this point there isn't even a limitation of TLDs to ASCII letters, so there is no way to map them to the limited set of regional indicator codes in the Unicode Standard. Not having a two letter country code for Scotland that matches the four letter TLD for Scotland might indeed be a problem for someone, but I don't see *this* as a problem that the Unicode Standard needs to solve. I want to add to that that there are already a fair number of ISO 2-letter codes for regions that are administered as part of another country, like Hong Kong. There are also codes for crown possessions like Guernsey. So having a code for Scotland (and Wales, and N. Ireland) does not really break precedent. But as Ken says, the best mechanism is for the UK to push for a code in ISO and the UN. Mark https://google.com/+MarkDavis *The best is the enemy of the good* ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Unicode block for programming related symbols and codepoints?
I think this discussion is confusing the need for separate syntactic functions in formal language definitions with the need for *encoding* of characters. The distinction between assignment and test for equality has been around for decades in formal languages, and of course it is almost always carefully distinguished in the formal syntax:
C, C++ and kindred: Use = for assignment. Use == for equivalence operator.
Pascal and kindred: Use := for assignment. Use = for equivalence operator.
Lisp: Assignment: (let ((a 6)) ...) Equivalence evaluation: (= a 6)
And so on.
The fact that these formal languages do not use a *single* distinct character for each of these syntactic functions is not a formal defect -- there are many, many concepts in formal languages which are defined using sequences of characters, rather than a single character. As has already been alluded to in this thread, trying to stack all functionality into single character definitions heads back in the direction of relatively illegible APL program text. It might have its place, but isn't much of a choice for widely used general programming languages.
There are two basic issues with using sequences of (typically ASCII) characters for fundamental operators:
1. It marginally complicates parsing.
2. If chosen badly, they can confuse programmers using the syntax.
#1 is basically trivial, as long as the formal syntax passes the bar of not introducing syntactic ambiguity. #2 is the *real* problem, imo. The use in C of = and == was badly designed from the start, and is the source of bezillions of inadvertent programming errors in practice.
But if a left arrow, for example, might be a better choice for an assignment operator in a programming language, and a two-character ASCII operator like := or <- doesn't seem appropriate or causes other confusion, there still isn't a character *encoding* issue here. Just use ←, which already exists (U+2190), and is a fine left arrow!
What is *not* appropriate for Unicode consideration here is trying to encode programming *functions* per se. That turns the problem on its head really. There are lots and lots of symbols already defined in the standard: it is the job of formal language designers to simply pick from them and *define* their formal functions in their language design. Just because the UTC occasionally invents new control functions and encodes them in characters -- as for the bidirectional algorithm -- does not mean that every new function conceived for a programming language is automatically a character encoding problem. Coming to the UTC looking to encode a new functional character on spec should be a matter of *last* resort -- not a first resort. It requires a carefully built case demonstrating a real use and showing that alternative approaches using existing characters do not (and cannot) work. --Ken P.S. Arrow symbols like U+2190 have been in the Unicode Standard since Unicode 1.0 in 1991. They are far, far more widely supported nowadays than any new, language-specific functional symbol addition would be. Even if the UTC agreed to such character additions at the next meeting in May, its earliest opportunity for publication would be Unicode 10 in June, 2017. That amounts to a 26 year impedance mismatch for implementations. Why would a designer of a new formal language syntax want to buy into that kind of grief for character availability, when there are hundreds of symbols in the standard to choose from that have been encoded for decades now? On 2/9/2015 8:41 AM, Andre Schappo wrote: Let me take as an example the use of = in programming. The = is used for test of equality and assignment in various programming languages. The equality and assignment operations should have different characters. e.g. 
U+XXX1 TEST FOR EQUALITY U+XXX2 ASSIGNMENT OPERATOR Initially the glyphs used for these characters could be = but then this mechanism can be used to transition to a new and less ambiguous visual representation. The new visual representation could be something like U+XXX1 TEST FOR EQUALITY = U+XXX2 ASSIGNMENT OPERATOR ⬅ Such a visual and character distinction between the 2 functions must surely make it easier for those learning to program and for interpreter and compiler writers. I think it would also make for easier to read/understand program code. André ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
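As a rough illustration of the point made in the reply above, that ← is already encoded and a language designer only has to assign it a function, the following Python sketch looks up U+2190 in the standard `unicodedata` module and uses it in a toy statement classifier. The function `classify` and its two-rule grammar are assumptions invented for this example; they do not describe any real language.

```python
import unicodedata

LEFT_ARROW = "\u2190"  # already in the standard; no new code point needed


def classify(statement: str) -> str:
    """Classify a toy statement as 'assignment', 'equality' or 'other'.

    Hypothetical grammar, for illustration only:
      name ← value   is an assignment
      left = right   is an equality test
    """
    if LEFT_ARROW in statement:
        return "assignment"
    if "=" in statement:
        return "equality"
    return "other"


print(unicodedata.name(LEFT_ARROW))  # LEFTWARDS ARROW
print(classify("a \u2190 6"))        # assignment
print(classify("a = 6"))             # equality
```

The distinction lives entirely in the (made-up) language definition; the character repertoire is unchanged.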
Re: Unicode block for programming related symbols and codepoints?
But then it would be incompatible from IDE to IDE, like Python is incompatible using 2 spaces, 4 spaces and tabs. It's the data that is important, not the software. Specifically talking about Python, we should not solve in Unicode what PEP 8[1] is intended for. Pythonistas and their IDEs are encouraged to use linters to address syntactical discrepancies. This, more or less, applies to other programming languages as well. [1]: https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :) If you read the background information (in TR51 or elsewhere) on Unicode emoji, you will see how common and widespread use of PUA by Japanese providers introduced interoperability issues with the rest of the world. And no... Addressing that major compatibility/interoperability issue (and any future issue arising from addressing it) does not justify inclusion of everything everyone ever wanted. ↪ Shervin On Mon, Feb 9, 2015 at 4:55 AM, Alfred Zett alfre...@web.de wrote: OK, I will now try to answer all of you in one mail, otherwise it gets hard to keep track... Shervin Afshar: All of the requirements mentioned here can be (and are) implemented in higher levels of software (like IDEs). IMO, there isn't any need for adding new characters to Unicode to address these issues. But then it would be incompatible from IDE to IDE, like Python is incompatible using 2 spaces, 4 spaces and tabs. It's the data that is important, not the software. Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that now anything goes. I refer folks to TR51[1] (specifically sections 1.3, 8, and Annex C).
[1]: http://www.unicode.org/reports/tr51 You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :) Jean-Francois Colson: I need a few tens of characters for a conlang I’m developing. ☺ Except two or three control characters don't make a con language. Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf The problem is that Unicode only encodes characters which are effectively used today or which have been used in the past. It doesn’t encode characters which could perhaps be used in a hypothetical new programming language in the future. So you want the font encoding scheme to be a limiting factor for new things? Pierpaolo Bernardi: How would your proposed character be displayed as plain text? There is no such thing as plain text. Even line breaks and tabs are a matter of interpretation. It's just that they usually have typographic semantics, even in programming editors, with all the side effects. In very simple (and with that I mean shitty or not even remotely programming oriented) editors, it may show like a control character, like ␄. Browsers and any editor passing the "based on Scintilla" complexity mark of course should display something that makes more sense, like an arrow or ⍈ plus surrounding space. Unicode is a standard for plain text. If you require a special IDE for your programming language then why use plain text at all? Because binary custom encoded databases or blob files are the death of interoperability. Konstantin Ritt: Easier than latin1, a layout one could find on [almost] every keyboard? Good luck. Also: Jean-Francois Colson: Hard to input? Not harder than the new symbols you’d like to propose. That’s only a matter of keyboard layout and input method. Indent by pressing tab and insert the literal thing by pressing .
Nothing changes, the IDE/editor does the work on the fly. Just that you have clean semantics, interoperability and customizability. Beat that, APL. Where you would need 10 key bindings or an annoying software keyboard. I’ve never used APL so I don’t remember the meanings of its symbols, but couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E APL FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a new programming language? That's a good idea. That still leaves the indentation character, which is harder than that, because one would want a control character with certain semantics. E.g.: For programming editors it would make sense to only allow it after line breaks and convert other occurrences into tabs. If the IDE inputs your new character when you press tab, then your new character is a tab… Not if it detects the beginning of a line. Best regards A. Z. ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
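The tabs-versus-spaces incompatibility mentioned in this exchange can be made concrete with a small Python sketch. It assumes Python 3, where the built-in `compile` rejects a block whose lines are indented inconsistently with tabs and spaces; the helper name `indentation_problem` is made up for this illustration.

```python
def indentation_problem(source: str) -> str:
    """Return 'ok' if the source compiles, else the name of the indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return "ok"


spaces_only = "if True:\n    x = 1\n    y = 2\n"
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"
mixed = "if True:\n\tx = 1\n        y = 2\n"  # one tab, then eight spaces

for label, src in [("spaces", spaces_only), ("tabs", tabs_only), ("mixed", mixed)]:
    print(label, indentation_problem(src))
```

Each style is accepted on its own; only the mixture within one block is rejected, which is the kind of discrepancy the thread suggests leaving to linters and PEP 8 rather than to a new character.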
Re: About cultural/languages communities flags
On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: if a cultural/language TLD is typed with Unicode RIS, then show the flag for these culture/language: This does not work. The Unicode RIS are defined to be used in pairs, with semantics according to corresponding ISO 3166 alpha2 codes. In your examples, each successive pair will encode a flag. If you want to represent every flag of every locality, you first have to figure out how to catalog and label them. You are mentioning provinces, one level down from nation states; I guess there are thousands of them. In much of Europe, every little village http://de.wikipedia.org/wiki/Butterstadt has its own flag and coat of arms. Where do you want the text encoding and fonts to stop? markus ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
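To make the pairing behaviour described above concrete, here is a minimal Python sketch. It relies on the fact that the regional indicator symbols for A to Z occupy U+1F1E6 to U+1F1FF; the function names are invented for this example, and the sketch only decodes letters without checking whether a pair is an assigned ISO 3166 code.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_regional_indicators(letters: str) -> str:
    """Map ASCII letters A-Z to the corresponding regional indicator symbols."""
    result = []
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not an ASCII letter: {ch!r}")
        result.append(chr(RIS_BASE + ord(ch) - ord("A")))
    return "".join(result)


def split_into_pairs(ris_text: str) -> tuple[list[str], str]:
    """Group a run of regional indicators into successive pairs.

    Returns the decoded two-letter codes and any unpaired trailing letter.
    """
    letters = "".join(chr(ord(ch) - RIS_BASE + ord("A")) for ch in ris_text)
    pairs = [letters[i:i + 2] for i in range(0, len(letters) - 1, 2)]
    leftover = letters[len(pairs) * 2:]
    return pairs, leftover


print(split_into_pairs(to_regional_indicators("SCOT")))  # (['SC', 'OT'], '')
print(split_into_pairs(to_regional_indicators("CAT")))   # (['CA'], 'T')
```

A four-letter string is therefore read as two independent pairs, and a three-letter string as one pair plus a stray letter, which is why typing a longer TLD in regional indicators cannot by itself select a single flag.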
Re: Unicode block for programming related symbols and codepoints?
Frédéric Grosshans (frederic dot grosshans at gmail dot com) wrote: The including of emoji was a considerable debate here, with people strongly against and strongly for. The trick is that they were already used as digital characters by Japanese Telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. At the time of encoding, the spread of smartphones made them appear in other places (emails, web forums, etc.) Sorry, I can't let the compatibility argument go unchallenged again. It can be argued, and was, repeatedly and persuasively, that the initial collection of emoji in Unicode 6.1 [1] were added for compatibility with Japanese telco extensions to JIS. But the additional emoji added to Unicode 6.2 and 7.0, and planned for 8.0, do not have even this provenance; they were added on foot of novel proposals sent directly to Unicode, or (more recently) by popular request. There is no longer any requirement that the robot faces and burritos appear first in any sort of industry character set extension, with which Unicode is then obliged to maintain compatibility. [1] No, I am not counting the ARIB symbols or any other long-encoded symbols that have been retroactively defined as emoji, to help legitimize the latter. Alfred Zett (alfred underscore z at web dot de) replied: The trick is that one doesn't bargain with Telcos and similar criminals. Gotta drop them hard and the pest will go away from itself after five years or so. This does not help to make a case for or against encoding of anything. -- Doug Ewell | Thornton, CO, USA | http://ewellic.org ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: About cultural/languages communities flags
Thanks for your replies. As far as I can see, my informal request to expand the current RIS design hasn't had a good response. I understand that. Flags are a cause of disputes, and encoding them isn't a matter for Unicode. IMHO, staying tied to alpha-2 codes is a poor choice for users. Maybe industry manufacturers could find a better approach. Best regards, Joan Montané ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Unicode block for programming related symbols and codepoints?
@ John D Burger: And all of a sudden a war is waged over what counts as a good editor. :D @ Andre Schappo: That's a good idea. We need it in the name of science and education. :D William_J_G Overington: Hi You might like the following post. http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0001.html William Hi, I'm really not sure what this is about, but it seems like an interface to deliver instructions to the rendering VM? Martin v. Löwis: So if you can't demonstrate usage, you should at least demonstrate demand (rather than just claiming that there might be demand). The problem is, you can't do that with the topic at hand. Because most programmers don't even see the possibilities. It's like asking a blind person what colors look like. Although that may sound kind of arrogant. Among language designers and people interested in stuff like this, there is only a small fraction that doesn't hold the ill-minded opinion that syntax doesn't matter at all. Among those who care for syntax there is only a small fraction that really knows enough about Unicode. And who can blame them, I still see broken characters on a weekly basis. Among those there is only a small fraction that cares enough. Among those there is only a small fraction that has the nerve/balls to put up with a consortium. This small subset is a handful of people, like André, me and maybe 3 other persons. I don't really feel comfortable sounding that elitist, but in this case I dare say that the consortium shouldn't care for established popularity, the same way they should have handled emoji characters. Best regards A. Z. ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
About cultural/languages communities flags
Hello everyone, I've had an interesting request [1] that makes sense to me, but I'd like to understand Unicode's position on it. The TL;DR version of the request is the following: There are communities, let's take Scottish people as an example, that even have a domain but not an emoji flag. Some flag-related projects have adopted more than what we have now in emoji, including 239 flags: http://www.famfamfam.com/archive/flag-icons-released/ The proposal is quite simple, and I am quoting from the request: if a cultural/language TLD is typed with Unicode RIS, then show the flag for these culture/language: -- it shows a Scottish flag -- it shows a Welsh flag -- it shows a Breton flag -- it shows a Catalan flag -- it shows a Basque flag -- it shows a Galician flag Thanks in advance for any sort of outcome. Best Regards [1] https://github.com/twitter/twemoji/issues/40 ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
Shervin Afshar (shervinafshar at gmail dot com) wrote: There is no longer any requirement that the robot faces and burritos appear first in any sort of industry character set extension, with which Unicode is then obliged to maintain compatibility. Only if you don't consider existing usage and popular requests as requirement and precedence; for example Gmail had Robot Face for a long time. I said there was no longer a requirement *that the items appear first in an industry character set extension*, right? In what character encoding standard, or extension, does ROBOT FACE appear? "Gmail has it" is not a character encoding standard. Neither is "People want to see it." "Most popularly requested," as a criterion for adding a character, is absolutely new to Unicode. Earlier I wrote privately to a Unicode officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no reply. (What, you've forgotten the ice-bucket craze already? That's exactly why "most popular at the moment" wasn't supposed to be a criterion.) -- Doug Ewell | Thornton, CO, USA | http://ewellic.org ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Unicode block for programming related symbols and codepoints?
On 9 Feb 2015 at 20:27, Doug Ewell d...@ewellic.org wrote: Sorry, I can't let the compatibility argument go unchallenged again. I stand corrected (and I should have known better!) ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
I said there was no longer a requirement *that the items appear first in an industry character set extension*, right? The issue is with your very rigid interpretation of the criteria for encoding new symbols. Is "appearing in an industry character set extension" an official phrasing that you keep referring to? In what character encoding standard, or extension, does ROBOT FACE appear? "Gmail has it" is not a character encoding standard. Neither is "People want to see it." Robot Face is available on Gmail (GChat), Facebook, and Twitch among others (calculating the size of the user community is left as an assignment for the reader). That's enough usage for consideration by the UTC even if the symbol is not present in a character encoding standard. Also, since Unicode is an industry standard maintained by industry members (among others), if there are enough requests to these corporations from communities of users, then there might be some reason for considering those symbols. I think that's the case for the newer symbols. "Most popularly requested," as a criterion for adding a character, is absolutely new to Unicode. Earlier I wrote privately to a Unicode officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no reply. (What, you've forgotten the ice-bucket craze already? That's exactly why "most popular at the moment" wasn't supposed to be a criterion.) IMO, Unicode officers seem to have low patience for such sentiments. You might want to reconsider your tone. There is a time and place for sarcasm. ↪ Shervin On Mon, Feb 9, 2015 at 12:16 PM, Doug Ewell d...@ewellic.org wrote: Shervin Afshar shervinafshar at gmail dot com wrote: There is no longer any requirement that the robot faces and burritos appear first in any sort of industry character set extension, with which Unicode is then obliged to maintain compatibility.
Only if you don't consider existing usage and popular requests as requirement and precedence; for example Gmail had Robot Face for a long time. I said there was no longer a requirement *that the items appear first in an industry character set extension*, right? In what character encoding standard, or extension, does ROBOT FACE appear? Gmail has it is not a character encoding standard. Neither is People want to see it. Most popularly requested, as a criterion for adding a character, is absolutely new to Unicode. Earlier I wrote privately to a Unicode officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no reply. (What, you've forgotten the ice-bucket craze already? That's exactly why most popular at the moment wasn't supposed to be a criterion.) -- Doug Ewell | Thornton, CO, USA | http://ewellic.org ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: About cultural/languages communities flags
Thanks, that was somehow indeed my very first concern. Everyone could claim an emoji, at that point. Enough info for me so far, so thanks again. Best Regards On Mon, Feb 9, 2015 at 8:16 PM, Markus Scherer markus@gmail.com wrote: On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: if a cultural/language TLD is typed with Unicode RIS, then show the flag for these culture/language: This does not work. The Unicode RIS are defined to be used in pairs, with semantics according to corresponding ISO 3166 alpha2 codes. In your examples, each successive pair will encode a flag. If you want to represent every flag of every locality, you first have to figure out how to catalog and label them. You are mentioning provinces, one level down from nation states; I guess there are thousands of them. In much of Europe, every little village http://de.wikipedia.org/wiki/Butterstadt has its own flag and coat of arms. Where do you want the text encoding and fonts to stop? markus ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
I like symbols a lot. But I know that I and a number of people have been thinking that too much emphasis is being put on emoji. Michael Everson * http://www.evertype.com/ ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
Doug Ewell: Most popularly requested, as a criterion for adding a character, is absolutely new to Unicode. Earlier I wrote privately to a Unicode officer about whether PERSON TAKING SELFIE and GIRL TWERKING and PERSON DUMPING ICE BUCKET OVER HEAD would be ephemeral enough, and got no reply. (What, you've forgotten the ice-bucket craze already? That's exactly why most popular at the moment wasn't supposed to be a criterion.) There is much truth in this. I'll now leave the discussion, because it doesn't lead anywhere. Best regards, A. Z. ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: About cultural/languages communities flags
On Mon, Feb 9, 2015 at 1:11 PM, Joan Montané j...@montane.cat wrote: AFAIK, this is done in font side. Emoji flags are just ligatures, so a font can provide a ligature for 4 RIS characters. Technically true, but a font that violates the encoding standard would cause large problems. Imagine a font that ligates letters 't' and 'h' and displays an Egyptian hieroglyph for the combination. What's the way for encoding them in Unicode standard? In principle, the way for encoding anything in the Unicode Standard is to write a well-formed proposal, and convince the Unicode Technical Committee and ISO JTC1/SC2 that the proposal has merit. However, I would much prefer if everyone spent their considerable energy on upgrading protocols (e.g., IETF RFCs for email subject lines) and lobbying relevant vendors (e.g., chat services, social network messages) to support images embedded in the text stream, ideally with scaling and other behavior that would make them behave somewhat text-like. Best regards, markus ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: Unicode block for programming related symbols and codepoints?
On 09/02/2015 13:55, Alfred Zett wrote: Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that now anything goes. I refer folks to TR51[1] (specifically sections 1.3, 8, and Annex C). [1]: http://www.unicode.org/reports/tr51 You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :) The inclusion of emoji was the subject of considerable debate here, with people strongly against and strongly for. The trick is that they were already used as digital characters by Japanese Telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. At the time of encoding, the spread of smartphones made them appear in other places (emails, web forums, etc.) Jean-Francois Colson: I need a few tens of characters for a conlang I’m developing. ☺ Except two or three control characters don't make a con language. Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf I doubt that “not liking con languages” is a faithful description of Jean-François ;-) On a more serious note, this block is actually a set of “scientific” (for their time) notations used by Isaac Newton. They were encoded in Unicode following an academic project to digitize his manuscripts. So here, you have characters used 3 centuries ago by no less than Isaac Newton, most of them having a much longer history, and useful for science historians. See http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details. This does not compare with a few characters invented for a conlang invented by an amateur and used by no one but himself. I think that is the point Jean-François wanted to make.
A closer counter-example to Jean-François's “wish” would be Shavian (10450..1047F), but this alphabet has shown some use, and I guess that its encoding would have been much harder without its association with someone as famous as George Bernard Shaw or without the existing publication of a full text in Shavian. The problem is that Unicode only encodes characters which are effectively used today or which have been used in the past. It doesn’t encode characters which could perhaps be used in a hypothetical new programming language in the future. So you want the font encoding scheme to be a limiting factor for new things? It is more or less the rule, except that it is not a font encoding, but a standard encoding. Once something is encoded, it can never be unencoded. And the Unicode standard is built to stay relevant as long as possible (decades or centuries). So you ask for your character to be encoded in billions of devices for decades. It is more than a mere font encoding. There are a few exceptions, but only when widespread use is really expected, like for monetary symbols (it was the case for the Euro). What you are asking for is a character for an untested idea. You are convinced it is useful, but cannot prove anyone beyond yourself will use it, hence Jean-François’s parallel with conlangs. In order to have a chance of success, design a language using existing characters (e.g. some APL + → for TAB) and/or private use codepoints. Once your language starts gathering steam, come back and argue that using an arrow or a tab is awkward, and that U+ SHINY TAB FOR PROGRAMMERS would be an improvement for a significant community. I know it is a lot of work, but that is probably what it takes. Pierpaolo Bernardi: How would your proposed character be displayed as plain text? There is no such thing as plain text. When you say that, you don’t accept the premise of Unicode encoding. Unicode’s goal is to encode all plain text characters, but only plain text characters.
Even line breaks and tabs are a matter of interpretation. It's just that they usually have typographic semantics, even in programming editors, with all the side effects. In very simple (and with that I mean shitty or not even remotely programming oriented) editors, it may show like a control character, like ␄. Browsers and any editor passing the "based on Scintilla" complexity mark of course should display something that makes more sense, like an arrow or ⍈ plus surrounding space. I think everyone here knows what you are saying, and that the notion of plain text is a bit fuzzy. But if you cannot argue that your character has a meaning in plain text, for some value of “plain text”, then you cannot hope for an encoding in Unicode. ___ Unicode mailing list Unicode@unicode.org http://unicode.org/mailman/listinfo/unicode
Re: About cultural/languages communities flags
Sorry, my reply was sent CC: to Unicode ML, My apologies, Joan Montané 2015-02-09 22:11 GMT+01:00 Joan Montané j...@montane.cat: Hi all, I am the one who made the request to the twemoji GitHub. 2015-02-09 20:16 GMT+01:00 Markus Scherer markus@gmail.com: On Mon, Feb 9, 2015 at 9:54 AM, Andrea Giammarchi andrea.giammar...@gmail.com wrote: if a cultural/language TLD is typed with Unicode RIS, then show the flag for this culture/language: This does not work. The Unicode RIS are defined to be used in pairs, with semantics according to corresponding ISO 3166 alpha2 codes. In your examples, each successive pair will encode a flag. AFAIK, this is done on the font side. Emoji flags are just ligatures, so a font can provide a ligature for 4 RIS characters. This is not an issue here. I agree some strange behaviour can appear if a 3 RIS string, take CAT, is shown in a system with only 2 RIS support (a Canadian flag will appear followed by a T). If you want to represent every flag of every locality, you first have to figure out how to catalog and label them. You are mentioning provinces, one level down from nation states; I guess there are thousands of them. In much of Europe, every little village http://de.wikipedia.org/wiki/Butterstadt has its own flag and coat of arms. Where do you want the text encoding and fonts to stop? I don't request flag support for every flag in the world. I requested flags for culture/language communities *with* an approved TLD (Top Level Domain). I know flags are an issue, and I know flags represent territories, not languages, but I think some support should be provided for these active communities. As I pointed out, some country flag collections are expanded with a few non-independent countries. See [1], [2] and [3] (search for Scottish or Welsh flag). You can check this [4] petition requesting a Catalan flag on WhatsApp. So, there is a demand and they are used in the real world. What's the way to get them encoded in the Unicode standard?
Thanks, Joan Montané [1] http://www.famfamfam.com/lab/icons/flags/ [2] https://www.gosquared.com/resources/flag-icons/ [3] http://www.sherv.net/flag-emoticons.html [4] https://www.change.org/p/whatsapp-inc-incloure-la-senyera-de-catalunya-a-whatsapp
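For readers following the RIS pairing argument above, here is a minimal Python sketch of why a three-letter sequence such as CAT degrades on a pairs-only system. The helper names (to_ris, ris_letter, pair_up) are made up for illustration and do not come from any library or from the thread; the only fixed fact used is that the regional indicator symbols start at U+1F1E6 for the letter A. Real renderers do this via font ligatures, so treat the greedy left-to-right pairing as an assumption about the fallback behaviour described in the message.

```python
RIS_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_ris(letters):
    """Map ASCII letters A-Z to the corresponding regional indicator symbols."""
    out = []
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not an ASCII letter: {ch!r}")
        out.append(chr(RIS_BASE + ord(ch) - ord("A")))
    return "".join(out)


def ris_letter(ch):
    """Inverse of to_ris for a single regional indicator symbol."""
    return chr(ord(ch) - RIS_BASE + ord("A"))


def pair_up(ris_text):
    """Group a run of regional indicators into pairs, left to right.

    This mimics (hypothetically) a system that only knows two-letter flags:
    every full pair is read as an ISO 3166-1 alpha-2 code, and a trailing
    single symbol is left over on its own.
    """
    letters = [ris_letter(c) for c in ris_text]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


if __name__ == "__main__":
    print(pair_up(to_ris("CAT")))  # ['CA', 'T']
```

The leftover T is the point: a pairs-only implementation has no way to know that the three symbols were meant as one unit.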
RE: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
Shervin Afshar shervinafshar at gmail dot com wrote: The issue is with your very rigid interpretation of the criteria for encoding new symbols. Is appearing in an industry character set extension an official phrasing that you keep referring to? It was either from the WG2 Principles and Procedures document, or some other bit of Unicode/10646 folklore that I've read over the past 22 years of keeping up with Unicode/10646. I should look up the exact wording. Of course, Unicode can encode anything they please. That's not in question. But in order to claim compatibility as the basis for encoding something, these specific, rigid definitions and criteria have historically been required. Compatibility with any random JPEG or meme that makes the rounds on the Internet was not enough. Robot Face is available on Gmail (GChat), Facebook, and Twitch among others (calculating the size of user community is left as an assignment for the reader). That's enough usage for consideration by the UTC even if the symbol is not present in a character encoding standard. Also, since Unicode is an industry standard maintained by industry members (among others), then if there is enough request to these corporations from communities of users, then there might be some reason for considering those symbols. I think that's the case for the newer symbols. Great. Go ahead and encode them, UTC. But don't say it's because your hands are tied and you have no choice. IMO, Unicode officers seem to have low patience for such sentiments. You might want to reconsider your tone. There is a time and place for sarcasm. I'll take my chances. I've been called out before for discouraging list members from requesting things that were out of scope according to the old rules. All I'm saying now is, if the old rules no longer apply, say so. -- Doug Ewell | Thornton, CO, USA | http://ewellic.org
Re: About cultural/languages communities flags
Joan Montané joan at montane dot cat wrote: I don't request flag support for every flag in the world. I requested flags for culture/language communities *with* an approved TLD (Top Level Domain). Incidentally, about a year and a half ago I discussed this with another list member, on- and off-list. We agreed that some sort of text-based encoding of flags would be an interesting project, but disagreed as to whether this was a Unicode problem. The present discussion seems to approach the issue from the other side: treat it as *only* a Unicode problem, and assume that the encoding problem has been solved by TLD registration. See also http://www.unicode.org/faq/emoji_dingbats.html#12 . This is the Unicode Consortium talking, not me. -- Doug Ewell | Thornton, CO, USA | http://ewellic.org
Re: Unicode block for programming related symbols and codepoints?
OK, I will now try to answer all of you in one mail, otherwise it gets hard to keep an overview... Shervin Afshar: All of the requirements mentioned here can be (and are) implemented in higher levels of software (like IDEs). IMO, there isn't any need for adding new characters to Unicode to address these issues. But then it would be incompatible from IDE to IDE, like Python is incompatible using 2 spaces, 4 spaces and tabs. It's the data that is important, not the software. Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that now anything goes. I refer folks to TR51[1] (specifically sections 1.3, 8, and Annex C). [1]: http://www.unicode.org/reports/tr51 You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :) Jean-Francois Colson: I need a few tens of characters for a conlang I’m developing. ☺ Except two or three control characters don't make a con language. Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf The problem is that Unicode only encodes characters which are effectively used today or which have been used in the past. It doesn’t encode characters which could perhaps be used in a hypothetical new programming language in the future. So you want the font encoding scheme to be a limiting factor for new things? Pierpaolo Bernardi: How would your proposed character be displayed as plain text? There is no such thing as plain text. Even line breaks and tabs are a matter of interpretation. It's just that they usually have typographic semantics, even in programming editors, with all the side effects. In very simple (and with that I mean shitty or not even remotely programming oriented) editors, it may show like a control character, like ␄.
Browsers and any editor passing the “based on Scintilla” complexity mark of course should display something that makes more sense, like an arrow or ⍈ plus surrounding space. Unicode is a standard for plain text. If you require a special IDE for your programming language then why use plain text at all? Because binary custom encoded databases or blob files are the death of interoperability. Konstantin Ritt: Easier than latin1, a layout one could find on [almost] every keyboard? Good luck. Also: Jean-Francois Colson: Hard to input? Not harder than the new symbols you’d like to propose. That’s only a matter of keyboard layout and input method. Indent by pressing tab and insert the literal thing by pressing . Nothing changes, the IDE/editor does the work on the fly. Just that you have clean semantics, interoperability and customizability. Beat that, APL, where you would need 10 key bindings or an annoying software keyboard. I’ve never used APL so I don’t remember the meanings of its symbols, but couldn’t ⍘ U+2358 APL FUNCTIONAL SYMBOL QUOTE UNDERBAR or ⍞ U+235E APL FUNCTIONAL SYMBOL QUOTE QUAD work as “string literal quotes” in a new programming language? That's a good idea. That still leaves the indentation character, which is harder than that, because one would want a control character with certain semantics. E.g.: For programming editors it would make sense to only allow it after line breaks and convert other occurrences into tabs. If the IDE inputs your new character when you press tab, then your new character is a tab… Not if it detects the beginning of a line. Best regards A. Z.
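The editor behaviour described above (indentation character only at the start of a line, other occurrences turned into tabs, display left to the client) can be tried out today with a private use codepoint, as suggested earlier in the thread. What follows is only a sketch under assumptions: U+E000 is an arbitrary private-use stand-in, and the names INDENT, normalize and render are invented here, not part of any editor or standard.

```python
# Hypothetical stand-in for the proposed indentation character:
# U+E000 is a private-use code point, chosen arbitrarily for this sketch.
INDENT = "\uE000"


def normalize(text):
    """Keep INDENT only at the beginning of lines; turn strays into tabs."""
    out_lines = []
    for line in text.split("\n"):
        body = line.lstrip(INDENT)
        depth = len(line) - len(body)  # number of leading indent characters
        out_lines.append(INDENT * depth + body.replace(INDENT, "\t"))
    return "\n".join(out_lines)


def render(text, marker="→ "):
    """Show each indent character however this particular client prefers."""
    return text.replace(INDENT, marker)


if __name__ == "__main__":
    source = INDENT + "if x:" + "\n" + INDENT * 2 + "y" + INDENT + "= 1"
    print(render(normalize(source)))
```

The marker argument is where the "customizability" lives in this sketch: the stored text never changes, only what a given client chooses to draw for it.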
RE: Unicode block for programming related symbols and codepoints?
I can't count: “It can be argued, and was, repeatedly and persuasively, that the initial collection of emoji in Unicode 6.1” should have said 6.0, and “But the additional emoji added to Unicode 6.2 and 7.0” should have said 6.1 and 7.0. -- Doug Ewell | Thornton, CO, USA | http://ewellic.org
Re: Unicode block for programming related symbols and codepoints?
There is no longer any requirement that the robot faces and burritos appear first in any sort of industry character set extension, with which Unicode is then obliged to maintain compatibility. Only if you don't consider existing usage and popular requests as requirement and precedent; for example Gmail had Robot Face for a long time. ↪ Shervin On Mon, Feb 9, 2015 at 11:25 AM, Doug Ewell d...@ewellic.org wrote: Frédéric Grosshans frederic dot grosshans at gmail dot com wrote: The including of emoji was a considerable debate here, with people strongly against and strongly for. The trick is that they were already used as digital characters by Japanese Telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. At the time of encoding, the spread of smartphones made them appear in other places (emails, web forums, etc.) Sorry, I can't let the compatibility argument go unchallenged again. It can be argued, and was, repeatedly and persuasively, that the initial collection of emoji in Unicode 6.1 [1] were added for compatibility with Japanese telco extensions to JIS. But the additional emoji added to Unicode 6.2 and 7.0, and planned for 8.0, do not have even this provenance; they were added on foot of novel proposals sent directly to Unicode, or (more recently) by popular request. There is no longer any requirement that the robot faces and burritos appear first in any sort of industry character set extension, with which Unicode is then obliged to maintain compatibility. [1] No, I am not counting the ARIB symbols or any other long-encoded symbols that have been retroactively defined as emoji, to help legitimize the latter. Alfred Zett (alfred underscore z at web dot de) replied: The trick is that one doesn't bargain with Telcos and similar criminals. Gotta drop them hard and the pest will go away by itself after five years or so. This does not help to make a case for or against encoding of anything.
-- Doug Ewell | Thornton, CO, USA | http://ewellic.org
Re: Unicode block for programming related symbols and codepoints?
On 9 Feb 2015, at 19:17, Ken Whistler kenwhist...@att.net wrote: ... The use in C of = and == was badly designed from the start, and is the source of bezillions of inadvertent programming errors in practice. It is the ample oversupply of implicit conversions in combination with the lack of a proper boolean type that is causing those programming errors. But if a left arrow, for example, might be a better choice for an assignment operator in a programming language, and a two-character ASCII operator like := or <- doesn't seem appropriate or causes other confusion, there still isn't a character *encoding* issue here. Just use ←, which already exists (U+2190), and is a fine left arrow! There are also ≔ COLON EQUALS U+2254 and others. No problems using such characters in Flex: The problem is the lack of input methods.
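As a quick check that the two assignment candidates mentioned above are already encoded, the following Python sketch looks them up with the standard unicodedata module. The describe helper and the CANDIDATES list are just illustrative names; nothing here addresses the input-method problem, which is the real complaint.

```python
import unicodedata

# The two already-encoded characters named in the message above.
CANDIDATES = ["\u2190", "\u2254"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(ch, describe(ch))
```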
RE: About cultural/languages communities flags
And just another follow-up, to try to explain *why* the mechanism for Regional Indicator Codes might be so closely tied to ISO 3166-1 alpha-2 code elements: ISO 3166-1 codes are derived from code elements published by the United Nations Statistics Division. This is the group that ultimately decides what is and isn't a country for the purposes of these codes. While there is inevitably some political influence in the UN, many organizations and projects that use ISO 3166-1 codes do so to avoid getting embroiled in their own debate over what is a country. The IETF language-tagging project (BCP 47, RFC 5646; see IETF language tag in Wikipedia for more information) is one example. Conversely, it is sometimes the case that groups which seek to extend the set of ISO 3166-1 codes unilaterally, or to establish a competing or supplemental coding system, might do so in order to gain acceptance or establish credibility for a nation or territory that is not recognized as such by UNSD. It is entirely reasonable (IMHO) to suggest that if Unicode were to attempt, by whatever means, to enable encoding of flags for entities beyond those encoded in ISO 3166-1, that the door would be opened wide for unrecognized nations and separatist groups to claim that the Unicode Consortium supports their cause by supporting display of their flag. It's very possible that Unicode has thought of this and does not want to put itself in that position. -- Doug Ewell | Thornton, CO, USA | http://ewellic.org
Re: Emoji (was: Re: Unicode block for programming related symbols and codepoints?)
It was either from the WG2 Principles and Procedures document, or some other bit of Unicode/10646 folklore that I've read over the past 22 years of keeping up with Unicode/10646. I should look up the exact wording. Yes, please. I would like to have that policy noted for my future use. Of course, Unicode can encode anything they please. That's not in question. But in order to claim compatibility as the basis for encoding something, these specific, rigid definitions and criteria have historically been required. Compatibility with any random JPEG or meme that makes the rounds on the Internet was not enough. It's not about encoding what they please. Compatibility was the issue with the first set of emoji symbols. The rest of symbols are being added for various other reasons; e.g. diversity, parity, requests, etc. Also, random JPEG and meme don't apply here and you're mistaken to assume that GChat and Facebook fit in this category. Great. Go ahead and encode them, UTC. But don't say it's because your hands are tied and you have no choice. Quoting an official UTC communication? I'll take my chances. I've been called out before for discouraging list members from requesting things that were out of scope according to the old rules. All I'm saying now is, if the old rules no longer apply, say so. AFAIK, rules haven't changed. Unicode didn't have a policy regarding emoji and symbols with similar usage. Now it does. For a longer while now, some folks tend to use emoji as means to an end other than what is in the scope of conversation regarding emoji. And that is not acceptable. ↪ Shervin
Re: About cultural/languages communities flags
To follow up on Doug Ewell's response, the mechanism currently standardized in the Unicode Standard for regional indicator codes has an interpretation tied to the two-letter codes of ISO 3166-1, and *not* to TLD's. The two are not directly connected. If anyone really wants to pursue getting a Scots flag into general implementation via Unicode regional indicator codes, the correct way to make that happen is for somebody to get off their duff and convince the BSI (British Standards Institute) to put in for an exceptional reservation of a two-letter code for Scotland in ISO 3166-1 by petitioning the ISO 3166/MA. See: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 for the full context, and for the current 26x26 letter matrix which is the basis for the flag glyph implementations of regional indicator code pairs on smartphones. SC, SO, ST are already taken, but might I suggest putting in for registering AB for Alba? That one is currently unassigned. Yeah, yeah, what is the likelihood of BSI pushing for a Scots two-letter code?! But seriously, if folks are planning ahead for Scots independence or even some kind of greater autonomy, this is an issue that needs to be worked, anyway. In the meantime, let me reiterate that there is *no* formal relationship between TLD's and the regional indicator codes in Unicode (or the implementations built upon them). Well, yes, a bunch of registered TLD's do match the country codes, but there is no two-letter constraint on TLD's. This should already be apparent, as Scotland has registered .scot At this point there isn't even a limitation of TLD's to ASCII letters, so there is no way to map them to the limited set of regional indicator codes in the Unicode Standard. Not having a two letter country code for Scotland that matches the four letter TLD for Scotland might indeed be a problem for someone, but I don't see *this* as a problem that the Unicode Standard needs to solve. 
--Ken
Re: Unicode block for programming related symbols and codepoints?
Frédéric Grosshans: On 09/02/2015 13:55, Alfred Zett wrote: Additionally, people tend to forget that simply because Unicode is doing emoji out of compatibility (or other) requirements, it does not mean that now anything goes. I refer folks to TR51[1] (specifically sections 1.3, 8, and Annex C). [1]: http://www.unicode.org/reports/tr51 You know, the fact that this consortium ever took emoji into consideration immediately justifies including everything everyone ever wanted. There is no such thing as important data including emoji. :) The including of emoji was a considerable debate here, with people strongly against and strongly for. The trick is that they were already used as digital characters by Japanese Telcos and their millions of customers. They were de facto encoded as characters in Japanese text messages. At the time of encoding, the spread of smartphones made them appear in other places (emails, web forums, etc.) The trick is that one doesn't bargain with Telcos and similar criminals. Gotta drop them hard and the pest will go away by itself after five years or so. Jean-Francois Colson: I need a few tens of characters for a conlang I’m developing. ☺ Except two or three control characters don't make a con language. Also, if you don't like con languages in Unicode, what's this: http://unicode.org/charts/PDF/U1F700.pdf I doubt that “not liking con languages” is a faithful description of Jean-François ;-) On a more serious note, this block is actually a set of “scientific” (in his time) notations used by Isaac Newton. They were encoded in Unicode following an academic project to digitize his manuscripts. So here, you have characters used 3 centuries ago by no less than Isaac Newton, most of them having a much longer history, and useful for science historians. See http://www.unicode.org/L2/L2009/09037r2-alchemy.pdf for details. That's actually interesting. Good to know, thanks.
I think everyone here knows what you are saying, and that the notion of plain text is a bit fuzzy. But if you cannot argue that your character has a meaning in plain text, for some value of “plain text”, then you cannot hope for an encoding in Unicode. OK, in this case I agree it makes little sense to hope for such characters. Best regards, A. Z.
Re: About cultural/languages communities flags
Using flags to indicate particular languages on websites has plenty of problems - languages need a better indicator. Scripts could be indicated by a representative glyph.