ISO committees (from Re: Tag characters and localizable sentence technology (from Tag characters))
In my post of 22 May 2015, reproduced below, is the following. ... and then the plain text encoding of a particular localizable sentence would be defined as being expressed as the LOCALIZABLE SENTENCE BASE CHARACTER character followed by the code for the localizable sentence specified in the ISO [number] document, the code being expressed using tag characters. As there has been discussion of ISO committees in this mailing list recently and it is clear that there are a number of people involved with ISO on this mailing list who have expert knowledge of the structures and rules of ISO committees, I write to ask advice. Regarding my idea that localizable sentence technology could be implemented in Unicode by reference to detailed codes in an ISO document (not yet written), which would be the best ISO committee to become in charge of producing that document please? William Overington 12 June 2015 Original message From : wjgo_10...@btinternet.com Date : 22/05/2015 - 12:01 (GMTST) To : unicode@unicode.org Subject : Tag characters and localizable sentence technology (from Tag characters) Tag characters and localizable sentence technology (from Tag characters) I refer to the following documents, the first about localizable sentences and the second about, amongst other matters, applying tag characters using a new encoding format. http://www.unicode.org/L2/L2013/13079-loc-sentance.pdf http://www.unicode.org/L2/L2015/15145r-add-regional-ind.pdf Starting from the idea of the markup bubble from the first document and applying the tag method and the ISO standard document method from the second document, there arises the following possibility for the future for localizable sentence technology. 
A single character would be added into Unicode, the name of the character being LOCALIZABLE SENTENCE BASE CHARACTER and then the plain text encoding of a particular localizable sentence would be defined as being expressed as the LOCALIZABLE SENTENCE BASE CHARACTER character followed by the code for the localizable sentence specified in the ISO [number] document, the code being expressed using tag characters. Please find attached a design for the glyph for the LOCALIZABLE SENTENCE BASE CHARACTER character. I designed the glyph by adapting and then combining the designs for localizable sentence markup bubble brackets from the first of the two documents referenced earlier in this text. Each localizable sentence, carefully written so as to avoid in use any reliance as to meaning on any sentence previously used in the same document, would have a meaning expressed in words and possibly also have a glyph: more commonly used localizable sentences each having a glyph yet not all other localizable sentences necessarily having a glyph, though some could have a glyph, as desired. William Overington 22 May 2015
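The encoding described above (a base character followed by a code spelled out in tag characters) can be sketched roughly as follows. This is only an illustration under stated assumptions: no LOCALIZABLE SENTENCE BASE CHARACTER is encoded, so a Private Use Area code point stands in for it, and the sentence code "123" is made up, since the ISO document does not yet exist. The only part taken from Unicode itself is that the tag characters U+E0020..U+E007E mirror printable ASCII.

```python
# Hypothetical stand-in: the proposed base character is not encoded in Unicode,
# so a Private Use Area code point is used here purely for illustration.
BASE = "\uE000"

# Tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E.
TAG_OFFSET = 0xE0000


def to_tags(code: str) -> str:
    """Map a printable-ASCII code to the corresponding tag characters."""
    for ch in code:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)


def from_tags(tags: str) -> str:
    """Inverse of to_tags: map tag characters back to printable ASCII."""
    for ch in tags:
        if not 0xE0020 <= ord(ch) <= 0xE007E:
            raise ValueError(f"not a tag character: U+{ord(ch):04X}")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tags)


def encode_sentence(code: str) -> str:
    """Base character followed by the sentence code expressed in tag characters."""
    return BASE + to_tags(code)


def decode_sentence(text: str) -> str:
    """Recover the sentence code from an encoded localizable sentence."""
    if not text.startswith(BASE):
        raise ValueError("missing base character")
    return from_tags(text[len(BASE):])
```

For example, encode_sentence("123") gives the stand-in base character followed by U+E0031 U+E0032 U+E0033, and decode_sentence returns "123" from that string.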
Re: Tag characters and in-line graphics (from Tag characters)
Mark E. Shoulson mark at kli dot org wrote: Isn't this what webfonts are all about? You specify a font in the stylesheet, give it a URL, and your browser goes and downloads it and displays the text in it. That's great if you have a stylesheet, a URL, and a browser. HTML is fancy text, and pretty much implies some sort of online connection. I thought we were talking about plain text, and apologize if we weren't or if that important detail was not clear. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
2015-06-07 18:39 GMT+02:00 Doug Ewell d...@ewellic.org: Mark E. Shoulson mark at kli dot org wrote: Isn't this what webfonts are all about? You specify a font in the stylesheet, give it a URL, and your browser goes and downloads it and displays the text in it. That's great if you have a stylesheet, a URL, and a browser. HTML is fancy text, and pretty much implies some sort of online connection. Everything in HTML is embeddable in a standalone document, including graphics. HTML does not imply any online connection. HTML is independent of HTTP or other transports.
Re: Tag characters and in-line graphics (from Tag characters)
On 6/4/2015 17:03, Chris wrote: This whole discussion is about the fact that it would be technically possible to have private character sets and private agreements that your OS downloads without the user being aware of it. The sticky issues are not the questions of how to make available fonts or images for use by the OS. Instead, they concern the fact that any such model violates some pretty basic guarantees of plain text that the entire net infrastructure relies on. There are very obvious security issues. They start with tracking; every time you access a custom code point, that fact potentially results in a trackable interaction. This problem affects even the sticker solution that people are hoping for for emoji. (On my system, no external resources are displayed when I first open any message, and there is a reason for that). Beyond tracking, and beyond stickers (that is, pictures that look like pictures), a generalized custom character set would allow text that is no longer really stable. You would be able to deliver identical e-mails to people that display differently, because when you serve the custom fonts, you would be able to customize what you deliver under the same custom character set designator. While this would be a wonderful way to circumvent censorship (other than the man in the middle version), you would likewise seriously undermine the ability to filter unwanted or undesirable texts, because the custom character set engine might recognize when a request comes from a filter and not the end user. (Just the other day, I came across a hacked website that responded differently to search engines than to live users, making the hack effective for one and invisible to the other. Custom character sets would seem to just add to the hackers' arsenal here). Finally, custom character sets sound like a great idea when thinking of an extension of an existing character set. But that's not where the issues are.
The issues come in when you use the same technology to provide aliases for existing code points or for other custom characters. Aliasing undermines the ability to do search (or any other content-focused processing, from sorting to spell-check). At that point, the circle closes. When Unicode was created, the alternative then was ISO 2022, which was a standard that addressed the issue of how to switch among (albeit pre-defined) character sets to achieve, in principle, coverage equal to the union of these character sets. Unicode was created to address two main deficiencies of that situation. Unification addressed the aliasing issue, so that code points were no longer opaque but could be interpreted by software (other than display), which was the second big drawback of the patchwork of character sets. A processing model for opaque code points is possible to define, but it isn't very practical, and in the late eighties people had had enough and were glad to be quit of it. Seen from this perspective, the discussion about custom character sets presents itself as a giant step backward, undermining the very advances that underlie the rapid acceptance and spread of Unicode. A./
Re: Tag characters and in-line graphics (from Tag characters)
On 2015/06/04 17:03, Chris wrote: I wish Steve Jobs was here to give this lecture. Well, if Steve Jobs were still around, he could think about whether (and how many) users really want their private characters, and whether it was worth the time to have his engineers working on the solution. I'm not sure he would come to the same conclusion as you. This whole discussion is about the fact that it would be technically possible to have private character sets and private agreements that your OS downloads without the user being aware of it. Now if the unicode consortium were to decide on standardising a technological process whereby rendering engines could seamlessly download representations of custom characters without user intervention, no doubt all the vendors would support it, and all the technical mumbo jumbo of installing privately agreed character sets would be something users could leave for the technology to sort out. You are right that it would be strictly technically possible. Not only that, it has been so for 10 or 20 years. As an example, in 1996 at the WWW Conference in Paris I was participating in a workshop on internationalization for the Web, and by chance I was sitting between the participant from Adobe and the participant from Microsoft. These were the main companies working on font technology at that time, and I asked them how small it would be possible to make a font for a single character using their technologies (the purpose of such a font, as people on this thread should be able to guess, would be as part of a solution to exchange single, user-defined characters). I don't even remember their answers. The important thing here is that the idea, and the technology, have been around for a long time. So why didn't it catch on? Maybe the demand is just not as big as some contributors on this list claim.
Also, maybe while the technology itself isn't rocket science, the responsible people at the relevant companies have enough experience with technology deployment to hold back. To give an example of why the deployment aspect is important, there were various Web-like hypertext technologies around when the Web took off in the 1990s. One of them was called HyperG. It was technologically 'better' than the Web, in that it avoided broken links. But it was much more difficult to deploy, and so it is forgotten, whereas the Web took off. Regards, Martin.
Re: Tag characters and in-line graphics (from Tag characters)
Asmus Freytag wrote about security issues. This is interesting reading and I have learned a lot from the post about various security issues. Whilst the post is in this thread and follows from a post in this thread, the topic seems to have moved to the Custom characters thread. It seems to me that what you write about would not apply to the suggestion in my original post: is that correct? http://www.unicode.org/mail-arch/unicode-ml/y2015-m05/0218.html Also the following two posts. http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0009.html http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0027.html Whilst the ideas raised by Chris are interesting, they do seem to be distinctly different from what I suggested. So, for clarity, do you regard my suggested format as having any security issues, and if so, what please? I know that some people have opined that my suggested format is out of scope for Unicode, yet the scope of Unicode is what the Unicode Technical Committee decides is the scope of Unicode, and my suggested format does provide a way to include custom glyphs within a Unicode plain text document by using the new base character followed by tag characters method. William Overington 5 June 2015
Re: Tag characters and in-line graphics (from Tag characters)
I wrote, crumpled up, and threw away about three different responses. I thought about ISO 2022 and about accessing the web for every PUA character, as Asmus mentioned, and about the size of the user base, as Martin mentioned. I thought about character properties and about ephemerality. I didn't think of the spoofing implications that Asmus described, which would affect both the automatic PUA font download and the inline drawing language. Either of these could be used to spell out, let's say, paypal.com rather convincingly and with minimal effort. I might have more experience with the PUA than many list members, having transcribed the 27,000-word Alice's Adventures in Wonderland into my constructed alphabet two years ago, in a PUA encoding, so that Michael Everson could publish it in book form. One of the many learning experiences of this project was finding out which software tools play nicely with the PUA and which don't. Some tools just worked while others would not give acceptable results with any amount of effort. At no point, however, did I suppose that a font with my alphabet, or any of the jillions of others that have been invented during a boring day in class (see Omniglot for tons of examples), should be silently downloaded to a user's computer, consuming bandwidth and disk space, without her knowledge. That's practically malware. Maybe I'm just not enough of a Distinguished Visionary to understand how insanely great this would be (unfortunately, celebrity name-dropping doesn't work with me). Unicode has stated consistently for at least 23 years that it would not ever standardize PUA usage, and over the years some UTC members have used terms like strongly discouraged and not interoperable even in the presence of an agreement. Given this, and given that no system I'm aware of magically downloads fonts for *regularly encoded characters* (I still have no font for Arabic math symbols), I personally would not expect Unicode to perform a 180 on this. 
-- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
No, that's why you include a reference to the font in the private agreement, so that interested parties can install it and see the special character(s). People with their iphones and ipads and so forth don’t want to have “private agreements”, they don’t want to “install character sets”. They want it to “just work”. I wish Steve Jobs was here to give this lecture. I highly doubt actually that it is even possible to install a private character set font on an iphone such that it would be available to all applications. This whole discussion is about the fact that it would be technically possible to have private character sets and private agreements that your OS downloads without the user being aware of it. Now if the unicode consortium were to decide on standardising a technological process whereby rendering engines could seamlessly download representations of custom characters without user intervention, no doubt all the vendors would support it, and all the technical mumbo jumbo of installing privately agreed character sets would be something users could leave for the technology to sort out.
Re: Tag characters and in-line graphics (from Tag characters)
On 4 Jun 2015, at 10:59 am, David Starner prosfil...@gmail.com wrote: On Wed, Jun 3, 2015 at 5:46 PM Chris idou...@gmail.com wrote: I personally think emoji should have one, single definitive representation for this exact reason. Then you want an image. I don't see what's hard about that. I already explained why an image and/or HTML5 is not a character. I’ll repeat again. And the world of characters is not limited to emoji. 1. HTML5 doesn’t separate one particular representation (font, size, etc) from the actual meaning of the character. So you can’t paste it somewhere and expect to increase its point size or change its font. 2. It’s highly inefficient in space to drop multi-kilobyte strings into a document to represent one character. 3. The entire design of HTML has nothing to do with characters. So there is no way to process a string of characters interspersed with HTML elements and know which of those elements are a “character”. This makes programmatic manipulation impossible, and means most computer applications simply will not allow HTML in scenarios where they expect a list of “characters”. 4. There is no way to compare 2 HTML elements and know they are talking about the same character. I could put some HTML representation of a character in my document, you could put a different one in, and there would be absolutely no way to know that they are the same character. Even if we are in the same community and agree on the existence of this character. 5. Similarly, there is no way to search or index html elements. If a HTML document contained an image of a particular custom character, there would be no way to ask google or whatever to find all the documents with that character. Different documents would represent it differently. HTML is a rendering technology. It makes things LOOK a particular way, without actually ENCODING anything about it.
The only part of HTML that is actually searchable in a deterministic fashion is the part that is encoded - the unicode part. The community interested in tony the tiger can make decisions like that. That is a hell of a handwave. In practice, you've got a complex decision that's always going to be a bit controversial, and one that most communities won't bother trying to make. Apparently the world makes decisions all the time without meeting in committee. Strange but true. It’s called making a decision. Facebook have created a lot of emoji characters without consulting any committee and it seems to work fine, albeit restricted to the facebook universe because of a lack of a standard. You can’t know because they’re images. You can't know because the only obvious equivalence relation is exact image identity. Because… there is no standard!! If facebook wants to define 2 emoji images, maybe one is bigger than the other, and yet basically the same, to mean the same thing, then that would be their choice. Since I expect they have a lot of smart people working there, I expect it would work rather well. Just like Microsoft issues courier fonts in different point sizes and we all feel they have made that work fairly well. You seem to be arguing the nonsense position that if someone, for example, made a snowflake glyph slightly different to the unicode official one, that it is wrong. That of course is nonsense. People can make sensible decisions about this without the unicode committee. You can’t iterate over compressed bits. You can’t process them. Why not? In any language I know of that has iterators, there would be no problem writing one that iterates over compressed input. If you need to mutate them, that is hard in compressed formats, but a new CPU can store War and Peace in the on-CPU cache. You can’t do it because no standard library, programming language, or operating system is set up to iterate over characters of compressed data.
So if you want to shift compressed bits around in your app, it will take an awful lot of work, and the bits won’t be recognised by anyone else. Now if someone wants to define the next version of unicode to be a compressed format, and every platform supports that with standard libraries, computer languages etc, then fine that could work. Yet again I point out, lots of things MIGHT be possible in the real world IF that is how a standard is formulated. But all the chatter about this or that technology is pie in the sky without that standard.
Re: Tag characters and in-line graphics (from Tag characters)
On 3 Jun 2015, at 11:24 pm, David Starner prosfil...@gmail.com wrote: Chris wrote: There is no way to compare 2 HTML elements and know they are talking about the same character That's because character identity is a hard problem. Is the emoji TIGER the same as TONY THE TIGER or as TONY THE TIGER GIVING THE VICTORY SIGN? I personally think emoji should have one, single definitive representation for this exact reason. The subtleties of emotion between one happy face and another can be miles apart. Emoji are a little different to other symbols in that respect. Symbols that are purely symbolic can be changed as much as you like as long as they are recognisable. Emoji have too many shades of meaning for allowing change. Both of these scenarios are an argument that there should be custom characters with at least one official representation. Emoji because you don’t really want variation. Symbols because if you don’t have a local representation, then something is better than nothing. If you don’t have a local Snow Flake for example, any old snow flake will be fine. This is not a hard problem at all. Is one tony the tiger the same as another? The community interested in tony the tiger can make decisions like that. But having made that decision there needs to be a way for generic computer programs that don’t know about that community to do reasonable things with tony the tiger characters. You can index links to images. If two documents represent it differently, then I go back to the above; we can't know that they're the same thing. You can’t know because they’re images. That’s my exact point. Anybody talking about HTML5 and images as a solution to custom characters is not proposing a valid solution. On Tue, Jun 2, 2015 at 7:11 PM Chris idou...@gmail.com wrote: You can’t ask the entire computing universe to compress everything all the time. Anytime we care about how much space text takes up, it should be compressed. It compresses very well.
On the other hand, it's rare that anyone cares anymore; what's a few hundred kilobytes between friends? You compress things when they are on the move. Between computers and as you are writing it to a file. But you can’t compress generically while it is in memory. You can’t iterate over compressed bits. You can’t process them.
Re: Tag characters and in-line graphics (from Tag characters)
So what you’re saying is that the current situation where you see an empty square □ for unknown characters is better than seeing something useful? — Chris On Thu, Jun 4, 2015 at 12:59 AM, Doug Ewell d...@ewellic.org wrote: Chris idou747 at gmail dot com wrote: Right now, what happens if you have a domain or locale requirement for a special character? That's what the PUA is for. Assign a PUA code point to your special character, create a font which implements the PUA character, create a brief private agreement which states that this code point refers to that character and which mentions the font, put the private agreement on the web, and publish your document with a reference to the agreement. For most non-professionals, creating the font is the tricky part. Also see Section 23.5 of TUS. Note that I am disagreeing with Martin about the PUA being useful only as a scratch area for standardization. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
On Wed, Jun 3, 2015 at 5:46 PM Chris idou...@gmail.com wrote: I personally think emoji should have one, single definitive representation for this exact reason. Then you want an image. I don't see what's hard about that. The community interested in tony the tiger can make decisions like that. That is a hell of a handwave. In practice, you've got a complex decision that's always going to be a bit controversial, and one that most communities won't bother trying to make. You can’t know because they’re images. You can't know because the only obvious equivalence relation is exact image identity. You can’t iterate over compressed bits. You can’t process them. Why not? In any language I know of that has iterators, there would be no problem writing one that iterates over compressed input. If you need to mutate them, that is hard in compressed formats, but a new CPU can store War and Peace in the on-CPU cache.
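To make the point about iterating over compressed input concrete, here is a minimal sketch in Python. It assumes gzip as the compression format and UTF-8 as the encoding (neither is specified in the thread), and the function name iter_chars is made up for illustration. The text is decompressed and decoded a chunk at a time, so the whole document never has to sit uncompressed in memory.

```python
import gzip
import io
from typing import Iterator


def iter_chars(compressed: bytes, encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded characters from gzip-compressed text, one chunk at a time."""
    # gzip.open accepts an existing file object; mode "rt" decodes to str.
    with gzip.open(io.BytesIO(compressed), mode="rt", encoding=encoding) as stream:
        while True:
            chunk = stream.read(4096)  # up to 4096 characters, not bytes
            if not chunk:
                break
            yield from chunk


if __name__ == "__main__":
    data = gzip.compress("snow \u2744 flake".encode("utf-8"))
    for ch in iter_chars(data):
        print(f"U+{ord(ch):04X}")
```

This only covers reading. Mutating the compressed data in place is the harder case, as the message above already notes.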
Re: Tag characters and in-line graphics (from Tag characters)
Compression is even more important today on mobile networks: mobile apps are very verbose over the net, and you can easily end up paying for the extra volume. In addition, mobile networks are frequently much slower than advertised: even if you pay the extra subscription to get 3G/4G, you depend on antennas and on the number of people around you. In my home, 3G/4G in fact does not work at all, and this is the case in many places around my city, even though they are sold as having full coverage. For example, just downloading or updating an application is simply impossible: I have to be at home, connected to my Wi-Fi router, and when its internet link fails (this sometimes happens for several hours) I have extremely slow connections on 3G/4G, which is overcrowded at the same time and only delivers 2G speeds. Lots of people frequently have to put up with low bandwidth on mobile networks, independently of the price they paid for their subscription. So compressing data is still extremely important (even for text or for the smallest web requests). Thankfully, compression is now part of the web transport, but apps must still learn to represent their interchanged data efficiently and to develop less verbose protocols and APIs.
There are now more people using mobile networks than fixed landline internet access (or home Wi-Fi routers connected to it). Even for the latter, fiber access is still just for a minority of people in dense areas; the others don't get more than a handful of megabit/s on their DSL access. If you look at worldwide internet connections, a large majority of people don't get more than 2 megabit/s. This is enough for reading/sending SMS or phone calls, or exchanging emails, but not if you need frequent updates to your apps, your apps are too verbose, and there are too many apps in the background. Many people cannot view videos on their mobile access, or only with very poor quality if they view them live (nor can they download them slowly, due to lack of storage space on their mobile device, so videos have to remain short in total volume and duration). So I disagree: compression is absolutely needed, even more today than it was in the past when mobile Internet access was still for a minority. Mobile networks are not really faster today (their bandwidth does not double every three years like the local performance of devices)! But with this extra local performance, you can support more complex compression schemes that require more CPU/GPU power, which is no longer a bottleneck; the real bottleneck is the effectively available bandwidth of the mobile network (smaller than the connection bandwidth because this bandwidth is shared... and expensive). 2015-06-03 15:24 GMT+02:00 David Starner prosfil...@gmail.com: Chris wrote: There is no way to compare 2 HTML elements and know they are talking about the same character That's because character identity is a hard problem. Is the emoji TIGER the same as TONY THE TIGER or as TONY THE TIGER GIVING THE VICTORY SIGN?
http://www.engadget.com/2014/04/30/you-may-be-accidentally-sending-friends-a-hairy-heart-emoji/ Note that even in Unicode, the set ẛ ᷥ ſ ṡ s S Ŝ may be considered the same character or up to seven different characters, depending on case-folding, canonization and accent dropping. Similarly, there is no way to search or index html elements. If a HTML document contained an image of a particular custom character, there would be no way to ask google or whatever to find all the documents with that character. Different documents would represent it differently. You can index links to images. If two documents represent it differently, then I go back to the above; we can't know that they're the same thing. On Tue, Jun 2, 2015 at 7:11 PM Chris idou...@gmail.com wrote: You can’t ask the entire computing universe to compress everything all the time. Anytime we care about how much space text takes up, it should be compressed. It compresses very well. On the other hand, it's rare that anyone cares anymore; what's a few hundred kilobytes between friends?
Re: Tag characters and in-line graphics (from Tag characters)
Earlier in this thread, on 2 June 2015, I wrote as follows: A mechanism to be able to use the method to define a glyph linked to a Unicode code point would be a useful facility to add for use in a situation where the glyph is for a regular Unicode character. I have now thought of a mechanism to use. Please imagine the base character followed by a sequence of tag characters, the tag characters here represented by ordinary letters and digits. Here is an example of the mechanism for defining the glyph for U+E702 in a particular document as 7 red pixels. HE702U7r The tag H character switches to hexadecimal input mode, then there are as many tag characters as necessary to express in hexadecimal notation the code point of the character for which the definition is being made, then there is a tag U character to action the definition and go out of hexadecimal input mode. The tag 7r is to express 7 red pixels. In practice the number of tag characters after the tag U character might be around 200, the above tag 7r is just a minimal example so as to explain the concept. While posting, may I mention please one other matter? Previously I mentioned using tag R, tag G and tag B is defining colours. I now add tag A into that defining colour so as to define opacity, that is what is sometimes called transparency, yet 0 means totally transparent and 255 means totally opaque. If no value is stated for A then it should be presumed to have a value of 255, so that the default situation is to define opaque colours. I feel that the information in this thread is now a good basis for the assessment of this suggested format as to whether it could be a useful open source system with good interoperability potential that could usefully be submitted to the Unicode Technical Committee. William Overington 3 June 2015
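The HE702U7r example above can be read mechanically. The following is a rough sketch only: it works on the ordinary letters and digits that stand in for tag characters in the message, the function name parse_definition is made up, and the payload after tag U is returned unparsed because its format is not fully specified in this thread.

```python
from typing import Tuple

HEX_DIGITS = "0123456789ABCDEF"


def parse_definition(tags: str) -> Tuple[int, str]:
    """Split 'H<hex digits>U<payload>' into (code point, payload).

    Tag H enters hexadecimal input mode, tag U actions the definition and
    leaves that mode; whatever follows is the glyph description.
    """
    if not tags.startswith("H"):
        raise ValueError("expected tag H to enter hexadecimal input mode")
    end = tags.find("U", 1)  # 'U' is not a hex digit, so the first one ends the number
    if end == -1:
        raise ValueError("expected tag U to leave hexadecimal input mode")
    digits = tags[1:end]
    if not digits or any(ch not in HEX_DIGITS for ch in digits):
        raise ValueError(f"invalid hexadecimal code point: {digits!r}")
    return int(digits, 16), tags[end + 1:]


if __name__ == "__main__":
    code_point, payload = parse_definition("HE702U7r")
    print(f"U+{code_point:04X}", payload)
```

For the example in the message, this yields the code point U+E702 and the payload 7r (the seven red pixels).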
Re: Tag characters and in-line graphics (from Tag characters)
Chris wrote: There is no way to compare 2 HTML elements and know they are talking about the same character That's because character identity is a hard problem. Is the emoji TIGER the same as TONY THE TIGER or as TONY THE TIGER GIVING THE VICTORY SIGN? http://www.engadget.com/2014/04/30/you-may-be-accidentally-sending-friends-a-hairy-heart-emoji/ Note that even in Unicode, the set ẛ ᷥ ſ ṡ s S Ŝ may be considered the same character or up to seven different characters, depending on case-folding, canonization and accent dropping. Similarly, there is no way to search or index html elements. If a HTML document contained an image of a particular custom character, there would be no way to ask google or whatever to find all the documents with that character. Different documents would represent it differently. You can index links to images. If two documents represent it differently, then I go back to the above; we can't know that they're the same thing. On Tue, Jun 2, 2015 at 7:11 PM Chris idou...@gmail.com wrote: You can’t ask the entire computing universe to compress everything all the time. Anytime we care about how much space text takes up, it should be compressed. It compresses very well. On the other hand, it's rare that anyone cares anymore; what's a few hundred kilobytes between friends?
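The remark above about ẛ ᷥ ſ ṡ s S Ŝ can be tried out with Python's standard unicodedata module. This is a sketch of one possible folding (compatibility decomposition, case folding, then dropping combining marks); the function name fold and the choice of these three steps are assumptions, not something the message prescribes. The combining ᷥ is left out of the sample because it is a combining mark rather than a base letter, so accent dropping treats it differently.

```python
import unicodedata


def fold(text: str) -> str:
    """Aggressive folding: compatibility-decompose, case-fold, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = decomposed.casefold()
    # Case folding can produce new composed characters, so decompose again.
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# U+1E9B, U+017F, U+1E61, s, S, U+015C (six of the seven characters from the message)
SAMPLE = "\u1E9B\u017F\u1E61sS\u015C"

if __name__ == "__main__":
    print(len(set(SAMPLE)))            # distinct code points before folding
    print({fold(ch) for ch in SAMPLE})  # what remains after folding
```

With no folding the six characters are all distinct; with this folding they collapse to a plain s. Which answer is the right one depends on what the application is doing, which is the point being made.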
Re: Tag characters and in-line graphics (from Tag characters)
Chris idou747 at gmail dot com wrote: Right now, what happens if you have a domain or locale requirement for a special character? That's what the PUA is for. Assign a PUA code point to your special character, create a font which implements the PUA character, create a brief private agreement which states that this code point refers to that character and which mentions the font, put the private agreement on the web, and publish your document with a reference to the agreement. For most non-professionals, creating the font is the tricky part. Also see Section 23.5 of TUS. Note that I am disagreeing with Martin about the PUA being useful only as a scratch area for standardization. -- Doug Ewell | http://ewellic.org | Thornton, CO
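The first step of the recipe above, assigning a PUA code point, is easy to sanity-check in code. A minimal sketch, assuming Python and its standard unicodedata module; the function name is_private_use is made up, and U+E702 is simply the code point used as an example elsewhere in this thread.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the single character ch has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


if __name__ == "__main__":
    for ch in ("\uE702", "\U000F0000", "s"):
        print(f"U+{ord(ch):04X}", is_private_use(ch))
```

Nothing here says what the code point means; that is exactly what the private agreement and the font are for.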
Re: Tag characters and in-line graphics (from Tag characters)
2015-06-04 2:59 GMT+02:00 David Starner prosfil...@gmail.com: You can’t iterate over compressed bits. You can’t process them. Why not? In any language I know of that has iterators, there would be no problem writing one that iterates over compressed input. If you need to mutate them, that is hard in compressed formats, but a new CPU can store War and Peace in the on-CPU cache. You're right, today the CPU is no longer the bottleneck, which is now:
* the speed of long buses and communication links, with their limited (and costly) bandwidth, as this is a shared medium used by more and more people but requiring massive infrastructure, or physical constraints even on the fastest serial buses, both implying transmission round-trip times (limiting random access, which is a severe problem now that we have to access extremely large volumes of data distributed over multiple devices or over a full network);
* the storage capacity of the fastest storage medium (such as flash memory, which is the only option for mobile devices, but also the most expensive).
In both cases you need compression (the second bottleneck, on storage volumes, will fade out in a few years, but not the bandwidth constraints). It really pays now to use compression schemes, even the most complex ones such as those used to transmit live video: locally, a CPU or GPU will easily handle the compression scheme.
Researches on compression schemes is really not ended, it has never been so much active as it is today, including for text because of the explosion of the data volumes, even if now the volume of text is largely overwhelmed by the volume of images, videos and audio (but you can't compute a lot of things from audio/image/video data sources, we still need text for giving semantics to these medias from which you can derive data or perform searches (there is still a lot to do for handling images and audio speech and detect some semantics in them, but you won't get as much info from an audio/video than what can be represented by text: OCR for example is a very heuristic process with lots of false guesses produced, still much more than humain brains can process within a broad ranges of variations that we call cultures; computers are still very poor in recognizing cultures with as many variations as those we recognize through social interactions and years of education and *personal* experience).
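David Starner's point that an iterator over compressed input is easy to write can be illustrated with a short Python sketch. It assumes zlib-compressed UTF-8 text; the function name and the chunk size are choices made for this example only.

```python
import codecs
import zlib


def iter_chars(compressed: bytes, chunk_size: int = 64):
    """Yield the characters of zlib-compressed UTF-8 text one at a time.

    The input is decompressed and decoded incrementally, so the full
    uncompressed text is never held in memory at once.
    """
    decompressor = zlib.decompressobj()
    decoder = codecs.getincrementaldecoder("utf-8")()
    for start in range(0, len(compressed), chunk_size):
        raw = decompressor.decompress(compressed[start:start + chunk_size])
        # The incremental decoder buffers any multi-byte sequence that is
        # split across two chunks.
        yield from decoder.decode(raw)
    yield from decoder.decode(decompressor.flush(), final=True)
```

Sequential iteration is the easy case; random access and in-place mutation of compressed data remain hard, as the message above concedes.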
Re: Tag characters and in-line graphics (from Tag characters)
Chris John idou747 at gmail dot com wrote: So what you’re saying is that the current situation where you see an empty square □ for unknown characters is better than seeing something useful? No, that's why you include a reference to the font in the private agreement, so that interested parties can install it and see the special character(s). -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
Once again, no! Unicode is a standard for encoding characters, not for encoding some syntactic element of a glyph definition! Your project is out of scope. You still want to reinvent the wheel. For creating syntax, define it within a language, which does not need new characters (you're not creating an APL grammar using specific symbols for some operators more or less based on Greek letters and geometric shapes: they are just like mathematical symbols). Programming languages and data languages (JavaScript, XML, JSON, HTML...) and their syntax are themselves encoded in plain-text documents using standard characters and don't need new characters, APL being an exception only because computers or keyboards were produced to facilitate the input (those who don't have such keyboards use specific editors or the APL runtime environment, which offers an input method for entering programs in this APL input mode). And again you want the chicken before the egg: have you ever even read the encoding policy? The UCS will not encode characters without a demonstrated usage. Nothing in what you propose is really used, except being proposed only by you, and used only by you for your private use (or with a few of your unknown friends, but this is invisible and unverifiable). Nothing has been published. This holds even for currency symbols (which are an exception to the demonstrated-use requirement only because once they are created they are very rapidly needed by lots of people, in fact most people of a region as large as a country, and by many other countries that will reference or use them). But even in this case, what is encoded is the character itself, not the glyph or new characters used to define the glyph! Can you stop proposing off-topic subjects like this on this list? You are not speaking about Unicode or characters. Another list would be more appropriate. You help no one here, because all you want is to radically change the goals of TUS.
2015-06-02 11:01 GMT+02:00 William_J_G Overington wjgo_10...@btinternet.com : Perhaps the solution to at least some of the various issues that have been discussed in this thread is to define a tag letter z as a code within the local glyph memory requests, as follows.
Re: Tag characters and in-line graphics (from Tag characters)
Perhaps the solution to at least some of the various issues that have been discussed in this thread is to define a tag letter z as a code within the local glyph memory requests, as follows. Local glyph memory, for use in compressing a document where the same glyph is used two or more times in the document: 3t7r means this is local glyph 3 being defined at its first use in the document as 7 red pixels 3h here local glyph 3 is being used 3z7r means this is local glyph 3 being defined, though not used, at the start of the document as 7 red pixels More than one local glyph could be defined at the start of the document, as desired. This would mean that use of such a glyph within the document would be by just using the quite short base character followed by tag characters sequence using the h request. This would enable document editing to be easier to accomplish. A mechanism to be able to use the method to define a glyph linked to a Unicode code point would be a useful facility to add for use in a situation where the glyph is for a regular Unicode character. May I mention something that I forgot to mention earlier please? When only one pixel of a particular colour is being specified, it can be specified using just the code for the colour. For example, for 1 red pixel please use r on its own, there is no need to use 1r though 1r should be made to work just in case anyone does use that format. There was a time when I used to use the FORTH programming language and this format of first inputting the number then the operator is based on the way that the FORTH programming language works. William Overington 2 June 2015 Original message From : wjgo_10...@btinternet.com Date : 27/05/2015 - 17:26 (GMTST) To : unicode@unicode.org Subject : Tag characters and in-line graphics (from Tag characters) Tag characters and in-line graphics (from Tag characters) This document suggests a way to use the method of a base character together with tag characters to produce a graphic. 
The approach is theoretical and has not, at this time, been tried in practice. The application in mind is to enable the graphic for an emoji character to be included within a plain text stream, though there will hopefully be other applications. The base character could be either an existing character, such as U+1F5BC FRAME WITH PICTURE, or a new character as decided. Tests could be carried out using a Private Use Area character as the base character. The explanation here is intended to explain the suggested technique by examples, as a basis for discussion. In each example, please consider for each example that the characters listed are each the tag version of the character used here and that they all as a group follow one base character. The examples are deliberately short so as to explain the idea. A real use example might have around two hundred or so tag characters following the base character, maybe more, sometimes fewer. Examples of displays: Each example is left to right along the line then lines down the page from upper to lower. 7r means 7 pixels red 7r5y means 7 pixels red then 5 pixels yellow 7r5y-3b means 7 pixels red then 5 pixels yellow then next line then 3 pixels blue Examples of colours available: k black n brown r red o orange y yellow g green (0, 255, 0) b blue m magenta e grey w white c cyan p pink d dark grey i light grey (thus avoiding using lowercase l so as to avoid confusion with figure 1) f deeper green (foliage colour) (0, 128, 0) Next line request: - moves to the next line Local palette requests: 192R224G64B2s means store as local palette colour 2 the colour (R=192, G=224, B=64) 7,2u means 7 pixels using local palette colour 2 Local glyph memory, for use in compressing a document where the same glyph is used two or more times in the document: 3t7r means this is local glyph 3 being defined at its first use in the document as 7 red pixels 3h here local glyph 3 is being used The above is for bitmaps. 
It would be possible to use a similar technique to specify a vector glyph as used in fontmaking using on-curve and off-curve points specified as X, Y coordinates together with N for on-curve and F for off-curve. There would need to be a few other commands so as to specify places in the tag character stream where definition of a contour starts and so as to separate the definitions of the glyphs for a colour font and so on. This could be made OpenType compatible so that a received glyph could be added into a font. Please feel free to suggest improvements. One improvement could be as to how to build a Unicode code point into a picture so that a font could be transmitted. William Overington 27 May 2015
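As an aid to discussion, the bitmap notation in the message above can be sketched as a small Python decoder. This is a hedged illustration, not an implementation of any agreed format: it covers only the colour letters and the "-" next-line request, leaves out the local palette and local glyph requests (s, u, t, h, z), and the function names are invented for the example. The mapping to tag characters assumes the usual offset of U+E0000 from the corresponding ASCII character.

```python
import re

# Colour letters as listed in the proposal.
COLOURS = {
    "k": "black", "n": "brown", "r": "red", "o": "orange", "y": "yellow",
    "g": "green", "b": "blue", "m": "magenta", "e": "grey", "w": "white",
    "c": "cyan", "p": "pink", "d": "dark grey", "i": "light grey",
    "f": "deeper green",
}

# Either an optional count followed by a letter, or the next-line request.
TOKEN = re.compile(r"(\d*)([a-z])|(-)")


def decode(spec: str) -> list:
    """Decode e.g. '7r5y-3b' into rows of colour names."""
    rows = [[]]
    pos = 0
    while pos < len(spec):
        match = TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.group(3):
            rows.append([])  # '-' moves to the next line
        else:
            letter = match.group(2)
            if letter not in COLOURS:
                raise ValueError(f"request {letter!r} is not handled in this sketch")
            # A missing count means one pixel, so 'r' equals '1r'.
            count = int(match.group(1)) if match.group(1) else 1
            rows[-1].extend([COLOURS[letter]] * count)
        pos = match.end()
    return rows


def to_tags(spec: str) -> str:
    """Map printable ASCII to the corresponding tag characters (U+E0020..U+E007E)."""
    return "".join(chr(0xE0000 + ord(ch)) for ch in spec)


def from_tags(tags: str) -> str:
    """Inverse of to_tags."""
    return "".join(chr(ord(ch) - 0xE0000) for ch in tags)
```

In the proposal the tag-character sequence would follow a base character; the choice of that base character is left open in the message, so it is not modelled here.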
Re: Tag characters and in-line graphics (from Tag characters)
On 2015/06/03 07:55, Chris wrote: As you point out, “The UCS will not encode characters without a demonstrated usage.” But there are use cases for characters that don’t meet UCS’s criteria for a world wide standard, but are necessary for more specific use cases, like specialised regional, business, or domain specific situations. Unicode contains *a lot* of characters for specialized regional, business, or domain specific situations. My question is, given that unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t there be an EXTENSIBLE method for encoding, so that people don’t have to totally rearchitect their computing universe because they want ONE non-standard character in their documents? As has been explained, there are technologies that allow you to do (more or less) that. Information technology, like many other technologies, works best when finding common cases used by many people. Let's look at some examples: Character encodings work best when they are used widely and uniformly. I don't know anybody who actually uses all the characters in Unicode (except the guys that work on the standard itself). So for each individual, a smaller set would be okay. And there were (and are) smaller sets, not for individuals, but for countries, regions, scripts, and so on. Originally (when memory was very limited), these legacy encodings were more efficient overall, but that's no longer the case. So everything is moving towards Unicode. Most Website creators don't use all the features in HTML5. So having different subsets for different use cases may seem to be convenient. But overall, it's much more efficient to have one Hypertext Markup Language, so that's where everybody is converging. From your viewpoint, it looks like having something in between character encodings and HTML is what you want. It would only contain the features you need, and nothing more, and would work in all the places you wanted it to work.
Asmus's inline text may be something similar. The problem is that such an intermediate technology only makes sense if it covers the needs of lots and lots of people. It would add a third technology level (between plain text and marked-up text), which would divert energy from the current two levels and make things more complicated. Up to now, such a third level hasn't emerged, among other reasons because both existing technologies were good at absorbing the most important use cases from the middle. Unicode continues to encode whatever symbols gain reasonable popularity, so every time somebody has a really good use case for the middle layer with a symbol that isn't yet in Unicode, that use case gets taken away. HTML (or Web technology in general) also worked to improve the situation, with technologies such as SVG and Web Fonts. No technology is perfect, and so there are still some gaps between character encoding and markup, some of which may in due time be filled, but I don't think a third layer in the middle will emerge soon. Regards, Martin.
Re: Tag characters and in-line graphics (from Tag characters)
I was asking why the glyphs for right arrow ➡ are inconsistent in many sources, through a couple of iterations of unicode. Perhaps I might observe that one of the reasons is there is no technical link between the code and the glyph. I can’t realistically write a display engine that goes to unicode.org or wherever, and dynamically finds the right standard glyph for unknown codes. This is also manifest in my seeing empty squares □ for characters my platform doesn’t know about. This isn’t the case with XML where I can send someone a random XML document, and there is a standard way to go out there on the internet and check if that XML is conformant. Why shouldn’t there be a standard way to go out on the net and find the canonical glyph for a code? If there was, then non-standard glyphs would fall out of that technology naturally. So people are talking about all these technologies that are out there, html5, cmap, fonts and so forth, but there is no standard way to construct a list of “characters”, some of which might be non-standard, and be able to embed that ANYWHERE one might reasonably expect characters, have it processed in a normal way as characters, be sent anywhere and understood. As you point out, “The UCS will not encode characters without a demonstrated usage.” But there are use cases for characters that don’t meet UCS’s criteria for a world wide standard, but are necessary for more specific use cases, like specialised regional, business, or domain specific situations. My question is, given that unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t there be an EXTENSIBLE method for encoding, so that people don’t have to totally rearchitect their computing universe because they want ONE non-standard character in their documents? Right now, what happens if you have a domain or locale requirement for a special character?
Most likely you suffer without it, because even though you could get it to render in some situations (like hand coding some IMGs into your web site), you just know you won’t be able to realistically input it into emails, word documents, spreadsheets, and whatever other random applications on a daily basis. What I’m asking is: is it really beyond the unicode consortium’s scope, and/or would it really be a redundant technology, to, for example, define a UTF-64 coding format, where 32 bits allow 4 billion businesses and individuals to define their own character sets (each of up to 4 billion characters), then have standard places on the internet (similar to DNS lookup servers) that can provide anyone with glyphs and fonts for it? Right now, yes there are cmaps, but no standard way to combine characters from different encodings. No standard way to find the cmap for an unknown encoding. There is HTML5, but that doesn’t produce something that is recognisable as a list of characters that can be processed as such. (If there is an IMG in text, is it a “character” or an illustration in the text? How can you refer to a particular set of characters without having your own web server? How do you render that text bigger, with the standard reference glyph, without manually searching the internet to find it? There is a host of problems here.) All these problems look unsolved to me, and they also look like encoding technology problems to me too. What other consortium out there is working on character encoding problems? On 2 Jun 2015, at 7:40 pm, Philippe Verdy verd...@wanadoo.fr wrote: Once again, no! Unicode is a standard for encoding characters, not for encoding some syntactic element of a glyph definition! Your project is out of scope. You still want to reinvent the wheel.
For creating syntax, define it within a language, which does not need new characters (you're not creating an APL grammar using specific symbols for some operators more or less based on Greek letters and geometric shapes: they are just like mathematical symbols). Programming languages and data languages (JavaScript, XML, JSON, HTML...) and their syntax are themselves encoded in plain-text documents using standard characters and don't need new characters, APL being an exception only because computers or keyboards were produced to facilitate the input (those who don't have such keyboards use specific editors or the APL runtime environment, which offers an input method for entering programs in this APL input mode). And again you want the chicken before the egg: have you ever even read the encoding policy? The UCS will not encode characters without a demonstrated usage. Nothing in what you propose is really used, except being proposed only by you, and used only by you for your private use (or with a few of your unknown friends, but this is invisible and unverifiable). Nothing has been published. Even for currency symbols (which are an exception to the demonstrated use,
Re: Tag characters and in-line graphics (from Tag characters)
Martin, you seem to be labouring under the impression that HTML5 is a substitute for character encoding. If it is, why do we need unicode? We could just have documents laden with IMG tags, and restrict ourselves to ascii. It seems I need to spell out one more time why HTML is not character encoding: 1. HTML5 doesn’t separate one particular representation (font, size, etc) from the actual meaning of the character. So you can’t paste it somewhere and expect to increase its point size or change its font. 2. It’s highly inefficient in space to drop multi-kilobyte strings into a document to represent one character. 3. The entire design of HTML has nothing to do with characters. So there is no way to process a string of characters interspersed with HTML elements and know which of those elements are a “character”. This makes programmatic manipulation impossible, and means most computer applications simply will not allow HTML in scenarios where they expect a list of “characters”. 4. There is no way to compare 2 HTML elements and know they are talking about the same character. I could put some HTML representation of a character in my document, you could put a different one in, and there would be absolutely no way to know that they are the same character. Even if we are in the same community and agree on the existence of this character. 5. Similarly, there is no way to search or index html elements. If a HTML document contained an image of a particular custom character, there would be no way to ask google or whatever to find all the documents with that character. Different documents would represent it differently. HTML is a rendering technology. It makes things LOOK a particular way, without actually ENCODING anything about it. The only part of HTML that is actually searchable in a deterministic fashion is the part that is encoded - the unicode part. Unicode encodes symbols that have “reasonable popularity”. (a) that is not all of them.
(b) how can a symbol attain reasonable popularity when it is not in unicode? Of course some can, but others have their popularity hindered by the very fact that they are not encoded! Take the poop emoji that people recently have been talking about here. It gained popularity because the Japanese telecom companies decided to encode it. If they hadn’t encoded it, would it have become popular through normal culture such that the unicode consortium would have adopted it? No, it wouldn’t! The Japanese telcos were able to do this because they controlled their entire user base from hardware on up to encodings. That won’t be happening into the future, so new interesting and potentially universal emojis won’t ever come into existence in the way that this one did because of the control the unicode consortium exercises over this technology. But the problem isn’t restricted to emojis, many other potentially popular symbols can’t come into existence either. The internet *COULD* be the birthplace of lots of interesting new symbols in the same way that Japanese telecom companies birthed the original emojis, but it won’t be because the unicode consortium rules it from the top down. Summary: 1. HTML renders stuff, it encodes nothing. It addresses a completely different problem domain. If rendering and encoding were the same problem, unicode could disband now. 2. Unicode encodes stuff, but isn’t extensible in a way that is broadly useful. i.e. in a way that allows anybody (or any application) receiving a custom character to know what it is, or how to render it, or to combine it with other custom character sets. 3. The problem under discussion is not a rendering problem. HTML5 lacks nothing in terms of ability to render. Yet the problem remains. Because it’s an encoding problem. Encoding problems are in the unicode domain, not in the HTML5 domain. You say that character encodings work best when they are used widely and uniformly.
But they can only be as wide or as uniform as reality itself. We could try and conform reality to technology and… for example… force all the world to use Latin characters and 128 ASCII representations. OR we can conform technology to reality. Not all encodings need to be, or ought to be as universal as requiring one world wide committee to pass judgment on them. On 3 Jun 2015, at 11:09 am, Martin J. Dürst due...@it.aoyama.ac.jp wrote: On 2015/06/03 07:55, Chris wrote: As you point out, “The UCS will not encode characters without a demonstrated usage.” But there are use cases for characters that don’t meet UCS’s criteria for a world wide standard, but are necessary for more specific use cases, like specialised regional, business, or domain specific situations. Unicode contains *a lot* of characters for specialized regional, business, or domain specific situations. My question is, given that unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t
Re: Tag characters and in-line graphics (from Tag characters)
On 3 Jun 2015, at 11:22 am, Martin J. Dürst due...@it.aoyama.ac.jp wrote: On 2015/05/29 11:37, John wrote: If I had a large document that reused a particular character thousands of times, Then it would be either a very boring document (containing almost only that same character) or it would be a very large document. If you have a daughter, look at her Facebook messenger, and then get back to me. would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space efficient way? If you want space efficiency, the best thing to do is to use generic compression. Many generic compression methods are available, many of them are widely supported, and all of them will be dealing with your case in a very efficient way. You can’t ask the entire computing universe to compress everything all the time. And that is what your comment amounts to. Because the whole point under discussion is how can we encode stuff such that you can hope to universally move it around between different documents, formats, applications, input fields and platforms without any massaging. Given that it’s been agreed that private use ranges are a good thing, That's not agreed upon. I'd say that the general agreement is that the private ranges are of limited usefulness for some very limited use cases (such as designing encodings for new scripts). They are of limited usefulness precisely because it is pathologically hard to make use of them in their current state of technological evolution. If they were easy to make use of, people would be using them all the time. I’d bet good money that if you surveyed a lot of applications where custom characters are being used, they are not using private use ranges. Now why would that be? and given that we can agree that exchanging data is a good thing, Yes, but there are many other ways to do that besides Unicode.
And for many purposes, these other ways are better suited. The point is a universally recognised way. Of course you, me or anybody could design many good ways to solve any problem we might come up with. That doesn’t mean it will interoperate with anybody else though. maybe something should bring those two things together. Just a thought. Just a 'non sequitur'. Regards, Martin.
Re: Tag characters and in-line graphics (from Tag characters)
On 2015/05/29 11:37, John wrote: If I had a large document that reused a particular character thousands of times, Then it would be either a very boring document (containing almost only that same character) or it would be a very large document. would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space efficient way? If you want space efficiency, the best thing to do is to use generic compression. Many generic compression methods are available, many of them are widely supported, and all of them will be dealing with your case in a very efficient way. Given that it's been agreed that private use ranges are a good thing, That's not agreed upon. I'd say that the general agreement is that the private ranges are of limited usefulness for some very limited use cases (such as designing encodings for new scripts). and given that we can agree that exchanging data is a good thing, Yes, but there are many other ways to do that besides Unicode. And for many purposes, these other ways are better suited. maybe something should bring those two things together. Just a thought. Just a 'non sequitur'. Regards, Martin.
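Martin's remark about generic compression can be checked with a small Python sketch. The glyph definition below is made-up sample data (about 200 characters in the colour-letter notation discussed elsewhere in the thread), and zlib stands in for "generic compression" purely as an assumption for the example.

```python
import zlib

# A made-up inline glyph definition of 200 characters, repeated 1000 times
# with a little ordinary text in between.
glyph = "".join(f"{(i * 7) % 9 + 1}{'krybg'[i % 5]}" for i in range(100))
document = ((glyph + " hello ") * 1000).encode("ascii")

compressed = zlib.compress(document, 9)

# The repeated definition costs very little once compressed, and the
# original document is recovered exactly.
ratio = len(document) / len(compressed)
restored = zlib.decompress(compressed)
```

This addresses only storage and transmission size; it does not answer the separate objection, raised in the reply to this message, that not every application or input field can be expected to compress its text.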
Re: Tag characters and in-line graphics (from Tag characters)
No, nothing about what you propose, which is to encode graphics directly with a custom syntax using specific Unicode characters for this syntax itself. There's no such statement in the UTR, even for the longer term. What is proposed instead is a way to *reference* (not define) graphics. For the rest, you need a rich-text format to embed graphics (using the syntax of this rich-text format, such as HTML), but this syntax remains out of scope of Unicode, which will not standardize any graphic format, or any language by its syntax. Even for CLDR, you will use some JSON or XML rich-text format to create references, or embed some small graphics. But CLDR is NOT part of the Unicode Standard itself, and does not encode new characters (and I've not seen the CLDR requesting additions in the UCS for its own use; instead it uses its own assignments for PUAs where needed, as also for its own private locale tags for internal references within the CLDR data itself). 2015-06-02 12:37 GMT+02:00 William_J_G Overington wjgo_10...@btinternet.com : Responding to Philippe Verdy: Nothing has been published. It has been published. It is published in this thread for discussion prior to a possible submission to the Unicode Technical Committee that could take place if people on this mailing list feel that it is a good solution to the problem raised in section 8 of the following document. http://www.unicode.org/reports/tr51/tr51-2.html Direct link to 8 Longer Term Solutions http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term William Overington 2 June 2015
Re: Tag characters and in-line graphics (from Tag characters)
On 2015-06-02, William_J_G Overington wjgo_10...@btinternet.com wrote: take place if people on this mailing list feel that it is a good solution to the problem raised in section 8 of the following document. http://www.unicode.org/reports/tr51/tr51-2.html That section does not raise a problem. It says what the solution to the emoji problem is: namely that people who want to embed graphics in text should fix their protocols to allow it, instead of subverting Unicode to do it. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Philippe Verdy: Nothing has been published. It has been published. It is published in this thread for discussion prior to a possible submission to the Unicode Technical Committee that could take place if people on this mailing list feel that it is a good solution to the problem raised in section 8 of the following document. http://www.unicode.org/reports/tr51/tr51-2.html Direct link to 8 Longer Term Solutions http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term William Overington 2 June 2015
Re: Tag characters and in-line graphics (from Tag characters)
On 6/2/2015 2:01 AM, William_J_G Overington wrote: Local glyph memory, for use in compressing a document where the same glyph is used two or more times in the document: Um, that technology already exists. It is called a font. A mechanism to be able to use the method to define a glyph linked to a Unicode code point would be a useful facility to add for use in a situation where the glyph is for a regular Unicode character. And that mechanism has also already been defined. It is called a cmap: http://www.microsoft.com/typography/otspec/cmap.htm --Ken
Re: Tag characters and in-line graphics (from Tag characters)
2015-06-01 1:33 GMT+02:00 Chris idou...@gmail.com: Of course, anyone can invent a character set. The difficult bit is having a standard way of combining custom character sets. That’s why a standard would be useful. And while stuff like this can, to some extent, be recognised by magic numbers, and unique strings in headers, such things are unreliable. Just because example.net/mycharset/ appears near the start of a document, doesn’t necessarily mean it was meant to define a character set. Maybe it was a document discussing character sets. That's not what I described. I spoke about using a MIME-compatible private charset identifier, and how such a private identifier can be made reasonably unique by binding it to a domain name or URI. If you had read more carefully, I also said that it was absolutely not necessary to dereference that URL: there are many XML schemas binding their namespaces to a URI which does not itself lead to a webpage or to any downloadable DTD, XML schema or XML stylesheet. Google and Microsoft are using this a lot in lots of schemas (which are not described and documented at this URL, if they are documented at all). The URI by itself is just an identifier; it becomes a webpage only when you use it in a web page with an href attribute to create a hyperlink, or to perform some query to a service returning some data. An identifier for a private charset does not need to perform any request to be usable: we just have the identifier, which is sufficient by itself.
The URI can also be only a base URI for a collection of resources (whose URLs start with this base URI, with conventional extensions appended to get the character properties, or a font); but the best way is to embed this data in your document, in some header or footer, if your document using the private charset is not part of a collection of docs using the same private charset. In that case, you don't need a new UTF: UTF-8 remains usable and you can map your private charset to standard PUAs (and/or to hacked characters) according to the private charset's needs. The charset indicated in your document (by some meta header) should be sufficient to avoid collisions with other private conventions: it will define the scope of your private charset as the document itself, which will then be interchangeable (and possibly mixable with other documents, with some renumbering if there are collisions of assignments between two distinct private charsets: in the document header, add to the charset identifier the range of PUAs which is used; then, with two documents colliding on this range, you can re-encode one automatically by creating a compound charset with subranges of PUAs remapped differently to other ranges).
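The renumbering step described above, for two documents whose private charsets collide on the same PUA range, might look roughly like the following Python sketch. The function name, its parameters and the restriction to the BMP private-use area (U+E000..U+F8FF) are assumptions made for illustration; the message does not specify how the compound charset would be declared in the header, so that part is not modelled.

```python
BMP_PUA = range(0xE000, 0xF900)  # U+E000..U+F8FF


def remap_colliding_pua(text: str, used_by_other: set, pool: range = BMP_PUA):
    """Re-encode text so that it no longer uses code points in used_by_other.

    Returns the re-encoded text and the old-to-new code point mapping, which
    would have to be recorded alongside the merged document.
    """
    # Code points that are unavailable: those of the other document plus
    # those this document already uses.
    taken = set(used_by_other) | {ord(ch) for ch in text if ord(ch) in pool}
    free = (cp for cp in pool if cp not in taken)
    mapping = {}
    for ch in text:
        cp = ord(ch)
        if cp in used_by_other and cp not in mapping:
            # next() raises StopIteration if the pool is exhausted.
            mapping[cp] = next(free)
    return text.translate(mapping), mapping
```

For example, if the other document uses U+E000 and U+E001 and this one uses U+E000 and U+E002, only U+E000 is moved, to the first free code point U+E003.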
Re: Tag characters and in-line graphics (from Tag characters)
On 5/31/2015 5:33 AM, Chris-as-John wrote: Yes, Asmus good post. But I don’t really think HTML, even a subset, is really the right solution. The longer I think about this, what would be needed would be something like an abstract format. A specification of the capabilities to be supported and the types of properties needed to support them in an extensible way. HTML and CSS would possibly become an implementation of such a specification. There would still be a place for a character set, that is Unicode, as an efficient way to implement the most basic and most standard features of text contents, but perhaps some extension mechanism that can handle various extensions. The first level of extension is support for recent (or rare) code points in the character set (additional fonts, etc, as you mention). The next level of extension could be support for collections of custom entities that are not available as character sets (stickers and the like). And finally, there would have to be a way to deal with one-offs, such as actual images that do not form categorizable sets, but are used in an ad-hoc manner and behave like custom characters. And so on. It should be possible to describe all of this in a way that allows it to be mapped to HMTL and CSS or to any other rich text format -- the goal, after all is to make such inline text as widely and effortlessly interchangeable as plain text is today (or at least nearly so). By keeping the specification abstract, you could accommodate both SGML like formats where ascii-string markup is intermixed with the text, as well as pure text buffers with place holder code points and links to external data. But, however bored you are with plain Unicode emoji, as long as there isn't an agreed upon common format for rich inline text I see very little chance that those cute facebook emoji will do anything other than firmly keep you in that particular ghetto. 
A./
Re: Tag characters and in-line graphics (from Tag characters)
The abstract format already exists for HTML too, in the MIME charset parameter of the text/plain media type. It can also be embedded in a meta tag when the HTML source file is just stored in a filesystem, so that a webserver can parse it and provide the correct MIME header if it has no repository for metadata and must infer the media type from the file content itself with some guesser. It also exists in various conventions for source code, recognized by editors such as vi(m) or Emacs, and for Unix shells, using embedded magic identifiers near the top of the file. You can use it to send an identifier for a private charset without having to request a registration of the charset in the IANA database (which is not intended for private encodings). The private charset can be named in a unique way: consider basing the name on a domain name you own, such as x-www.example.net-mycharset-1 if you own the domain name example.net. That will be enough for initial experimentation over a few years (or more, provided that you renew the domain name). Your charset can contain various definitions: a mapping of your codepoints (including PUAs, standard codepoints, or hacked codepoints if you have no other way to get the correct character properties working with existing algorithms such as case mappings, collation, and layout behavior in text renderers). Such a solution would allow more predictable management of PUAs by controlling their scope of use: they are bound, only in some magic header of the document, to a private charset that remains reasonably unique. 
For example, x-example.net-mycharset-1 would map to a URL like //www.example.net/mycharset/1/ containing some schema. It could be the base address of an XML or JSON file, of a web font containing the relevant glyphs, and of a character properties database that overrides the defaults from the standard. If your application already knows this private charset, you don't need to download any of these files: the URL is just an identifier and your file can still be used in standalone mode, just as you can parse many standard XML schemas by recognizing the URLs assigned to the XML namespaces, without ever fetching a DTD or XML schema definition from an external resource. If needed, your app can keep a local repository in some cache folder, which lets you extend the number of private charsets it recognizes. Full interoperability will still not be possible if you need to mix, in the same document, texts encoded with different private charsets (there's always a risk of collision), unless there is a way to reencode some of them into a joined charset without the collisions, by inferring a new private charset. That's not impossible to do; after all, it is already done with XML schemas that you mix together: when namespace prefixes collide, you just rename the prefixes and keep the URLs to which they are bound. Such collisions occur sometimes because of versioning, where some features of a schema are not fully upward compatible. 
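As a rough illustration of the naming convention just described, the following Python sketch turns a private charset name into its base URL and checks a local registry before anything would need to be fetched. The exact name grammar (an `x-` prefix, then domain, charset name and version separated by hyphens), the `www.` host prefix and the `KNOWN_CHARSETS` registry are assumptions made for this example; the message above only gives one name and one URL.

```python
def charset_name_to_url(name):
    """Map e.g. 'x-example.net-mycharset-1' to '//www.example.net/mycharset/1/'.

    Assumed grammar: 'x-' + domain + '-' + charset + '-' + version.
    Splitting from the right keeps hyphens inside the domain intact.
    """
    if not name.startswith("x-"):
        raise ValueError("not a private charset name: " + name)
    domain, charset, version = name[2:].rsplit("-", 2)
    return f"//www.{domain}/{charset}/{version}/"


# Hypothetical local repository of charsets the application already knows.
KNOWN_CHARSETS = {"//www.example.net/mycharset/1/": {"font": "mycharset.woff"}}


def resolve(name):
    """Return (url, cached definition or None); None means a download would be needed."""
    url = charset_name_to_url(name)
    return url, KNOWN_CHARSETS.get(url)
```

When the lookup succeeds, the URL acts purely as an identifier, which is the standalone mode the message describes.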
Yes, this complicates things a bit, but much less than using documents in which PUA assignments are not negotiated at all (even minimally, to make sure they are compatible when mixing sources), and for which there exists for now absolutely no protocol for such negotiation. TUS says that PUAs are usable and interchangeable under private mutual agreement, but still provides no scheme for supporting such mutual agreement; for this reason, PUAs are almost always rejected, and people want true permanent assignments for characters that are very specific, badly documented, or insufficiently known to have reliable permanent properties. So let's think about securing the use of PUAs with some identification scheme. For plain-text formats, it should just be allowed to negotiate a single charset for the whole document, using the magic header tricks that have long been used by charset guessers (including for autodetecting UTF-8 encoded files). This would also solve the chicken-and-egg problem where we need more sources to attest effective usage before encoding new characters, while developing such usage is extremely difficult (and much slower) with our modern technologies, where most documents are now handled digitally. In the past it was possible to create a metal font and use it immediately to start editing books, and many more people used handwriting and drawings, so it was much less difficult to invent new characters than it is today, unless you're a big company with enough resources to develop this usage alone, such as the Japanese telcos, or Google, Yahoo, Samsung or Microsoft introducing new sets of emoji for their instant messaging platforms, with tons of developers working to build a wide range of services around them. However I'm not saying that Unicode should specify how such private charset containing private
Re: Tag characters and in-line graphics (from Tag characters)
David Starner wrote: I would say that a system would conform with Unicode in having yellow heart red (in a non-monochrome font) as well as if it made it a cross. Either way it's violating character identity. I'd say that being monochromatic is now like being monospaced; it's suboptimal for a Unicode implementation, but hardly something Unicode can condemn as nonconformant. This seems fair and sensible. My main point was that being monochromatic (i.e. black) is conformant, and was an attempt to challenge the statement about character color sometimes being a recorded property. I don't see any Unicode character properties that identify color, only character names, which don't carry property information. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
Of course, anyone can invent a character set. The difficult bit is having a standard way of combining custom character sets. That’s why a standard would be useful. And while stuff like this can, to some extent, be recognised by magic numbers, and unique strings in headers, such things are unreliable. Just because example.net/mycharset/ appears near the start of a document, doesn’t necessarily mean it was meant to define a character set. Maybe it was a document discussing character sets. And while it is tempting to allow the “container” to define the “header” information, whether the container be html defining something in its HEAD tag, or some proprietary format (MS-Word), or whatever, that doesn’t really solve anybody’s problem in a standard way. For a start, what if you want to copy text to the clipboard? You want the thing receiving it to be able to interpret it in a self-contained way. The 2 obvious implementations for a standard seem to be: 1) A standard (optional) header. Perhaps if the string starts with a special character, then follows a header defining charsets first. These would allocate character ranges for custom characters, and point to where their renderings can be found. Standard programming libraries on all platforms would invisibly act appropriately on these headers. If you concatenated strings with conflicting namespaces, standard libraries would seamlessly reallocate one of the custom namespaces and merge the headers. 2) Make a new character set, let’s call it UTF-64. 32 bits would be allocated for custom character sets. Anybody could apply to a central authority to be allocated a custom id (32 bits=4 billion ids). A central location, kind of like a domain name system, would map that id to the URL where the canonical definition for that character set is. The 2nd option has the advantage that the file format is fixed width like normal plain text documents. Concatenating custom character set strings is no issue. 
The canonical location for a character set isn’t forevermore mapped to a particular domain owner. Nothing about the meaning of the characters is defined in the actual bits other than the unique id. The disadvantage is it needs a central authority to maintain the list of ids, and map them to domains.
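To make the second option ("UTF-64") concrete, here is a minimal Python sketch of a fixed-width 64-bit unit. The thread only says that 32 bits would identify the custom character set; placing the id in the high half, the code point in the low half, using big-endian byte order, and reserving id 0 for standard Unicode are all assumptions of this example.

```python
import struct

STANDARD_UNICODE = 0  # assumed: charset id 0 means "ordinary Unicode code point"


def pack_unit(charset_id, code_point):
    """Combine a 32-bit charset id and a 32-bit code point into one 64-bit unit."""
    if not (0 <= charset_id < 2**32 and 0 <= code_point < 2**32):
        raise ValueError("both fields must fit in 32 bits")
    return (charset_id << 32) | code_point


def unpack_unit(unit):
    """Split a 64-bit unit back into (charset id, code point)."""
    return unit >> 32, unit & 0xFFFFFFFF


def encode(units):
    """Serialize units as fixed-width big-endian 8-byte words."""
    return b"".join(struct.pack(">Q", u) for u in units)


# 'A' from standard Unicode followed by character 7 of hypothetical charset 12345.
text = [pack_unit(STANDARD_UNICODE, ord("A")), pack_unit(12345, 7)]
data = encode(text)
```

Because every unit carries its own charset id, concatenating two such byte strings needs no renumbering, which is the advantage claimed for this option; the cost is eight bytes per character and the central registry of ids.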
Re: Tag characters and in-line graphics (from Tag characters)
John, reading this discussion, I agree with your reductio ad absurdum of infinitely nested HTML. But I think you are onto something with your hypothetical example of the subset that works in ALL textual situations. There's clearly a use case for something like it, and I believe many people would intuitively agree on a set of features for it. What people seem to have in mind is something like inline text. Something beyond a mere stream of plain text (with effectively every character rendered visibly), but still limited in important ways by general behavior of inline text: a string of it, laid out, must wrap and line break, any objects included in it must behave like characters (albeit of custom width, height and appearance), and so on. Paragraph formatting, stacked layout, header levels and all those good things would not be available. With such a subset clearly defined, many quirky limitations might no longer be necessary; any container that today only takes plain text could be upgraded to take inline text. I can see some inline containers retaining a nesting limitation, but I could imagine that it is possible to arrive at a consistent definition of such inline format. Going further, I can't shake the impression that without a clean definition of an inline text format along those lines, any attempts at making stickers and similar solutions stick are doomed to failure. The interesting thing in defining such a format is not how to represent it in HTML or CSS syntax, but in describing what feature sets it must (minimally) support. Doing it that way would free existing implementations of rich text to map native formats onto that minimally required subset and to add them to their format translators for HTML or whatever else they use for interchange. Only with a definition can you ever hope to develop a processing model. It won't be as simple as for plain text strings, but it should be able to support common abstractions (like iteration by logical unit). 
It would have to support the management of external resources - if the inline format allows images, custom fonts, etc. one would need a way to manage references to them in the local context. If your skeptical position proves correct in that this is something that turns out to not be tractable, then I think you've provided conclusive proof why stickers won't happen and why encoding emoji was the only sensible decision Unicode could have taken. A./ On 5/30/2015 7:14 AM, John wrote: Hmm, these once entities of which you speak, do they require javascript? Because I'm not sure what we are looking for here is static documents requiring a full programming language. But let's say for a moment that html5 can, or could do the job here. Then to make the dream come true that you could just cut and paste text that happened to contain a custom character to somewhere else, and nothing untoward would happen, would mean that everything in the computing universe should allow full blown html. So every Java Swing component, every Apple gui component, every .NET component, every windows component, every browser, every Android and IOS component would allow text entry of HTML entities. OK, so let's say everyone agrees with this course of action, now the universal text format is HTML. But in this new world where anywhere that previously you could input text, you can now input full blown html, does that actually make sense? Does it make sense that you can for example, put full blown HTML inside a H1 tag in html itself? That's a lot of recursion going on there. Or in a MS-Excel cell? Or interspersed in some otherwise fairly regular text in a Word document? I suppose someone could define a strict limited subset of HTML to be that subset that makes sense in ALL textual situations. That subset would be something like just defining things that act like characters, and not like a full blown rendering engine. But who would define that subset? 
Not the HTML groups, because their mandate is to define full blown rendering engines. It would be more likely to be something like the unicode group. And also, in this brave new world where HTML5 is the new standard text format, what would the binary format of it be? I mean, if I have the string of unicode characters <IMG> would that be HTML5 image definition that should be rendered as such? Or would it be text that happens to contain greater than symbol, I, M and G? It would have to be the former I guess, and thereby there would no longer be a unicode symbol for the mathematical greater than symbol. Rather there would be a unicode symbol for opening a HTML tag, and the text code for greater than would be &gt; Never again would a computer store > to mean greater than. Do we want HTML to be so pervasive? Not sure it deserves that. And from a programmer's point of view, he wants to be able to iterate over an array of characters and treat each one the same way,
Re: Tag characters and in-line graphics (from Tag characters)
Yes, Asmus good post. But I don’t really think HTML, even a subset, is really the right solution. I’m reminded of the design for XML itself, it is supposed to start with a header that defines what that XML will conform to. Those definitions contain some unique identifiers of that XML schema, which happens to be a URL. The URL is partly just a convenient unique identifier, but also, the XML engine, if it doesn’t know about that schema could go to that URL and download the schema, and check that the XML conforms to that schema. Similarly, imagine a text format that had a header with something like: \uCHARSET:facebook.com/charsets/pusheen-the-cat-emoji/,12345 Now all the characters following in the text will interpret characters that start with 12345 with respect to that character set. What would you find at facebook.com/charsets/pusheen-the-cat-emoji/? You might find bitmaps, truetype fonts, vector graphics, etc. You might find many many representations of that character set that your rendering engine could cache for future use. The text format wouldn’t be reliant on today’s favorite rendering technology, whether bitmap, truetype fonts, or whatever. Right now, if you go to a website that references unicode that your platform doesn’t know about, you see nothing. If a format like this existed, character sets would be infinitely extensible, everybody on earth could see characters, even if their platform wasn’t previously aware of them, and the format would be independent of today’s rendering technologies. Let’s face it, HTML5 changes every few years, and I don’t think anybody wants the fundamental textual representation dependent on an entire layout engine. And also the whole range of what HTML5 can do, even some subset, is too much information. You don’t necessarily want your text to embed the actual character set. 
Perhaps that might be a useful option, but I think most people would want to uniquely identify the character set, in a way that an engine can download it, but without defining the actual details itself. Of course, certain charsets would probably become pervasive enough that platforms would just include them for convenience. Emojis by major messaging platforms. Maybe characters related to specialised domains like, I don’t know, mapping or specialised work domains or whatever, but without having to be subservient to the central unicode committee. As someone who is a keen user of Facebook messenger, and who sees them bring out a new set of emoji almost every week, I think the world will soon be totally bored with the plain basic emoji that unicode has defined. — Chris
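The header idea in this message can be illustrated with a small Python parser. The syntax is taken literally from the one example given (`\uCHARSET:` followed by a location, a comma and a number); the function name, the return shape and the choice to treat the trailing number as an integer prefix are assumptions, since the message does not define how characters "start with 12345".

```python
HEADER_PREFIX = "\\uCHARSET:"  # the literal text \uCHARSET: from the example


def parse_charset_header(line):
    """Return (location, prefix) for a charset header line, or None if it is not one."""
    if not line.startswith(HEADER_PREFIX):
        return None
    # Split on the last comma so commas inside the location would survive.
    location, _, prefix = line[len(HEADER_PREFIX):].rpartition(",")
    if not location or not prefix.isdigit():
        return None
    return location, int(prefix)


header = "\\uCHARSET:facebook.com/charsets/pusheen-the-cat-emoji/,12345"
parsed = parse_charset_header(header)
```

A renderer could use the returned location as a cache key for whatever bitmaps, fonts or vector graphics it finds there. Note that a plain string match like this cannot tell a real declaration from a document that merely discusses one, which is the reliability concern raised elsewhere in the thread.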
Re: Tag characters and in-line graphics (from Tag characters)
2015-05-30 10:47 GMT+02:00 William_J_G Overington wjgo_10...@btinternet.com : Responding to Doug Ewell: I think this cuts to the heart of what people have been trying to say all along. Historically, Unicode was not meant to be the means by which brand new ideas are run up the proverbial flagpole to see if they will gain traction. History is interesting and can be a good guide, yet many things that are an accepted part of Unicode today started as new ideas that gained traction and became implemented. So history should not be allowed to be a reason to restrict progress. For example, there was the extension from 1 plane to 17 planes. Actually, this was a restriction of the UCS to *only* 17 planes. Before that, the UCS had 31-bit code points, i.e. 32768 planes! If you're speaking about the old Unicode 1.0, it was not yet the UCS and was incompatible with the UCS in many important parts; the initial target of Unicode was only to have an industry standard immediately usable between a few software providers (Unicode 1.0 was not an international standard then, forget it!).
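The plane arithmetic behind these figures is easy to check. The short Python sketch below assumes only that a plane is 65,536 (0x10000) code points; the constant and function names are chosen for this example.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane

UNICODE_PLANES = 17
MAX_CODE_POINT = UNICODE_PLANES * PLANE_SIZE - 1  # highest code point with 17 planes

# A 31-bit code space divided into planes of 2**16 code points.
ORIGINAL_UCS_PLANES = 2**31 // PLANE_SIZE


def plane_of(code_point):
    """Return the plane number (0..16) of a Unicode code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the 17-plane code space")
    return code_point >> 16
```

With these definitions `MAX_CODE_POINT` is 0x10FFFF and `ORIGINAL_UCS_PLANES` is 32768, matching the number quoted in the message; tag characters such as U+E0041 fall in plane 14.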
Re: Tag characters and in-line graphics (from Tag characters)
Note: Everything below is my personal opinion and does not represent any official Unicode Consortium or UTC position. William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Historically, Unicode was not meant to be the means by which brand new ideas are run up the proverbial flagpole to see if they will gain traction. History is interesting and can be a good guide, yet many things that are an accepted part of Unicode today started as new ideas that gained traction and became implemented. So history should not be allowed to be a reason to restrict progress. I used historically to distinguish between the pre- and post-Emoji Revolution eras. There have clearly been changes recently, but there is still at least a minimal expectation that proposed characters will fulfill a demonstrated need. I'm not seeing any truly novel, untested ideas in the list below that Unicode implemented purely on speculation. For example, there was the extension from 1 plane to 17 planes. That was an architectural extension, brought about by the realization that 64K code points wasn't enough for even the original scope. There's no comparison. There was the introduction of emoji support. Emoji proponents would argue that emoji support began in 1.0 with the inclusion of various dingbats. But even emoji are arguably characters in some sense. They aren't a mini-language used to define images pixel by pixel. There was the introduction of the policy of colour sometimes being a recorded property rather than having just the original monochrome recording policy. There isn't any such policy. There is a variation selector to suggest that the rendering engine show certain characters in emoji style instead of text style, and there are characters with colors in their names, but there is no policy that specific colors are recorded as part of the encoding. YELLOW HEART could conformantly appear in any color. 
There has been the change of encoding policy that facilitated the introduction of the Indian Rupee character into Unicode and ISO/IEC 10646 far more quickly than had been thought possible, so that the encoding was ready for use when needed. That's not a change to what types of things get encoded. It's a procedural change, one which I would agree has been applied with increasing creativity. There has been the recent encoding policy change regarding encoding of pure electronic use items taking place without (extensive prior use using a Private Use Area encoding), such as the encoding of the UNICORN FACE. This is probably your best analogy. People like Asmus have addressed it, saying it's not reasonable to expect users to adopt PUA solutions and wait for them to catch on. There is the recent change to the deprecation status of most of the tag characters and the acceptance of the base character followed by tag characters technique so as to allow the specifying of a larger collection of particular flags. There must have been a great wailing and gnashing of teeth over that decision. So many statements were made over the years about the basic evilness of tag characters. But the concept of representing flags was already agreed upon as a compatibility measure, and the Regional Indicator Symbols solution was a compromise that allowed expansion beyond the 10 flags that Japanese telcos chose to include. RIS were an architectural decision. The tag solution (to be fully outlined in a future PRI) was another architectural decision. Neither (I believe) is analogous to a scope decision to start encoding different types of non-character things as if they were characters, and as I have said before, assigning a glyph to a thing that isn't a character doesn't make it one. -- Doug Ewell | http://ewellic.org | Thornton, CO
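For readers unfamiliar with the two flag mechanisms mentioned here, the following Python sketch builds both kinds of sequence. Regional Indicator Symbols occupy U+1F1E6..U+1F1FF (one per letter A..Z), and the tag characters U+E0020..U+E007E mirror printable ASCII. The tag mechanism was still awaiting its PRI when this was written, so the choice of U+1F3F4 as base and U+E007F CANCEL TAG as terminator follows the form later documented for emoji tag sequences and should be read as an assumption relative to this thread; the function names are hypothetical.

```python
def flag_from_region(code):
    """Build a pair of Regional Indicator Symbols from a two-letter region code."""
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in code.upper())


def to_tag_characters(ascii_text):
    """Map printable ASCII (U+0020..U+007E) onto the tag block at U+E0020..U+E007E."""
    for c in ascii_text:
        if not 0x20 <= ord(c) <= 0x7E:
            raise ValueError("only printable ASCII has tag equivalents")
    return "".join(chr(0xE0000 + ord(c)) for c in ascii_text)


def tag_sequence(base, code):
    """Base character, then the code spelled in tag characters, then CANCEL TAG."""
    return base + to_tag_characters(code) + "\U000E007F"


us_flag = flag_from_region("US")
scotland = tag_sequence("\U0001F3F4", "gbsct")
```

The same base-plus-tags shape is what the localizable sentence proposal earlier in the thread relies on, with a different base character and a different code list.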
Re: Tag characters and in-line graphics (from Tag characters)
I would say that a system would conform with Unicode in having yellow heart red (in a non-monochrome font) as well as if it made it a cross. Either way it's violating character identity. I'd say that being monochromatic is now like being monospaced; it's suboptimal for a Unicode implementation, but hardly something Unicode can condemn as nonconformant.
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Leo Broukhis: A more common occurrence is the need to include a non-standard character in a text message, be it a ski piste symbol or an obscure CJK ideogram. Have you thought of embedding TrueType in Unicode? Not congruently so, yet, in effect, yes, as I have considered including individual OpenType-compatible glyphs in a base character followed by tag characters format. OpenType is a development from TrueType that can achieve more than can TrueType on its own. There is a little about this in the last two paragraphs of the following post. http://www.unicode.org/mail-arch/unicode-ml/y2015-m05/0218.html There would need to be a few additions to make it work effectively: for example, a value for each of advance width, ascent maximum, descent maximum and font units per em. William Overington 30 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Doug Ewell: I think this cuts to the heart of what people have been trying to say all along. Historically, Unicode was not meant to be the means by which brand new ideas are run up the proverbial flagpole to see if they will gain traction. History is interesting and can be a good guide, yet many things that are an accepted part of Unicode today started as new ideas that gained traction and became implemented. So history should not be allowed to be a reason to restrict progress. For example, there was the extension from 1 plane to 17 planes. There was the introduction of emoji support. There was the introduction of the policy of colour sometimes being a recorded property rather than having just the original monochrome recording policy. There has been the change of encoding policy that facilitated the introduction of the Indian Rupee character into Unicode and ISO/IEC 10646 far more quickly than had been thought possible, so that the encoding was ready for use when needed. There has been the recent encoding policy change regarding encoding of pure electronic use items taking place without (extensive prior use using a Private Use Area encoding), such as the encoding of the UNICORN FACE. There is the recent change to the deprecation status of most of the tag characters and the acceptance of the base character followed by tag characters technique so as to allow the specifying of a larger collection of particular flags. The two questions that I asked in my response to a post by Mark E. Shoulson are relevant here. Suppose that a plain text file is to include just one non-standard emoji graphic. How would that be done otherwise than by the format that I am suggesting? What if there were three such non-standard emoji graphics needed in the plain text file, the second graphic being used twice. How would that be done otherwise than by the format that I am suggesting? William Overington 30 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
Hmm, these once entities of which you speak, do they require javascript? Because I'm not sure what we are looking for here is static documents requiring a full programming language. But let's say for a moment that html5 can, or could do the job here. Then to make the dream come true that you could just cut and paste text that happened to contain a custom character to somewhere else, and nothing untoward would happen, would mean that everything in the computing universe should allow full blown html. So every Java Swing component, every Apple gui component, every .NET component, every windows component, every browser, every Android and IOS component would allow text entry of HTML entities. OK, so let's say everyone agrees with this course of action, now the universal text format is HTML. But in this new world where anywhere that previously you could input text, you can now input full blown html, does that actually make sense? Does it make sense that you can for example, put full blown HTML inside a H1 tag in html itself? That's a lot of recursion going on there. Or in a MS-Excel cell? Or interspersed in some otherwise fairly regular text in a Word document? I suppose someone could define a strict limited subset of HTML to be that subset that makes sense in ALL textual situations. That subset would be something like just defining things that act like characters, and not like a full blown rendering engine. But who would define that subset? Not the HTML groups, because their mandate is to define full blown rendering engines. It would be more likely to be something like the unicode group. And also, in this brave new world where HTML5 is the new standard text format, what would the binary format of it be? I mean, if I have the string of unicode characters IMG would that be HTML5 image definition that should be rendered as such? Or would it be text that happens to contain greater than symbol, I, M and G? 
It would have to be the former I guess, and thereby there would no longer be a Unicode symbol for the mathematical greater than symbol. Rather there would be a Unicode symbol for opening an HTML tag, and the text code for greater than would be &gt; Never again would a computer store > to mean greater than. Do we want HTML to be so pervasive? Not sure it deserves that. And from a programmer's point of view, he wants to be able to iterate over an array of characters and treat each one the same way, regardless of whether it is a custom character or not. Without that kind of programmatic abstraction, the whole thing can never gain traction. I don't think fully blown HTML embedded in your text can fulfill that. A very strictly defined subset possibly could. Sure, HTML5 can RENDER stuff adequately, if the only aim of the game is to provide a correct rendering. But to be able to actually treat particular embedded images as characters, and have some programming library see that abstraction consistently, I'm not sure I'm convinced that is possible. Not without nailing down exactly what HTML elements in what particular circumstances constitute a character. I guess in summary, yes, we have the technology already to render anything. But I don't think the whole standards framework does anything to allow the computing universe to actually exchange custom characters as if they were just any other text. Someone would actually have to work on a standard to do that, not just point to HTML5.
On Saturday, 30 May 2015 at 5:08 am, Philippe Verdy verd...@wanadoo.fr, wrote: 2015-05-29 4:37 GMT+02:00 John idou...@gmail.com: “Today the world goes very well with HTML(5) which is now the best markup language for document (including for inserting embedded images that don’t require any external request” If I had a large document that reused a particular character thousands of times, would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space efficient way? HTML(5) allows defining *once* entities for images that can then be reused thousands of times without repeating their definition. You can do this as well with CSS styles, just define a class for a small element. This element may still be an image, but the semantic is carried by the class you assign to it. You are not required to provide an external source URL for that image if the CSS style provides the content. You may also use PUAs for the same purpose (however I have not seen how CSS allows styling individual characters in text elements as these characters are not elements, and there's no defined selector for pseudo-elements matching a single character). PUAs are perfectly usable in the situation where you have embedded a custom font in your document for assigning glyphs to characters (you can still do that, but I would avoid TrueType/OpenType for this purpose, but would use the SVG
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Mark E. Shoulson: As was pointed out to me, essentially what you are saying is you reject my premise that one size does not fit all. Well, I do not know where that came from, but no, I do not reject that premise. There is plain text, there is HTML, there is XML. HTML is good for web pages. Plain text is, amongst other applications, good for text messages. The format that I am suggesting would allow the image for a non-standard emoji character to be included in a text message, with the image located at the correct place in the text. I have not purported that it become the only format for transmitting images. You would prefer *everything* be in plain text, so you wouldn't have to use other formats for it. You're essentially converting plain text into THE format for everything. No. Use the best format for the task that is being carried out. I am enthusiastic that as much as possible can be done in open source formats rather than an end user of computing equipment needing to rely on expensive proprietary software packages with proprietary file formats that cannot be accessed without expensive software. If you really believe one size should fit all in this way, ... But I don't. Just because I opine that plain text is best for some applications and I have suggested a format that would allow a graphic to be included directly in a plain text file does not mean that I opine that everything should be plain text. For example, I use HTML files, gif files, png files, pdf files, wav files, TTF files as appropriate. http://www.users.globalnet.co.uk/~ngo/library.htm http://www.users.globalnet.co.uk/~ngo/spec0001.htm http://www.users.globalnet.co.uk/~ngo/song1018.htm http://www.users.globalnet.co.uk/~ngo/song1021.htm I have embedded a wav file in a pdf and published the result on the web. http://www.users.globalnet.co.uk/~ngo/the_mobile_art_shop.pdf Suppose that a plain text file is to include just one non-standard emoji graphic. 
How would that be done otherwise than by the format that I am suggesting? What if there were three such non-standard emoji graphics needed in the plain text file, the second graphic being used twice. How would that be done otherwise than by the format that I am suggesting? William Overington 29 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Philippe Verdy: There's no advantage because what you want to create is effectively another markup language with its own syntax (but requiring new obscure characters that most applications and users will not be able to interpret and render correctly in the way intended by you, ... Well, if the format became accepted as part of Unicode then appropriate applications could well be produced that would interpret the format and display an image in the desired place. ... and with still many things you have forgotten about the specific needs for images (e.g. colorimetry profiles, aspect ratio of pixels with bitmaps, undesired effects that must be controlled such as moiré artefacts). The format is just at present a basic suggestion. Rather than just state what you consider that I have forgotten and dismiss the format, how about joining in progress and specifying what you consider needs adding to the format and perhaps suggest how to add in that functionality in the style that the format uses. You don't need new characters to create a markup language and its syntax. Today the world goes very well with HTML(5) which is now the best markup language for document (including for inserting embedded images that don't require any external request, or embedding special effects on images, such as animation or dynamic layouts for adapting the document to the rendering device, with the help of CSS and Javascript that are also embeddable). The two questions that I asked in my response to a post by Mark E. Shoulson are relevant here. Suppose that a plain text file is to include just one non-standard emoji graphic. How would that be done otherwise than by the format that I am suggesting? What if there were three such non-standard emoji graphics needed in the plain text file, the second graphic being used twice. How would that be done otherwise than by the format that I am suggesting? 
At least with HTML5 they don't try to reinvent the image formats and there's ample space for supporting multiple image formats tuned for specific needs (e.g. JPEG, PNG, GIF, SVG, TIFF...) including animation and video, and synchronization of images and audio in time for videos, or with user interactions. They are designed separately and benefit from long, patient research (your desired format, still undocumented, is largely under the level needed for images, independently of the markup syntax you want to create to support them, and independently of the fact that you also want to encode these syntactic elements with new characters, something that is absolutely not needed for any markup language) Well it is undocumented apart from posts in this thread because I have put forward the format for discussion. A pdf document for consideration by the Unicode Technical Committee could be produced and submitted if there is interest in the format, the content of the pdf document perhaps including suggestions from this thread if any such suggestions are forthcoming. In summary, you are reinventing the wheel. Well, this is progress, producing an additional format for expressing an image for application in various specific specialised circumstances. William Overington 29 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
The format that I am suggesting would allow the image for a non-standard emoji character to be included in a text message, with the image located at the correct place in the text. A more common occurrence is the need to include a non-standard character in a text message, be it a ski piste symbol or an obscure CJK ideogram. Have you thought of embedding TrueType in Unicode? Leo On Fri, May 29, 2015 at 1:38 AM, William_J_G Overington wjgo_10...@btinternet.com wrote: Responding to Mark E. Shoulson: As was pointed out to me, essentially what you are saying is you reject my premise that one size does not fit all. Well, I do not know where that came from, but no, I do not reject that premise. There is plain text, there is HTML, there is XML. HTML is good for web pages. Plain text is, amongst other applications, good for text messages. The format that I am suggesting would allow the image for a non-standard emoji character to be included in a text message, with the image located at the correct place in the text. I have not purported that it become the only format for transmitting images. You would prefer *everything* be in plain text, so you wouldn't have to use other formats for it. You're essentially converting plain text into THE format for everything. No. Use the best format for the task that is being carried out. I am enthusiastic that as much as possible can be done in open source formats rather than an end user of computing equipment needing to rely on expensive proprietary software packages with proprietary file formats that cannot be accessed without expensive software. If you really believe one size should fit all in this way, ... But I don't. Just because I opine that plain text is best for some applications and I have suggested a format that would allow a graphic to be included directly in a plain text file does not mean that I opine that everything should be plain text. 
For example, I use HTML files, gif files, png files, pdf files, wav files, TTF files as appropriate. http://www.users.globalnet.co.uk/~ngo/library.htm http://www.users.globalnet.co.uk/~ngo/spec0001.htm http://www.users.globalnet.co.uk/~ngo/song1018.htm http://www.users.globalnet.co.uk/~ngo/song1021.htm I have embedded a wav file in a pdf and published the result on the web. http://www.users.globalnet.co.uk/~ngo/the_mobile_art_shop.pdf Suppose that a plain text file is to include just one non-standard emoji graphic. How would that be done otherwise than by the format that I am suggesting? What if there were three such non-standard emoji graphics needed in the plain text file, the second graphic being used twice. How would that be done otherwise than by the format that I am suggesting? William Overington 29 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: There's no advantage because what you want to create is effectively another markup language with its own syntax (but requiring new obscure characters that most applications and users will not be able to interpret and render correctly in the way intended by you, ... Well, if the format became accepted as part of Unicode then appropriate applications could well be produced that would interpret the format and display an image in the desired place. I think this cuts to the heart of what people have been trying to say all along. Historically, Unicode was not meant to be the means by which brand new ideas are run up the proverbial flagpole to see if they will gain traction. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
2015-05-29 4:37 GMT+02:00 John idou...@gmail.com: “Today the world goes very well with HTML(5) which is now the best markup language for document (including for inserting embedded images that don’t require any external request” If I had a large document that reused a particular character thousands of times, would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space efficient way? HTML(5) allows defining *once* entities for images that can then be reused thousands of times without repeating their definition. You can do this as well with CSS styles, just define a class for a small element. This element may still be an image, but the semantic is carried by the class you assign to it. You are not required to provide an external source URL for that image if the CSS style provides the content. You may also use PUAs for the same purpose (however I have not seen how CSS allows styling individual characters in text elements as these characters are not elements, and there's no defined selector for pseudo-elements matching a single character). PUAs are perfectly usable in the situation where you have embedded a custom font in your document for assigning glyphs to characters (you can still do that, but I would avoid TrueType/OpenType for this purpose, but would use the SVG font format which is valid in CSS, for defining a collection of glyphs). If the document is not restricted to be standalone, of course you can use links to an external shared CSS stylesheet and to this SVG font referenced by the stylesheet. With such an approach, you don't even need to use classes on elements, you use plain text with very compact PUAs. It's up to you to decide whether the document must be standalone (embedding everything it needs) or must use external references for missing definitions; HTML allows both (and SVG as well when it contains plain-text elements).
Re: Tag characters and in-line graphics (from Tag characters)
As was pointed out to me, essentially what you are saying is you reject my premise that one size does not fit all. You would prefer *everything* be in plain text, so you wouldn't have to use other formats for it. You're essentially converting plain text into THE format for everything. But it isn't suited for that. If you really believe one size should fit all in this way, I think the problem is that pretty much all of the rest of the computer science community doesn't agree with you. Sorry. ~mark On 05/28/2015 07:50 AM, William_J_G Overington wrote: Responding to Mark E. Shoulson: The big advantage of this new format is that the result is an unambiguous Unicode plain text file and could be placed within a file of plain text without having to make the whole document a markup file to some format. Plain text is the key advantage. The following may be useful as a guide to the original problem that I am trying to solve. http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term I tried to apply the brilliant new base character followed by tag characters format to the problem. In the future, maybe Serif DrawPlus will have the ability to export a picture to this new format. William Overington 28 May 2015
Re: Tag characters and in-line graphics (from Tag characters)
“Today the world goes very well with HTML(5) which is now the best markup language for document (including for inserting embedded images that don’t require any external request” If I had a large document that reused a particular character thousands of times, would this HTML markup require embedding that character thousands of times, or could I define the character once at the beginning of the sequence, and then refer back to it in a space efficient way? Part of the reason at least of having any code system rather than just pixels and images is to efficiently and consistently encode data. Unicode has private use ranges of codes. I can see an argument that it would be desirable to be able to send someone text with private use ranges and have the header define some default renderings. I’m not sure that replacing a document of 100,000 characters with 100,000 embedded html5 img tags is the same thing. It would be inefficient in space, impossible to process (e.g. find all the instances of a particular character, or sequence), and so forth. Given that it's been agreed that private use ranges are a good thing, and given that we can agree that exchanging data is a good thing, maybe something should bring those two things together. Just a thought. -- Chris On Fri, May 29, 2015 at 9:45 AM, Mark E. Shoulson m...@kli.org wrote: As was pointed out to me, essentially what you are saying is you reject my premise that one size does not fit all. You would prefer *everything* be in plain text, so you wouldn't have to use other formats for it. You're essentially converting plain text into THE format for everything. But it isn't suited for that. If you really believe one size should fit all in this way, I think the problem is that pretty much all of the rest of the computer science community doesn't agree with you. Sorry. ~mark On 05/28/2015 07:50 AM, William_J_G Overington wrote: Responding to Mark E. 
Shoulson: The big advantage of this new format is that the result is an unambiguous Unicode plain text file and could be placed within a file of plain text without having to make the whole document a markup file to some format. Plain text is the key advantage. The following may be useful as a guide to the original problem that I am trying to solve. http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term I tried to apply the brilliant new base character followed by tag characters format to the problem. In the future, maybe Serif DrawPlus will have the ability to export a picture to this new format. William Overington 28 May 2015
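The processing argument in the message above (finding every instance of a custom character, and the space cost of repeating markup) can be sketched roughly as follows. This is only an illustration under stated assumptions: the private use code point U+E000, the `<img>` string, and the function names are all made up for the example, and nothing here is a proposed format.

```python
# Sketch: a custom character held as one Private Use Area (PUA) code point
# versus the same character expressed as repeated inline markup.
# CUSTOM_CHAR and IMG_MARKUP are illustrative assumptions only.

CUSTOM_CHAR = "\uE000"  # first code point of the BMP Private Use Area
IMG_MARKUP = '<img src="custom-glyph.png" alt="custom">'  # hypothetical markup


def count_custom(text, char=CUSTOM_CHAR):
    """Count occurrences of the custom character with an ordinary string search."""
    return text.count(char)


def as_markup(text, char=CUSTOM_CHAR, markup=IMG_MARKUP):
    """Replace each custom character with inline markup, as a size comparison."""
    return text.replace(char, markup)


# A document that reuses the custom character a thousand times.
sample = ("word " + CUSTOM_CHAR + " ") * 1000

plain_size = len(sample.encode("utf-8"))
markup_size = len(as_markup(sample).encode("utf-8"))

print("occurrences:", count_custom(sample))
print("plain text bytes:", plain_size)
print("markup bytes:", markup_size)
```

The comparison is deliberately crude: a short CSS class reference, as suggested elsewhere in the thread, would narrow the size gap, but the search step would still have to recognise markup rather than a single code point.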
Re: Tag characters and in-line graphics (from Tag characters)
Responding to Mark E. Shoulson: The big advantage of this new format is that the result is an unambiguous Unicode plain text file and could be placed within a file of plain text without having to make the whole document a markup file to some format. Plain text is the key advantage. The following may be useful as a guide to the original problem that I am trying to solve. http://www.unicode.org/reports/tr51/tr51-2.html#Longer_Term I tried to apply the brilliant new base character followed by tag characters format to the problem. In the future, maybe Serif DrawPlus will have the ability to export a picture to this new format. William Overington 28 May 2015
Re: Tag characters
Doug, Read on in the minutes to the next day. 143-C27 and related actions. There are a few things to keep in mind here. 1. The un-deprecation of the tags U+E0020..U+E007E *is* part of the UCD for Unicode 8.0. The change has already taken place in the revised beta files now posted (see PropList.txt), and will be part of the 8.0 release next month. 2. UTR #51, while scheduled to come out at the same time as the Unicode 8.0 release, is a UTR and is not formally either a part of the Unicode Standard per se, nor a formal part of the Unicode 8.0 release. 3. As per the minutes, when the approved version of UTR #51 is first published, more or less simultaneously with the Unicode 8.0 release (and explaining other aspects of emoji related to the release, such as the use of emoji modifiers), it will *not* yet contain the flag-tag discussion and mechanism. 4. Once the PRI is up, it will be used as the basis for the next proposed update of UTR #51. And the review of that proposed update and publication of the *subsequent* revision of UTR #51 need not wait for the next Unicode release (9.0 in summer, 2016). So at that point, the flag-tag mechanism will be available for use *with* Unicode 8.0 -- it just won't be a formal part of the release per se. Clear? --Ken On 5/27/2015 10:49 AM, Doug Ewell wrote: On Tuesday, May 19, Mark Davis mark at macchiato dot com wrote: A more concrete proposal will be in a PRI to be issued soon, If the new mechanism is intended for Unicode 8.0, as stated in the minutes at http://www.unicode.org/L2/L2015/15107.htm#143-M1 ... ... and if Unicode 8.0 is planned for release in June, 2015, as stated on the Beta Review page... ... and if June 2015 starts in less than a week... ... shouldn't we be seeing that PRI real soon now? -- Doug Ewell | http://ewellic.org | Thornton, CO
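As a rough illustration of the base character followed by tag characters technique discussed in this thread, the sketch below maps printable ASCII onto the tag range U+E0020..U+E007E named in point 1 above. The base character, the payload string "us-ca", and the function names are assumptions chosen for the example; the actual flag-tag mechanism was still to be specified in the PRI mentioned in point 4.

```python
# Sketch: express an ASCII code as tag characters after a base character.
# Each tag character sits at U+E0000 plus the ASCII code of its counterpart.

TAG_BASE = 0xE0000
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # range un-deprecated for Unicode 8.0


def to_tag_characters(ascii_text):
    """Map each printable ASCII character to its tag-character counterpart."""
    out = []
    for ch in ascii_text:
        cp = TAG_BASE + ord(ch)
        if not TAG_FIRST <= cp <= TAG_LAST:
            raise ValueError(f"{ch!r} has no tag character in U+E0020..U+E007E")
        out.append(chr(cp))
    return "".join(out)


def from_tag_characters(text):
    """Recover the ASCII payload, ignoring anything that is not a tag character."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if TAG_FIRST <= ord(ch) <= TAG_LAST
    )


# Placeholder base character (U+1F3F3 WAVING WHITE FLAG); an assumption only.
HYPOTHETICAL_BASE = "\U0001F3F3"

sequence = HYPOTHETICAL_BASE + to_tag_characters("us-ca")
print([f"U+{ord(ch):04X}" for ch in sequence])
print(from_tag_characters(sequence))
```

A process that does not understand the sequence still sees one visible base character followed by default-ignorable tag characters, which is the property that makes the technique attractive for plain text.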
RE: Tag characters
Ken Whistler kenwhistler at att dot net wrote: Read on in the minutes to the next day. 143-C27 and related actions. Ah. Thank you. Now I understand what Steven meant by read the minutes, too. That's the problem with reading individual items in meeting minutes: each item is a snapshot in time, and the next day of the meeting might have brought no change, or a big change. -- Doug Ewell | http://ewellic.org | Thornton, CO
RE: Tag characters
Well, the same reasoning could also argue for the contra-positive (a→b ⊨ ¬b→¬a): that UTC should not consider endorsing such a tag scheme. Peter From: William_J_G Overington [mailto:wjgo_10...@btinternet.com] Sent: Wednesday, May 27, 2015 12:54 AM To: unicode@unicode.org; Peter Constable; eric.mul...@efele.net; asmus-...@ix.netcom.com Subject: Re: Tag characters Peter Constable wrote as follows: Would Unicode really want to get into the business of running a UFL service? Well, Unicode is about precision, interoperability and long-term stability, and, given, in relation to one particular specified base character followed by some tag characters, that a particular sequence of Unicode characters is intended to lead to the display of an image representing a particular flag, it seems to me highly reasonable that the Unicode Technical Committee might seriously consider providing that facility. William Overington 27 May 2015
RE: Tag characters and in-line graphics (from Tag characters)
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Please feel free to suggest improvements. http://en.wikipedia.org/wiki/Scalable_Vector_Graphics -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
Thanks Ken; and yes Doug; http://www.unicode.org/L2/L2015/15107.htm#143-C27 was the reference I was looking for when I wrote my too-brief reply earlier. My apologies. S Sent from our iPhone. On May 27, 2015, at 2:06 PM, Doug Ewell d...@ewellic.org wrote: Ken Whistler kenwhistler at att dot net wrote: Read on in the minutes to the next day. 143-C27 and related actions. Ah. Thank you. Now I understand what Steven meant by read the minutes, too. That's the problem with reading individual items in meeting minutes: each item is a snapshot in time, and the next day of the meeting might have brought no change, or a big change. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters and in-line graphics (from Tag characters)
I think I've figured out the philosophy WJGO is trying to follow here. We should have a way to encode graphics in Unicode We should have a way to encode programming instructions in Unicode How about We should have a way to encode sound-waves in Unicode? Or We should have a way to encode *moving* graphics, maybe with sound, in Unicode? Now, he didn't say the last two, in fairness to him. But I think that's the thinking. WJGO, not *everything* computers do has to be part of Unicode. Doing so essentially makes *everything* that wants to support Unicode have to be... well, pretty much *everything* all other computers are. We have graphics formats that encode graphics; they're *good* at it. They're made for it. We have sound formats for encoding sounds. We have various bytecodes for programming--different ones, written by different people, that do things in different ways, because one size does not fit all. Unicode can't be the one size. It was never intended to. Don't make Unicode into an operating system, or worse, THE operating system. It's a character encoding. For encoding characters. ~mark On 05/27/2015 12:26 PM, William_J_G Overington wrote: Tag characters and in-line graphics (from Tag characters) This document suggests a way to use the method of a base character together with tag characters to produce a graphic. The approach is theoretical and has not, at this time, been tried in practice. The application in mind is to enable the graphic for an emoji character to be included within a plain text stream, though there will hopefully be other applications.
Re: Tag characters
Peter Constable wrote as follows: Would Unicode really want to get into the business of running a UFL service? Well, Unicode is about precision, interoperability and long-term stability, and, given, in relation to one particular specified base character followed by some tag characters, that a particular sequence of Unicode characters is intended to lead to the display of an image representing a particular flag, it seems to me highly reasonable that the Unicode Technical Committee might seriously consider providing that facility. William Overington 27 May 2015
Re: Tag characters
On 5/21/2015 1:25 PM, Asmus Freytag (t) wrote: On 5/21/2015 8:46 AM, Peter Constable wrote: Would Unicode really want to get into the business of running a UFL service? I suspect both Eric and I may have been slightly tongue-in-cheek with respect to UFLs... Actually, I was serious. Eric.
Re: Tag characters
Aww... I was SURE you meant UFOs! On 2015-05-26 09:48, Eric Muller wrote: On 5/21/2015 1:25 PM, Asmus Freytag (t) wrote: On 5/21/2015 8:46 AM, Peter Constable wrote: Would Unicode really want to get into the business of running a UFL service? I suspect both Eric and I may have been slightly tongue-in-cheek with respect to UFLs... Actually, I was serious. Eric.
Re: Tag characters
On 5/21/2015 8:46 AM, Peter Constable wrote: Would Unicode really want to get into the business of running a UFL service? I suspect both Eric and I may have been slightly tongue-in-cheek with respect to UFLs... ... not sure about anybody else. Cheers, A./ P From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag (t) Sent: Wednesday, May 20, 2015 10:15 PM To: Eric Muller; unicode@unicode.org Subject: Re: Tag characters On 5/20/2015 9:57 PM, Eric Muller wrote: On 5/20/2015 7:11 PM, Doug Ewell wrote: In any event, URLs that point to images would be an awful basis for an encoding. I would make an exception for the URL http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html. Eric. Currently that gives me Not Found The requested URL /Public/8.0.0/ucd/StandardizedFlags.html was not found on this server. :) However, I agree, all we need to do is create a UFL (Universal Flag Locator) and we can keep it as stable as we want. A./
RE: Tag characters
I don’t think so. Sincerely, Erkki From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable Sent: 21 May 2015 18:46 To: Asmus Freytag (t); Eric Muller; unicode@unicode.org Subject: RE: Tag characters Would Unicode really want to get into the business of running a UFL service? P From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag (t) Sent: Wednesday, May 20, 2015 10:15 PM To: Eric Muller; unicode@unicode.org Subject: Re: Tag characters On 5/20/2015 9:57 PM, Eric Muller wrote: On 5/20/2015 7:11 PM, Doug Ewell wrote: In any event, URLs that point to images would be an awful basis for an encoding. I would make an exception for the URL http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html. Eric. Currently that gives me Not Found The requested URL /Public/8.0.0/ucd/StandardizedFlags.html was not found on this server. :) However, I agree, all we need to do is create a UFL (Universal Flag Locator) and we can keep it as stable as we want. A./
RE: Tag characters
Would Unicode really want to get into the business of running a UFL service? P From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag (t) Sent: Wednesday, May 20, 2015 10:15 PM To: Eric Muller; unicode@unicode.org Subject: Re: Tag characters On 5/20/2015 9:57 PM, Eric Muller wrote: On 5/20/2015 7:11 PM, Doug Ewell wrote: In any event, URLs that point to images would be an awful basis for an encoding. I would make an exception for the URL http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html. Eric. Currently that gives me Not Found The requested URL /Public/8.0.0/ucd/StandardizedFlags.html was not found on this server. :) However, I agree, all we need to do is create a UFL (Universal Flag Locator) and we can keep it as stable as we want. A./
Re: Tag characters
2015-05-21 4:11 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy wrote: URLs were initially designed to be stable (and this is still a strong recommendation). [+ 559 words] It doesn't matter if they were designed to be stable. Users don't keep them stable. I can't believe we're debating whether URLs are stable on a list where people have raised concerns about whether 50 years is stable enough for ISO 3166-1. I am only saying that the URL encoding itself is stable and allows URLs to be used as stable references. The W3C itself uses URIs (in fact just URLs, even if they don't return a resource when queried) to make XML schemas identifiable. In SGML there are similar stable identifiers (but in a naming scheme). In both cases they are meant to make identifiers unique and stable over time. A URL does NOT have to return stable content, it JUST has to remain stable itself. There is absolutely no obligation for its associated content to be accessible or retrievable. It will survive even if the referenced content is later changed or deleted: a URL is a valid URI, it is an identifier.
Re: RE: Tag characters
Peter Constable wrote as follows. Evidently there were more than two types of people. There are those who feel 50 years is long enough; there are others who feel that five years is long enough; there are likely others that feel 75 or 30 or some other values are long enough. Then there are also those who feel that any finite length is probably not long enough. Unicode is about long-term stability. Hopefully the people in charge of the codes to be used for the flags will agree never to reuse a code. Whether they do or not, would it be good to add an option into the tag coding of the flags whereby at the end one may optionally add TAG COLON then at least four TAG DIGIT characters, those TAG DIGIT characters representing the year? This feature would be ready if a future archivist finds the need to edit a text from years before so that it would display as its author intended, and indeed an author could use the method now so as to lock in his or her meaning. This could also be of use now so as to display such items as the flag of the USA at various historical periods. It would be helpful if a particular year were chosen for normalization purposes: for example, for the flag of the USA used in the 1940s and most of the 1950s, have one particular year rather than just using any year within the period when that particular design of flag was in use. The same applies to other flags at various historical periods. It has been speculated that had Scotland left the United Kingdom as a result of the referendum last year (in the event, the people voted for Scotland to stay in the United Kingdom) the flag of the United Kingdom would have been changed, though some people advocated keeping it the same anyway. William Overington 20 May 2015
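The optional year suffix suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thread does not name the base character, so `HYPOTHETICAL_BASE` is a placeholder, and the function names, the code "US" and the year 1950 are likewise invented for the example. The one fixed fact it relies on is that the tag characters U+E0020..U+E007E mirror printable ASCII at an offset of 0xE0000, so TAG COLON is U+E003A and the TAG DIGIT characters are U+E0030..U+E0039.

```python
from typing import Optional, Tuple

TAG_OFFSET = 0xE0000  # tag characters mirror printable ASCII at U+E0020..U+E007E

# Assumption: the thread does not name the base character, so this
# value is a placeholder chosen only for illustration.
HYPOTHETICAL_BASE = "\U0001F3F4"


def to_tags(text: str) -> str:
    """Map printable ASCII to the corresponding tag characters."""
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"cannot express {ch!r} as a tag character")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in text)


def from_tags(tags: str) -> str:
    """Map tag characters back to printable ASCII."""
    for ch in tags:
        if not 0xE0020 <= ord(ch) <= 0xE007E:
            raise ValueError(f"U+{ord(ch):04X} is not a tag character")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tags)


def encode_flag(code: str, year: Optional[int] = None,
                base: str = HYPOTHETICAL_BASE) -> str:
    """Base character, then the code as tags, then optionally TAG COLON + year."""
    payload = code if year is None else f"{code}:{year:04d}"
    return base + to_tags(payload)


def decode_flag(sequence: str) -> Tuple[str, Optional[int]]:
    """Split a sequence into (code, year); year is None when no suffix is present."""
    payload = from_tags(sequence[1:])  # skip the single base character
    code, colon, year = payload.partition(":")
    if not colon:
        return code, None
    if len(year) < 4 or not year.isdigit():
        raise ValueError("year suffix must be at least four tag digits")
    return code, int(year)
```

A sequence without the suffix decodes to a year of `None`, so existing flag sequences would stay valid under this sketch; how a renderer should choose a design for a given year is left open here, as it is in the message.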
Re: Tag characters
Well, for now a reasonably stable standard exists: URLs, which can point to a collection of page names (each site can choose its own registry to name/encode the flags). The URLs then return images (you can make a site that returns images in several formats and with variable sizes, or with transforms such as rotations, flips, animations...). Instead of just isolated URLs, you can organize them into a base URL or a static URL with a query (acting as a resolver address), then append the URN (the name or code of the flag, which can include historic variants), and then allow the base URL to be replaced: keep just one part of the URL (the end of the pathname, or part of the query string) as the standard, and you get what is generally termed a mirror. Mirrors, however, are not necessarily bound to remain on the web; they can be any local store (e.g. a local IP file, or a folder in your filesystem). Basically, even the existing FOTW site (and its mirrors) can already be seen as supporting these relatively stable URNs (provided that the site is not constantly restructuring its URLs, and that file names are kept or at least resolved by internal redirecting links). So what is needed is just a way to support URLs. However, URLs today can be IRIs and contain most of Unicode, and we cannot duplicate this code. It is possible to work around that by using the character set used by Punycode (for domain names). But if FOTW just designs a naming convention for the paths it supports, so that it uses only a restricted set (ASCII letters, digits, and punctuation, with only some restrictions on slashes and controls), those names can be used as partial path names (also excluding the file extension in file names) that can serve as URNs and act as identifiers (all other parameters, such as size, transforms and image formats, should be separate parameters). And with this restricted set, it is possible to encode them in a stable (but still very extensible) way.
2015-05-20 19:35 GMT+02:00 Doug Ewell d...@ewellic.org: William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Hopefully the people in charge of the codes to be used for the flags will agree never to reuse a code. Normally I would completely agree about the need for archival stability. In this case, however, we are talking about flags used primarily as emoji, like the one in my signature block. People will pop these flags into their text messages alongside party or celebration icons. I'm not sure the requirement for stability is quite as critical as it might be. However... Whether they do or not, would it be good to add an option into the tag coding of the flags whereby at the end one may optionally add TAG COLON then at least four TAG DIGIT characters, those TAG DIGIT characters representing the year? It's remarkable how similar this suggestion is to a discussion between Philippe and me two years ago. There is currently no well-known coding system for flags -- the owner of the Flags of the World site doesn't know of one -- and there should be. (The term flag code already has two meanings that are very different from this, which makes it hard to find information.) Getting UTC to accept the extended syntax of a standard like this would, of course, require that the standard gain reasonable acceptance and popularity beforehand. Requiring it to become an ISO standard might not be unreasonable. If you want to discuss this specific idea further, please write to me privately and *not to the list*. -- Doug Ewell | http://ewellic.org | Thornton, CO
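The resolver-plus-identifier arrangement Philippe Verdy describes in this message can be sketched as follows. Everything concrete here is an assumption made for illustration: the resolver addresses are placeholders (not real services), the identifier "fr-1794" is invented, and the allowed character set is deliberately narrower than the "ASCII letters, digits, and punctuation" he mentions. The point is only that the identifier stays fixed while the base (web site, mirror, or local folder) and the rendering parameters vary.

```python
import re
from urllib.parse import urlencode

# Assumption: a conservative identifier alphabet (ASCII letters, digits,
# dot, underscore, hyphen). No slashes and no file extension are allowed,
# so the identifier can be appended to any resolver base unchanged.
_FLAG_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*")


def flag_url(resolver_base: str, flag_id: str, **params: str) -> str:
    """Join a replaceable resolver base with a stable flag identifier.

    Size, format and transforms travel as separate query parameters,
    so they never become part of the identifier itself.
    """
    if not _FLAG_ID.fullmatch(flag_id):
        raise ValueError(f"not a valid flag identifier: {flag_id!r}")
    url = resolver_base.rstrip("/") + "/" + flag_id
    if params:
        # Sorted so the same request always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url
```

Swapping `resolver_base` for a mirror, or for a `file:` path on the local machine, changes where the image comes from without changing the identifier that would be stored in text.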
RE: Tag characters
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Well for now a reasonably stable standard exists: URLs, that can point to a collection of page names (each site can choose its own registry to name/encode the flags) URLs are the opposite of stability. Anyone can post whatever they like, publish the URL, then change or remove the content at any time. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
On Wed, 20 May 2015 17:15:28 -0700 Asmus Freytag (t) asmus-...@ix.netcom.com wrote: Have there been any discussions of the flag alphabet? (Signal flags). It seems to me that when schemes for representing sets of flags are discussed, it would be useful to keep open the ability to use the same scheme for signal flags -- perhaps with a different base character to avoid collisions in the letter codes. If these are worthy of coding, I think the Unified Canadian Aboriginal Syllabics would be a better model - encode the form, not the semantic. Braille is another precedent. Richard.
Re: Tag characters
URLs were initially designed to be stable (and this is still a strong recommendation). However, I did not describe just URLs but URNs (for which URLs are just resolvers locating them). URNs share with URLs (and URIs in general, as well as the UCS) the initial U, which is intended to mean universal (in space but also in time). The problem is that the system remains open to anyone who does not want to maintain this stability (and also that URLs have a time limit, namely the registration period of their domain name, which limits their universality in time). The web is also currently having difficulty maintaining its universality in space (see the ongoing political discussions about its neutrality). URNs, however, should be stable... provided that there is a stable registry for maintaining the references. (The UCS is stable only because this registry exists and is managed by a joint authority which is itself still maintained, and with enough participants that no other attempt has been made to compete with it with the same success.) Stability largely depends on the status of the standard that supports it, and on the number of interested people who want to participate. (It is never guaranteed over a long time, as any participant may decide to retire from the project.) But stability also requires that the participants do not change their minds about that project. Such a change is less likely to occur if there are a lot of users of the standard. Even the UCS has had its own history of instability in its early versions. And it is very difficult to maintain this stability when people frequently contest it (sometimes in the UCS this means that a new property must be designed to satisfy more people, but this also adds to the total cost of managing the whole standard); however, the addition of new sets of characters is now slowing down.
The remaining ones are a few isolates to complement existing scripts, or scripts that are extremely similar in structure to existing ones, for which completely new solutions rarely need to be designed. The most important difficulties are solved, even for the remaining scripts that need to be encoded ... except for the more recent addition of emojis, where we still cannot see how they will be bounded in scope (and I count flags among emojis), and for scripts with complex layouts for which standard solutions are still missing (e.g. SignWriting, hieroglyphs and old cuneiform). We'll probably have more discussions about conventional symbols used in signalling (e.g. road signals, including traffic lights, and marks on the ground), conventional signs on products (standard conformance marks...) and various security-related symbols. We know we are stable only for alphabetic/phonetic scripts, but we have lots of candidate symbols and ideograms (whose creation and explosion is definitely not over, and which do not concern just CJK scripts). Industry and legislation are creating new symbols every day around the world... and also deprecating many at almost the same rate. So yes, URLs can be stable, but only those from recognized standards bodies that want to keep them stable (e.g. URLs to W3C standards are stable... but not necessarily all those linking to temporary discussions). The same is true for URLs to temporary working documents used by the UTC, ISO, or the W3C itself, where documents may be moved elsewhere into archives and into other formats, losing some formatting details. 2015-05-20 20:57 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Well for now a reasonably stable standard exists: URLs, that can point to a collection of page names (each site can choose its own registry to name/encode the flags) URLs are the opposite of stability. 
Anyone can post whatever they like, publish the URL, then change or remove the content at any time.
RE: Tag characters
I've always been a bit partial to them and found it odd that they are intentionally not included in Unicode. Especially the novel concepts like the repeats. -Original Message- From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard Wordingham Sent: Wednesday, May 20, 2015 6:08 PM To: unicode@unicode.org Subject: Re: Tag characters On Wed, 20 May 2015 17:15:28 -0700 Asmus Freytag (t) asmus-...@ix.netcom.com wrote: Have there been any discussions of the flag alphabet? (Signal flags). It seems to me that when schemes for representing sets of flags are discussed, it would be useful to keep open the ability to use the same scheme for signal flags -- perhaps with a different base character to avoid collisions in the letter codes. If these are worthy of coding, I think the Unified Canadian Aboriginal Syllabics would be a better model - encode the form, not the semantic. Braille is another precedent. Richard.
Re: Tag characters
Have there been any discussions of the flag alphabet? (Signal flags). They are not that infrequently used online or in print, although the concentration tends to be higher in publications/sites geared to nautical audiences (not that different from chess pieces and chess publications). Now, before you leap on the "it's just a font" bandwagon, consider that the signal flags not only represent letters and digits, but also contain special pennants for functions like repeat once to repeat four times as well as a number of special flags that are associated with two-letter codes. Also, the use of certain individual flags has conventional meanings other than the letter itself, so a reference to the flag in text would not necessarily survive a font substitution, because you'd lose the fact that you are talking about flags. Some of these uses have spread to enthusiasts, for example divers like to use the old PO flag (that curiously is now obsolete for this purpose) as a logo for their sport. The diver down flag (flag A) is now a different one in the International Regulations for the Prevention of Collisions at Sea (IRPCAS), but for emoji-style use that would not matter as the other one (whatever its origin) is now the recognized tribal symbol for divers. It seems to me that when schemes for representing sets of flags are discussed, it would be useful to keep open the ability to use the same scheme for signal flags -- perhaps with a different base character to avoid collisions in the letter codes. A./
Re: Tag characters
On Wed, 20 May 2015 17:29:28 +0100 (BST) William_J_G Overington wjgo_10...@btinternet.com wrote: This could also be of use now so as to display such items as the flag of the USA at various historical periods. It would be helpful if a particular year were chosen for normalization purposes: for example for the flag of the USA used in the 1940s and most of the 1950s have one particular year rather than just using any year within the period when that particular design of flag was in use. That is a singularly poor example. An example that would jar is the use of the tricolour to represent France in an account of the Hundred Years' War, or the present German flag to represent Germany in an account of the Second World War. A problem we have is that flags are not stable enough to use in plain text that is to last a human lifetime. It has been speculated that had Scotland left the United Kingdom as a result of the referendum last year (in the event, the people voted for Scotland to stay in the United Kingdom) that the flag of the United Kingdom would have become changed, though some people advocated keeping it the same anyway. It won't be kept if England secedes from the UK so as to leave the European Union. It may not be a likely outcome, but it's certainly a possibility. Richard.
Re: Tag characters
On 5/20/2015 9:57 PM, Eric Muller wrote: On 5/20/2015 7:11 PM, Doug Ewell wrote: In any event, URLs that point to images would be an awful basis for an encoding. I would make an exception for the URL http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html. Eric. Currently that gives me Not Found The requested URL /Public/8.0.0/ucd/StandardizedFlags.html was not found on this server. :) However, I agree, all we need to do is create a UFL (Universal Flag Locator) and we can keep it as stable as we want. A./
Re: Tag characters
On 5/20/2015 6:14 PM, Shawn Steele wrote: I've always been a bit partial to them and found it odd that they are intentionally not included in Unicode. Especially the novel concepts like the repeats. :) If I were to write an actual proposal I would suggest naming them after their international/modern use, but with the understanding that the actual interpretation would be based on whatever signalling system you intend to follow. None of the existing users would be helped by having them named after their shapes and colors. That is because some of the shapes and colors are a bit complex and nobody I know learns them by description. In a way, this is also what we do for many standard alphabets. We encode LATIN SMALL LETTER O, not small letter looking like a round circle, and we leave it to the language whether to pronounce that long like an oh or short, as in hot (for English) or more as an oo sound, as in Swedish. We pick a conventional name for the element of the alphabet, and then allow variations in use. (Some of the consonants show much greater variation in pronunciation). When I said naming, I meant that we should use the alphabetic abbreviations that they are associated with so that we can fit them into an open-ended system, like the other flags. Then, whatever techniques we will be using (such as UFLs - Universal Flag Locators) would apply to them analogously to the national flags. A./ -Original Message- From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Richard Wordingham Sent: Wednesday, May 20, 2015 6:08 PM To: unicode@unicode.org Subject: Re: Tag characters On Wed, 20 May 2015 17:15:28 -0700 Asmus Freytag (t) asmus-...@ix.netcom.com wrote: Have there been any discussions of the flag alphabet? (Signal flags). It seems to me that when schemes for representing sets of flags are discussed, it would be useful to keep open the ability to use the same scheme for signal flags -- perhaps with a different base character to avoid collisions in the letter codes. 
If these are worthy of coding, I think the Unified Canadian Aboriginal Syllabics would be a better model - encode the form, not the semantic. Braille is another precedent. Richard.
Re: Tag characters
On 5/20/2015 7:11 PM, Doug Ewell wrote: In any event, URLs that point to images would be an awful basis for an encoding. I would make an exception for the URL http://unicode.org/Public/8.0.0/ucd/StandardizedFlags.html. Eric.
RE: Tag characters
Evidently there were more than two types of people. There are those who feel 50 years is long enough; there are others who feel that five years is long enough; there are likely others that feel 75 or 30 or some other values are long enough. Then there are also those who feel that any finite length is probably not long enough. Peter -Original Message- From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Doug Ewell Sent: Tuesday, May 19, 2015 10:01 AM To: Unicode Mailing List Cc: William_J_G Overington Subject: Re: Tag characters William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Hopefully the MA will adhere to the new 50-year limit. What is MA please? Maintenance Agency: http://www.iso.org/iso/home/standards/country_codes.htm A 50-year limit seems far too short a time. There are two types of people: those who feel 50 years is too short, and those who feel it is too long. Fifty years is much better than five, which was the previous limit. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
2015-05-19 7:18 GMT+02:00 Mark Davis ☕️ m...@macchiato.com: There is a difference between EU and UN; the former is in BCP47. That being said, we could look at making the exceptionally reserved codes valid for this purpose (or at least the UN code). It appears that there are only 3 exceptionally reserved codes that aren't in BCP47: EZ, UK, UN. There are also reserved codes for WIPO areas, and special codes requested by the ITU and UPU, or kept in ISO 3166 at their request, for maintaining their own standards (there may be other codes requested by IATA and ICAO, or by some international railway organisation, or by a maritime organisation for oceans in international waters). Thankfully, for now we don't have to handle specific region codes for the Moon or divisions of the solar system, or even for bands of orbital airspace over the Earth (from stratospheric to geostationary), as for now these are still considered international (and country laws apply only to individual pieces of equipment, or when they have to fall back to the ground or, preferably, the oceans)... We could also imagine other regions such as the poles, the hemispheres, or 1-hour (15°) bands of longitude (excluding polar areas within the Arctic/Antarctic circles or within the ±85° circles, the limit commonly used in geography for maps in the Mercator projection). There are various standards that define codes for their own regions; some of them have political importance, and some have specific localized data associated with them, and for these there must be no collisions with existing or future ISO 3166-1 country codes. 
For such uses, however, applications should use the concept of a namespace to qualify each code source (ISO 3166 being just one of them, IETF another, and the local application using yet another namespace if needed for its own regions; the same remark applies if there is a need for private codes for pseudo-languages, pseudo-language-variants or pseudo-scripts), and with the namespace mechanism you could even track versions (as is done with XMLNS).
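The namespace idea in this message can be sketched as a small value type. The namespace labels ("iso3166-1", "app-local"), the version string and the `@`/`:` rendering are all assumptions invented for the example; no existing registry syntax is implied. What the sketch shows is that once a code carries its source, the same two letters from different sources can no longer collide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedCode:
    """A region code qualified by the namespace (code source) that defines it."""
    namespace: str                  # hypothetical label for the code source
    code: str                       # the code as defined inside that namespace
    version: Optional[str] = None   # optional edition of the namespace

    def __str__(self) -> str:
        source = self.namespace
        if self.version is not None:
            source = f"{source}@{self.version}"
        return f"{source}:{self.code}"


# The same letters, but two distinct identifiers that can sit in one table.
iso_un = QualifiedCode("iso3166-1", "UN")
local_un = QualifiedCode("app-local", "UN")
labels = {iso_un: "United Nations", local_un: "application-defined region"}
```

Because the dataclass is frozen it is hashable, which is what allows the qualified codes to be used directly as dictionary keys.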
Re: Tag characters
Mark Davis ⛾ mark at macchiato dot com wrote: A more concrete proposal will be in a PRI to be issued soon, and people will have a chance to comment more then. I'll hold off on most other questions until the PRI appears. The principal reason for 3 digit codes is because that is the mechanism used by BCP47 in case ISO screws up codes (as they did for CS). Hopefully the MA will adhere to the new 50-year limit. The example given in the proposal talked about trans-national flags. The syntax does not need to follow the 3166 syntax - the codes correspond but are not the same anyway. So we didn't see the necessity for the hyphen, syntactically. Well, the codes are the same, but you're defining a new syntax, so you get to remove the hyphen if you want to. But again, the proposal didn't say that. There is a difference between EU and UN; the former is in BCP47. I didn't know that was relevant to flag tagging. Just because a code is valid doesn't mean that there is a flag associated with it. Of course not. I'd also not expect CLDR or Unicode or even vendors to keep track of every state and territory flag around the world. Vendors will support some subset of flags of their choice, just as they currently do, and that's consistent with existing Unicode principles about not having to display every possible character. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
Doug Ewell wrote: Hopefully the MA will adhere to the new 50-year limit. The example given in the proposal talked about trans-national flags. What is MA please? A 50-year limit seems far too short a time. With that figure, a document could have its meaning retrospectively changed at least 20 years before its copyright runs out, and maybe a lot longer before its copyright runs out, maybe as much as 80 years before its copyright runs out, or even longer! Surely for archiving our culture, and the British Library is actively archiving, there should never be a retrospective change of meaning. William Overington 19 May 2015
Re: Tag characters
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: Hopefully the MA will adhere to the new 50-year limit. What is MA please? Maintenance Agency: http://www.iso.org/iso/home/standards/country_codes.htm A 50-year limit seems far too short a time. There are two types of people: those who feel 50 years is too short, and those who feel it is too long. Fifty years is much better than five, which was the previous limit. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
A few notes. A more concrete proposal will be in a PRI to be issued soon, and people will have a chance to comment more then. (I'm not trying to discourage discussion, just pointing out that there will be something more concrete relatively soon to comment on—people are pretty busy getting 8.0 out the door right now.) The principal reason for 3 digit codes is because that is the mechanism used by BCP47 in case ISO screws up codes (as they did for CS). The syntax does not need to follow the 3166 syntax - the codes correspond but are not the same anyway. So we didn't see the necessity for the hyphen, syntactically. There is a difference between EU and UN; the former is in BCP47. That being said, we could look at making the exceptionally reserved codes valid for this purpose (or at least the UN code). It appears that there are only 3 exceptionally reserved codes that aren't in BCP47: EZ, UK, UN. Just because a code is valid doesn't mean that there is a flag associated with it. Just like the fact that you can have the BCP47 code ja-Ahom-AQ doesn't mean that it denotes anything useful. I'd expect vendors to not waste time with non-existent flags. However, we could also discuss having a mechanism in CLDR to help provide guidelines as to which subdivisions are suitable as flags. Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Sat, May 16, 2015 at 10:07 AM, Doug Ewell d...@ewellic.org wrote: L2/15-145R says: On some platforms that support a number of emoji flags, there is substantial demand to support additional flags for the following: [...] Certain supra-national regions, such as Europe (European Union flag) or the world (e.g. United Nations flag). These can be represented using UN M49 3-digit codes, for example 150 for Europe or 001 for World. These are uncomfortable equivalence classes. 
Not all countries in Europe are members of the European Union, and the concept of United Nations is not really the same by definition as all countries in the world. The remaining UN M.49 code elements that don't have a 3166-1 equivalent seem wholly unsuited for this mechanism (and those that do, don't need it). There are no flags for Middle Africa or Latin America and the Caribbean or Landlocked developing countries. Some trans-national organizations might _almost_ seem as if they could be shoehorned into an M.49 code element, like identifying 035 South-Eastern Asia with the ASEAN flag, but this would be problematic for the same reasons as 150 and 001. Among the ISO 3166-1 exceptionally reserved code elements are EU for European Union and UN for United Nations. If these flags are the use cases, why not simply use those alpha-2 code elements, instead of burdening the new mechanism with the 3-digit syntax? -- Doug Ewell | http://ewellic.org | Thornton, CO
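Whichever way the alpha-2 versus 3-digit question is settled, the two forms discussed in this exchange can be told apart by shape alone. The short Python sketch below illustrates that; the function name and the returned labels are made up for the example, and it checks syntax only. It says nothing about whether a code is assigned or has a flag, which is the separate point made above that a valid code need not have a flag associated with it.

```python
import re


def classify_region_code(code: str) -> str:
    """Classify a region code by shape only; validity is not checked."""
    if re.fullmatch(r"[A-Za-z]{2}", code):
        return "alpha-2"      # e.g. the ISO 3166-1 style codes EU, UN
    if re.fullmatch(r"[0-9]{3}", code):
        return "numeric-3"    # e.g. the UN M.49 style codes 150, 001
    raise ValueError(f"neither a 2-letter nor a 3-digit code: {code!r}")
```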
Re: Tag characters
2015-05-16 19:07 GMT+02:00 Doug Ewell d...@ewellic.org: L2/15-145R says: On some platforms that support a number of emoji flags, there is substantial demand to support additional flags for the following: [...] Certain supra-national regions, such as Europe (European Union flag) or the world (e.g. United Nations flag). These can be represented using UN M49 3-digit codes, for example 150 for Europe or 001 for World. These are uncomfortable equivalence classes. Not all countries in Europe are members of the European Union But the European flag in fact belongs to the Council of Europe, which created it 30 years before the European Community adopted it. According to the Council of Europe, the flag is appropriate for ALL countries in Europe. In summary, the flag represents *not only* the EU. It is suitable as well for Russia, Belarus (even if its seat is suspended in the Council of Europe), or Kazakhstan and Turkey (even if only a part of these countries is in Europe). and the concept of United Nations is not really the same by definition as all countries in the world. Yes, but the UN recognizes a set of territories (not always their governments) that covers the whole world (including Antarctica, where no government is recognized, as well as the territorial waters of these territories, plus the international waters that the UN protects). Nor are all countries required to become members of the UN (the Holy See/Vatican is not a full member, but it is recognized; the same goes for Palestine). So the UN has competence over the whole world, and all people of the world can legally seek protection from the UN, wherever they live, even if no country recognizes them as nationals. If you want to find territories where the UN has no authority at all, the nearest ones are on the Moon!
Re: Tag characters
See the meeting minutes and the actual utr51. Sent from our iPhone. On May 16, 2015, at 10:07 AM, Doug Ewell d...@ewellic.org wrote: L2/15-145R says: On some platforms that support a number of emoji flags, there is substantial demand to support additional flags for the following: [...] Certain supra-national regions, such as Europe (European Union flag) or the world (e.g. United Nations flag). These can be represented using UN M49 3-digit codes, for example 150 for Europe or 001 for World. These are uncomfortable equivalence classes. Not all countries in Europe are members of the European Union, and the concept of United Nations is not really the same by definition as all countries in the world. The remaining UN M.49 code elements that don't have a 3166-1 equivalent seem wholly unsuited for this mechanism (and those that do, don't need it). There are no flags for Middle Africa or Latin America and the Caribbean or Landlocked developing countries. Some trans-national organizations might _almost_ seem as if they could be shoehorned into an M.49 code element, like identifying 035 South-Eastern Asia with the ASEAN flag, but this would be problematic for the same reasons as 150 and 001. Among the ISO 3166-1 exceptionally reserved code elements are EU for European Union and UN for United Nations. If these flags are the use cases, why not simply use those alpha-2 code elements, instead of burdening the new mechanism with the 3-digit syntax? -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Tag characters
Steven R. Loomis wrote: See the meeting minutes and the actual utr51. Sorry, I didn't find anything dealing with numeric codes in Section E.1.3 of the meeting minutes, and the copy of UTR #51 at unicode.org doesn't appear to have been updated for anything beyond the existing RIS. What specifically should I be looking for? -- Doug Ewell | http://ewellic.org | Thornton, CO
RE: Future of Emoji? (was Re: Tag characters)
Ah, yes. And Messenger “winks”. E.g., http://www.msn-tools.net/free-msn-winks-1.htm I note that this has .swf files, and that’s what we saw one of the Japanese carriers saying they’d be moving to instead of PUA characters. Peter From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter Constable Sent: Friday, May 15, 2015 8:47 AM To: Shervin Afshar Cc: unicode@unicode.org Subject: RE: Future of Emoji? (was Re: Tag characters) MSN Messenger supported extensible stickers years ago. A couple of sites still offering add-ons: http://www.getsmile.com/ http://www.smileys4msn.com/ Peter From: Shervin Afshar [mailto:shervinafs...@gmail.com] Sent: Thursday, May 14, 2015 10:40 PM To: Peter Constable Cc: unicode@unicode.org Subject: Re: Future of Emoji? (was Re: Tag characters) Good point. I missed these while looking into compatibility symbols. Of course, as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for flags, most of smiley faces, objects, etc. on all platforms would affect this approach. My idea of a sticker-based solution is something more like Facebook's[3] or Line's[4] implementations. [1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf [2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf [3]: http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html [4]: https://creator.line.me/en/guideline/ ↪ Shervin On Thu, May 14, 2015 at 9:37 PM, Peter Constable peter...@microsoft.com wrote: Skype uses stickers, including animated stickers. Here’s the documented set: https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”.
Peter From: Shervin Afshar [mailto:shervinafs...@gmail.com] Sent: Thursday, May 14, 2015 8:12 PM To: Peter Constable Cc: unicode@unicode.org Subject: Future of Emoji? (was Re: Tag characters) Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means. It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). [1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Shervin Afshar Sent: Thursday, May 14, 2015 2:27 PM To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Subject: Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character?
Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
RE: Future of Emoji? (was Re: Tag characters)
MSN Messenger supported extensible stickers years ago. A couple of sites still offering add-ons: http://www.getsmile.com/ http://www.smileys4msn.com/ Peter From: Shervin Afshar [mailto:shervinafs...@gmail.com] Sent: Thursday, May 14, 2015 10:40 PM To: Peter Constable Cc: unicode@unicode.org Subject: Re: Future of Emoji? (was Re: Tag characters) Good point. I missed these while looking into compatibility symbols. Of course, as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for flags, most of smiley faces, objects, etc. on all platforms would affect this approach. My idea of a sticker-based solution is something more like Facebook's[3] or Line's[4] implementations. [1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf [2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf [3]: http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html [4]: https://creator.line.me/en/guideline/ ↪ Shervin On Thu, May 14, 2015 at 9:37 PM, Peter Constable peter...@microsoft.com wrote: Skype uses stickers, including animated stickers. Here’s the documented set: https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”. Peter From: Shervin Afshar [mailto:shervinafs...@gmail.com] Sent: Thursday, May 14, 2015 8:12 PM To: Peter Constable Cc: unicode@unicode.org Subject: Future of Emoji? (was Re: Tag characters) Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means.
It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). [1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Shervin Afshar Sent: Thursday, May 14, 2015 2:27 PM To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Subject: Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Re: Tag characters
The consortium is in no position to enhance protocols *itself* for exchanging images. That's firmly in other groups' hands. We can try to noodge them a bit, but what *will* make a difference is when the *vendors* of sticker solutions put pressure on the different groups responsible for the protocols to provide interoperability for images. Because there is a lot of growth in sticker solutions, I would expect there to be more such pressure. And even so, I expect it will take those some time to be deployed. We've said what our longer-term position is, and I think we all pretty much agree with that; exchanging images is much more flexible. However, we do have strong short-term pressure to show that we are responsive and responsible in adding emoji. And our adding a reasonable number of emoji per year is not going to stop Line or Skype from adding stickers! There are a few possible scenarios, and it's hard to predict the results. It could be that emoji are largely supplanted by stickers in 5 years; could be 10; could be that they both coexist indefinitely. I have no , and neither does anyone else... Mark https://google.com/+MarkDavis *— Il meglio è l’inimico del bene —* On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Shervin Afshar *Sent:* Thursday, May 14, 2015 2:27 PM *To:* wjgo_10...@btinternet.com *Cc:* unicode@unicode.org *Subject:* Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. 
↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
A few emoji per year... (was: Re: Tag characters)
And to put Mark's comments in some statistical perspective, in the context of all the media hype, the true big bang for emoji in Unicode was Version 6.0, released over 4-1/2 years ago now. *That* was the Unicode release that added hundreds and hundreds of emoji for Japanese carrier interoperability, as well as the regional indicator mechanism for the representation of flag pictographs. But at the time, relatively few people noticed, because no Unicode emoji were on phones yet. Unicode 7.0, which resulted in the huge media splash about emoji last year, actually only added 103 emoji, and the majority of those were very old news: old-fashioned pictographs for Webdings compatibility. There were only a few high visibility, emotionally catchy new additions among that set, such as the CHIPMUNK and the you-know-what-I'm-talking-about hand gesture, that convinced people this was a bigger deal new release than it was. But suddenly everything was visible on phones, and that made all the difference for the general public. Unicode 8.0 is about to be released, and it will have just 41 emoji additions -- among them the 5 emoji modifiers that are already available on phones to address the emoji diversity issue. And the UTC just approved 38 new emoji candidates that will be the likely basis of the emoji additions for Unicode 9.0 next year. Once we get through the Unicode 8.0 and Unicode 9.0 cycles, this process will have settled into a kind of a routine -- and it will be apparent to all what the likely scale and scope of future emoji additions *as Unicode characters* will be: a few dozen per year, carefully picked based on a set of criteria now to be set out in the new UTR #51 regarding emoji. The sky isn't falling here. ;-) The Unicode Consortium has not suddenly transmogrified into the Emoji Consortium. People will get used to the fact that a few dozen new emoji characters get added to the standard every year -- ho hum. 
And for folks who can't wait through the two-years-from-proposal-to-implementation cycles of character encoding committees, well... those stickers are out there waiting for you. --Ken On 5/15/2015 5:18 PM, Mark Davis ☕️ wrote: However, we do have strong short-term pressure to show that we are responsive and responsible in adding emoji. And our adding a reasonable number of emoji per year is not going to stop Line or Skype from adding stickers!
Re: Future of Emoji? (was Re: Tag characters)
These are all great pointers which we might want to look into more closely for expanding the longer term solution section in TR51 or any other document encouraging folks to use stickers. Maybe Microsoft people who are attending emoji SC can provide some insight on these issues, too. I think I still prefer the current situation compared to Japanese carriers having to go with .SWF! ↪ Shervin On Fri, May 15, 2015 at 8:57 AM, Peter Constable peter...@microsoft.com wrote: Ah, yes. And Messenger “winks”. E.g., http://www.msn-tools.net/free-msn-winks-1.htm I note that this has .swf files, and that’s what we saw one of the Japanese carriers saying they’d be moving to instead of PUA characters. Peter *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Peter Constable *Sent:* Friday, May 15, 2015 8:47 AM *To:* Shervin Afshar *Cc:* unicode@unicode.org *Subject:* RE: Future of Emoji? (was Re: Tag characters) MSN Messenger supported extensible stickers years ago. A couple of sites still offering add-ons: http://www.getsmile.com/ http://www.smileys4msn.com/ Peter *From:* Shervin Afshar [mailto:shervinafs...@gmail.com] *Sent:* Thursday, May 14, 2015 10:40 PM *To:* Peter Constable *Cc:* unicode@unicode.org *Subject:* Re: Future of Emoji? (was Re: Tag characters) Good point. I missed these while looking into compatibility symbols. Of course, as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for flags, most of smiley faces, objects, etc. on all platforms would affect this approach. My idea of a sticker-based solution is something more like Facebook's[3] or Line's[4] implementations.
[1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf [2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf [3]: http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html [4]: https://creator.line.me/en/guideline/ ↪ Shervin On Thu, May 14, 2015 at 9:37 PM, Peter Constable peter...@microsoft.com wrote: Skype uses stickers, including animated stickers. Here’s the documented set: https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”. Peter *From:* Shervin Afshar [mailto:shervinafs...@gmail.com] *Sent:* Thursday, May 14, 2015 8:12 PM *To:* Peter Constable *Cc:* unicode@unicode.org *Subject:* Future of Emoji? (was Re: Tag characters) Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means. It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). [1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. 
Peter *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Shervin Afshar *Sent:* Thursday, May 14, 2015 2:27 PM *To:* wjgo_10...@btinternet.com *Cc:* unicode@unicode.org *Subject:* Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Re: Tag characters
What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Re: Tag characters
http://www.unicode.org/L2/L2015/15107.htm points indirectly to: http://www.unicode.org/L2/L2015/15145r-add-regional-ind.pdf which says:
The proposal has two parts 1. Un-deprecate TAG characters E0020-E007E.
Hee hee. Hee hee.
2. Define a character as the “base” for a following sequence of TAG characters that specifies a region or subregion to be represented using a sequence of TAG characters. There are two possibilities for the base character: a. Preferred: Use the Unicode 7.0 character WAVING WHITE FLAG: 1F3F3;WAVING WHITE FLAG;So;0;ON;N; The advantage is no new characters need be encoded.
Add language to UTR #51 describing the mechanism given in 2A means that U+1F3F3 will be the tag introducer, basically the flag emoji equivalent of U+E0001 LANGUAGE TAG. I think I understand why the TAG/CANCEL TAG start-end mechanism which was invented for Plane 14 language tags wasn't reused for flag emoji. Adding U+E0002 FLAG TAG would have implied that the sequence ends with CANCEL TAG. Flags don't have scope and there is no need to indicate the end of the sequence explicitly for scoping purposes, as there is with tagged text. I assume that existing text with U+1F3F3 followed by no tag characters should continue to display the waving white flag glyph, whereas text conforming to this new mechanism should suppress that glyph and show the Scottish, Welsh, Delawarean, or Nordlending flag instead.
Using the following notation - B designates the chosen base character (U+1F3F3 or new U+1F1E5) TL designates a TAG LATIN CAPITAL LETTER (A..Z) TD designates a TAG DIGIT (ZERO..NINE) TH designates TAG HYPHEN-MINUS - a well-formed sequence for designating flags for ISO 3166-1, 3166-2 or UN M49 codes would be B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))
Will the subdivision sequence always be exactly 3 characters long? CLDR ticket #8423 seems to say that ISO 3166-2 code elements that are only 1 or 2 characters long will be prepended with xx or x to make them all exactly 3. Obviously some research will need to be done to ensure this doesn't result in conflicts with existing code elements, and of course 3166-2 makes no promises that future assignments will deliberately avoid such a conflict.
Will both mechanisms, old and new, be available for encoding national flags? For example, for a French flag: 1F1EB 1F1F7 or 1F3F3 E0046 E0052
In CLDR 28, LDML will define a unicode_subdivision_subtag which also provides validity criteria for the codes used for regional subdivisions (see CLDR ticket #8423). When representing regional subdivisions using ISO 3166-2 codes, only those codes that are valid for the LDML unicode_subdivision_subtag should be used.
I note that a preliminary file is already available at http://unicode.org/repos/cldr/trunk/common/supplemental/subdivisions.xml .
-- Doug Ewell | http://ewellic.org | Thornton, CO
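The well-formedness rule and the two French-flag encodings quoted in the message above can be sketched in a few lines of Python. This is only an illustration of the grammar as quoted from L2/15-145R in this thread, assuming the preferred base character U+1F3F3; the helper names (to_tags, tagged_flag, regional_indicator_flag, is_well_formed) are hypothetical and come from no proposal or library.

```python
import re

BASE = "\U0001F3F3"      # WAVING WHITE FLAG, the preferred base "B" in the quoted proposal
TAG_OFFSET = 0xE0000     # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E

# Character classes for the notation used in the message (TL, TD, TH).
TL = "[\U000E0041-\U000E005A]"   # TAG LATIN CAPITAL LETTER A..Z
TD = "[\U000E0030-\U000E0039]"   # TAG DIGIT ZERO..NINE
TH = "\U000E002D"                # TAG HYPHEN-MINUS

# B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))
WELL_FORMED = re.compile(
    f"{BASE}(?:{TL}{{2}}(?:{TH}(?:{TL}|{TD}){{3}})?|{TD}{{3}})"
)


def to_tags(code: str) -> str:
    """Map each ASCII character of `code` to its tag-character counterpart."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("only ASCII 0x20..0x7E maps onto U+E0020..U+E007E")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def tagged_flag(code: str) -> str:
    """Proposed mechanism: base character followed by the code as tag characters."""
    return BASE + to_tags(code)


def regional_indicator_flag(alpha2: str) -> str:
    """Existing mechanism: a pair of regional indicator symbols (U+1F1E6..U+1F1FF)."""
    if len(alpha2) != 2 or not all("A" <= c <= "Z" for c in alpha2):
        raise ValueError("expected two capital ASCII letters")
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in alpha2)


def is_well_formed(seq: str) -> bool:
    """Check a whole string against the grammar quoted in the message."""
    return WELL_FORMED.fullmatch(seq) is not None


def hex_points(s: str) -> str:
    """Space-separated uppercase hex code points, as written in the message."""
    return " ".join(f"{ord(c):X}" for c in s)


print(hex_points(regional_indicator_flag("FR")))   # 1F1EB 1F1F7
print(hex_points(tagged_flag("FR")))               # 1F3F3 E0046 E0052
print(is_well_formed(tagged_flag("GB-SCT")))       # True
print(is_well_formed(tagged_flag("US-DE")))        # False
```

Under this grammar a two-character subdivision part such as the DE in US-DE is rejected, which is the case the question about padding with x or xx is concerned with.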
Re: Tag characters
Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
RE: Future of Emoji? (was Re: Tag characters)
Skype uses stickers, including animated stickers. Here’s the documented set: https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”. Peter From: Shervin Afshar [mailto:shervinafs...@gmail.com] Sent: Thursday, May 14, 2015 8:12 PM To: Peter Constable Cc: unicode@unicode.org Subject: Future of Emoji? (was Re: Tag characters) Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means. It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). [1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Shervin Afshar Sent: Thursday, May 14, 2015 2:27 PM To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Subject: Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters.
↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Future of Emoji? (was Re: Tag characters)
Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means. It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). [1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Shervin Afshar *Sent:* Thursday, May 14, 2015 2:27 PM *To:* wjgo_10...@btinternet.com *Cc:* unicode@unicode.org *Subject:* Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Re: Future of Emoji? (was Re: Tag characters)
Good point. I missed these while looking into compatibility symbols. Of course, as with Yahoo[1] and MSN[2] Messenger emoji sets, most of these are mappable to current or proposed sets of Unicode emoji (e.g. Lips Sealed ≈ U+1F910 ZIPPER-MOUTH FACE). It would be interesting to see how the extended support for flags, most of smiley faces, objects, etc. on all platforms would affect this approach. My idea of a sticker-based solution is something more like Facebook's[3] or Line's[4] implementations. [1]: http://www.unicode.org/L2/L2015/15059-emoji-im-yahoo.pdf [2]: http://www.unicode.org/L2/L2015/15058-emoji-im-msn.pdf [3]: http://www.huffingtonpost.com/2014/10/14/facebook-stickers-comments_n_5982546.html [4]: https://creator.line.me/en/guideline/ ↪ Shervin On Thu, May 14, 2015 at 9:37 PM, Peter Constable peter...@microsoft.com wrote: Skype uses stickers, including animated stickers. Here’s the documented set: https://support.skype.com/en/faq/FA12330/what-is-the-full-list-of-emoticons And if you search, you’ll find lots more “hidden” emoticons, like “(bartlett)”. Peter *From:* Shervin Afshar [mailto:shervinafs...@gmail.com] *Sent:* Thursday, May 14, 2015 8:12 PM *To:* Peter Constable *Cc:* unicode@unicode.org *Subject:* Future of Emoji? (was Re: Tag characters) Peter, This very topic was discussed in last meeting of the subcommittee and my impression is that there are plans to promote the use of embedded graphics (aka stickers) either through expansions to section 8 of TR51 or through some other means. It should also be noted that none of current members of Unicode seem to have a sticker-based implementation (with the exception of an experimental limited trial by Twitter[1]). 
[1]: http://mashable.com/2015/04/16/twitter-star-wars-emoji/ ↪ Shervin On Thu, May 14, 2015 at 7:44 PM, Peter Constable peter...@microsoft.com wrote: And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter *From:* Unicode [mailto:unicode-boun...@unicode.org] *On Behalf Of *Shervin Afshar *Sent:* Thursday, May 14, 2015 2:27 PM *To:* wjgo_10...@btinternet.com *Cc:* unicode@unicode.org *Subject:* Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
RE: Tag characters
And yet UTC devotes lots of effort (with an entire subcommittee) to encode more emoji as characters, but no effort toward any preferred longer term solution not based on characters. Peter From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Shervin Afshar Sent: Thursday, May 14, 2015 2:27 PM To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Subject: Re: Tag characters Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions IMO, the industry preferred longer term solution (which is also discussed in that section with few existing examples) for emoji, is not going to be based on characters. ↪ Shervin On Thu, May 14, 2015 at 1:40 PM, William_J_G Overington wjgo_10...@btinternet.com wrote: What else would be possible if the same sort of technique were applied to another base character? Thinking about this further, could the technique be used to solve the requirements of section 8 Longer Term Solutions of http://www.unicode.org/reports/tr51/tr51-2.html ? Both colour pixel map and colour OpenType vector font solutions would be possible. Colour voxel map and colour vector 3d solids solutions are worth thinking about too as fun coding thought experiments that could possibly lead to useful practical results. William Overington 14 May 2015
Re: Tag Characters (from Re: Fwd: RFC 6082 on Deprecating Unicode Language Tag Characters: RFC 2482 is Historic)
I remembered that I produced a font with visible glyphs for the tag characters. Some readers might like a copy of the font, free, from the following forum post. http://forum.high-logic.com/viewtopic.php?p=10587#p10587 I have been trying the font out again and find that I can, with the font installed, use Microsoft Calculator in its View Scientific mode to convert the hexadecimal E0041 of U+E0041 into decimal 917569 and then use Alt 917569 in Microsoft WordPad with the font, which has the font name Tags and Emoji 001, selected to display a visible glyph for the tag character. I then found that I could copy from WordPad and paste into SC UniPad and that SC UniPad has a visible glyph that can be displayed for the tag character as well, as an optional choice, the default case being an invisible glyph. One can then convert within SC UniPad to display \U000e0041 for the character. William Overington 12 November 2010
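The hexadecimal-to-decimal step described in the message above, and the escaped form shown at the end, can also be reproduced without a calculator. The following is a minimal Python sketch using only the standard library; the function name describe is hypothetical and is not part of any tool mentioned in the message.

```python
import unicodedata


def describe(hex_code: str) -> tuple[int, str, str]:
    """Return (decimal value, \\U escape, character name) for a hex code point."""
    cp = int(hex_code, 16)                      # hexadecimal string -> integer
    escape = f"\\U{cp:08x}"                     # eight hex digits, lowercase
    name = unicodedata.name(chr(cp), "<unnamed>")
    return cp, escape, name


decimal, escape, name = describe("E0041")
print(decimal)   # 917569, the number typed after Alt in the message
print(escape)    # \U000e0041
print(name)      # TAG LATIN CAPITAL LETTER A
```

The decimal value is the one used for Alt-code entry in the message, and the escape string matches the form the message reports for the character.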
Re: Tag Characters (from Re: Fwd: RFC 6082 on Deprecating Unicode Language Tag Characters: RFC 2482 is Historic)
William_J_G Overington wjgo underscore 10009 at btinternet dot com wrote: I feel that deprecating the tag characters within Unicode was a mistake. There aren't many times when I agree with William, but this is one of them. -- Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s