Re: A last missing link for interoperable representation
Tex Texin wrote,

> ... However, the fact that there is a rich text solution for italics
> isn't helpful to plain text users.

Truer words were never spoken.

> In the '90s it made sense to resist styling plain text. In the 2020's,
> with more than 100k characters, numerous pictures and character
> adornments, it seems anachronistic to be arguing against a handful
> of control characters that would standardize a common text
> requirement. Most rendering systems will handle it easily and any
> plain text editor or other software that supports a combining
> strikethrough character would easily adapt a combining italicize or
> a combining bold character.

Exactly. William Overington has already posted a proof-of-concept here:
https://forum.high-logic.com/viewtopic.php?f=10=7831
... using a P.U.A. character /in lieu/ of a combining formatting or VS character. The concept is straightforward and works properly with existing technology.
Re: A last missing link for interoperable representation
Martin J. Dürst wrote,

> Almost by definition, styled text isn't plain text, even if it's
> simulated by something else.

By an earlier definition, in-line pictures weren't plain text, until people started exchanging them as though they were. In this case, people are exchanging plain text as plain text.

> And the simulation is highly limited, as
> the voicing examples and the fact that the math alphanumerics
> only cover basic Latin have shown.

The voicing examples are software shortcomings which could be overcome. The software people might seize the opportunity to accommodate their users and vocalize bold *loudly*, italics with /stress/, and fraktur with a Boris Karloff (or Bela Lugosi) voice. That would be up to them. But the voicing examples aren't really about reading and writing and how they relate to the character encoding. (Not saying that the voicing examples aren't interesting and relevant to the overall topic.)

The fact that the math alphanumerics are incomplete may have been part of what prompted Marcel Schneider to start this thread.

If stringing encoded italic Latin letters into words is an abuse of Unicode, then stringing punctuation characters to simulate a "smiley" (☺) is an abuse of ASCII - because that's not what those punctuation characters are *for*. If my brain parses such italic strings into recognizable words, then I guess my brain is non-compliant.
RE: A last missing link for interoperable representation
On 11.01.2019 11:43, Tex via Unicode wrote:

> Martin, James is making the case there is demand or a user need and that the proof is that users are using inconsistent tactics to simulate a solution to their problem.

The use of math characters is mostly to get around limitations of Twitter (and some other platforms). There are plenty of rich text formats like Markdown and HTML existing already.

I am rather doubtful that it should be Unicode's responsibility to get around lack of rich text support via special characters and fonts, especially since many platforms do not allow users to freely change the fonts (and if these platforms installed such fonts, they could just as easily support markup/rich text instead). Even if they do, the programs/platforms involved would not necessarily enable these fonts by default: if they wanted rich text, they would be supporting it already.

Also, any Unicode-based rich text standard would not really be standard compared to the vast amount of HTML out there already.

David Faulks
Re: A last missing link for interoperable representation
Emoji were being encoded as characters, as codepoints in private use areas. That inherently called for a Unicode response.

Bidirectional support is a headache; the amount of confusion and outright exploits from them is way higher than we like. The HTML support probably doesn't help that. However, properly mixing Hebrew and English (e.g.) is pretty clearly a plain text problem.

There are terabytes of Latin text out there, most of it encoded in formats that already support italics. Whereas emoji, encoded as characters in a then limited number of systems, could be subsumed into Unicode easily, much of that text will never be edited and those formats will never exclude the existing means of marking italics out of bounds, offering multiple ways to do italics in perpetuity.

--
Kie ekzistas vivo, ekzistas espero.
RE: A last missing link for interoperable representation
Martin, James is making the case there is demand or a user need and that the proof is that users are using inconsistent tactics to simulate a solution to their problem.

The response that: "Almost by definition, styled text isn't plain text, even if it's simulated by something else." is a bit like Humpty Dumpty saying words mean what I want them to mean.

Most of the emoji aren't plain text and Unicode has them in abundance. Ruby text is also not plain text. Their inclusion was the user need for consistency and interoperability. The original emoji had inconsistent encodings and were a problem for interchange as well as search and rendering. Their existence and popularity became their own problem requiring further styling (e.g. coloring) and greatly expanded enumeration (foods, animals, et al.) Let's be honest and admit the actual demand for some of these latter objects in plain text is marginal and certainly is less than the prevalence of italics.

The response that: "the simulation is highly limited, as the voicing examples and the fact that the math alphanumerics only cover basic Latin have shown." unless I misunderstand your meaning, is the argument that we encoded only these therefore the use case is limited to these.

In a different message you say: "Also, in contrast to the issue discussed in the current thread, there's no consistent or widely deployed solution for such CJK variants in rich text scenarios such as HTML."

I don't see how a rich text solution has any bearing on plain text. We could take the point that if there was no need in HTML to solve the problem then there wasn't demand justifying the need in Unicode. :-) I understand your actual intent to say there was a need for CJK variants and there was no other solution. However, the fact that there is a rich text solution for italics isn't helpful to plain text users. HTML had bidirectional isolates and after the fact Unicode encoded them as well.
The fact that there isn't a consistent way to represent stress or the other uses for italics (or obliques, and bold, etc.) does make certain searches across large numbers of plain texts problematic. In the same way it is sometimes important to distinguish capitalized text when searching (polish vs Polish) it would be helpful to do the same for italicized text. For example, if I am searching for the movie title "Contact" vs. all the places where texts reference a personal "Contact", distinguishing italicized titles would help. And to the extent that users are inserting non-standardized punctuation or other characters for "styling" it makes reliable searching difficult. As James mentioned it helps with interoperability as well.

In the '90s it made sense to resist styling plain text. In the 2020's, with more than 100k characters, numerous pictures and character adornments, it seems anachronistic to be arguing against a handful of control characters that would standardize a common text requirement. Most rendering systems will handle it easily and any plain text editor or other software that supports a combining strikethrough character would easily adapt a combining italicize or a combining bold character.

tex
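
As a rough illustration of the search point in the message above, here is a minimal Python sketch, not taken from any poster's message. It assumes the "italic" title was typed with letters from the Mathematical Alphanumeric Symbols block, as the thread describes. The helper names `fold_styled` and `is_math_styled` are hypothetical; `unicodedata.normalize` is the standard library call. A naive substring search misses the simulated-italic word, NFKC folding finds it, and a block check lets a search tool tell the styled occurrence apart from the plain one.

```python
import unicodedata


def fold_styled(text: str) -> str:
    """Fold compatibility variants (math alphanumerics included) to base letters."""
    return unicodedata.normalize("NFKC", text)


def is_math_styled(text: str) -> bool:
    """True if any character is in the Mathematical Alphanumeric Symbols block."""
    return any(0x1D400 <= ord(ch) <= 0x1D7FF for ch in text)


# "Contact" spelled with MATHEMATICAL ITALIC letters (C, o, n, t, a, c, t).
styled = "\U0001D436\U0001D45C\U0001D45B\U0001D461\U0001D44E\U0001D450\U0001D461"
plain = "Contact"

# A plain substring search does not match the simulated-italic spelling.
naive_hit = plain in f"I watched {styled} last night"

# After NFKC folding the two spellings compare equal.
folded_hit = plain in fold_styled(f"I watched {styled} last night")

print(naive_hit, folded_hit)
print(is_math_styled(styled), is_math_styled(plain))
```

Folding makes the text findable, while the block check preserves the "title vs. ordinary word" distinction only for text that happens to use this particular simulation; other ad hoc conventions such as /slashes/ or *asterisks* would each need their own handling.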
Re: A last missing link for interoperable representation
On 2019/01/11 16:13, James Kass via Unicode wrote:

> Styled Latin text is being simulated with math alphanumerics now, which
> means that data is being interchanged and archived. That's the user
> demand illustrated.

Almost by definition, styled text isn't plain text, even if it's simulated by something else. And the simulation is highly limited, as the voicing examples and the fact that the math alphanumerics only cover basic Latin have shown.

Regards, Martin.
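
The "only cover basic Latin" limitation can be seen in a small Python sketch. This is an illustration under stated assumptions, not anyone's proposed implementation: the function name `to_math_italic` is hypothetical, and the mapping relies on the code chart layout in which MATHEMATICAL ITALIC CAPITAL A is U+1D434, MATHEMATICAL ITALIC SMALL A is U+1D44E, and italic small h is supplied by U+210E PLANCK CONSTANT because its slot in the block is reserved.

```python
import unicodedata


def to_math_italic(text: str) -> str:
    """Simulate italics using math alphanumerics; only A-Z and a-z can be mapped."""
    out = []
    for ch in text:
        if ch == "h":
            # The italic small h slot (U+1D455) is reserved; U+210E is used instead.
            out.append("\u210E")
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            # No italic counterpart exists: accented letters, digits, punctuation,
            # and every non-Latin script pass through unstyled.
            out.append(ch)
    return "".join(out)


word = to_math_italic("café")
unstyled = [ch for ch in word if ch == unicodedata.normalize("NFKC", ch)]

print(word)
print(unstyled)
```

In this sketch the first three letters of "café" come out styled while the precomposed "é" stays upright, which is the kind of gap the message above points to.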