Re: A last missing link for interoperable representation

2019-01-11 Thread James Kass via Unicode



Tex Texin wrote,

> ... However, the fact that there is a rich text solution for italics
> isn't helpful to plain text users.

Truer words were never spoken.

> In the '90s it made sense to resist styling plain text. In the 2020's,
> with more than 100k characters, numerous pictures and character
> adornments, it seems anachronistic to be arguing against a handful
> of control characters that would standardize a common text
> requirement. Most rendering systems will handle it easily and any
> plain text editor or other software that supports a combining
> strikethrough character would easily adapt a combining italicize or
> a combining bold character.

Exactly.  William Overington has already posted a proof-of-concept here:
https://forum.high-logic.com/viewtopic.php?f=10=7831
... using a P.U.A. character /in lieu/ of a combining formatting or VS 
character.  The concept is straightforward and works properly with 
existing technology.




Re: A last missing link for interoperable representation

2019-01-11 Thread James Kass via Unicode



Martin J. Dürst wrote,

> Almost by definition, styled text isn't plain text, even if it's
> simulated by something else.

By an earlier definition, in-line pictures weren't plain text, until 
people started exchanging them as though they were.  In this case, 
people are exchanging plain text as plain text.


> And the simulation is highly limited, as
> the voicing examples and the fact that the math alphanumerics
> only cover basic Latin have shown.

The voicing examples are software shortcomings which could be overcome.  
The software people might seize the opportunity to accommodate their 
users and vocalize bold *loudly*, italics with /stress/, and fraktur 
with a Boris Karloff (or Bela Lugosi) voice. That would be up to them.  
But the voicing examples aren't really about reading and writing and how 
they relate to the character encoding.  (Not saying that the voicing 
examples aren't interesting and relevant to the overall topic.)


The fact that the math alphanumerics are incomplete may have been part 
of what prompted Marcel Schneider to start this thread.


If stringing encoded italic Latin letters into words is an abuse of 
Unicode, then stringing punctuation characters to simulate a "smiley" 
(☺) is an abuse of ASCII - because that's not what those punctuation 
characters are *for*.  If my brain parses such italic strings into 
recognizable words, then I guess my brain is non-compliant.




RE: A last missing link for interoperable representation

2019-01-11 Thread via Unicode




On 11.01.2019 11:43, Tex via Unicode wrote:

> Martin,
>
> James is making the case there is demand or a user need and that the
> proof is that users are using inconsistent tactics to simulate a
> solution to their problem.



The use of math characters is mostly to get around limitations of 
Twitter (and some other platforms). There are plenty of rich text 
formats like Markdown and HTML in existence already.


I am rather doubtful that it should be Unicode's responsibility to get 
around lack of rich text support via special characters and fonts, 
especially since many platforms do not allow users to freely change the 
fonts (and if these platforms installed such fonts, they could just as 
easily support markup/rich text instead). Even if they do, the 
programs/platforms involved would not necessarily enable these fonts by 
default: if they wanted rich text, they would be supporting it already.


Also, any Unicode-based rich text standard would not really be standard 
compared to the vast amount of HTML out there already.


David Faulks


Re: A last missing link for interoperable representation

2019-01-11 Thread David Starner via Unicode
Emoji were being encoded as characters, as codepoints in private use
areas. That inherently called for a Unicode response. Bidirectional
support is a headache; the amount of confusion and outright exploits
arising from it is way higher than we would like. The HTML support
probably doesn't help with that. However, properly mixing Hebrew and
English (e.g.) is
pretty clearly a plain text problem.

There are terabytes of Latin text out there, most of it encoded in
formats that already support italics. Whereas emoji, encoded as
characters in a then-limited number of systems, could be subsumed into
Unicode easily, much of that text will never be edited, and those
formats will never rule their existing means of marking italics out
of bounds, so there would be multiple ways to do italics in perpetuity.

-- 
Kie ekzistas vivo, ekzistas espero.


RE: A last missing link for interoperable representation

2019-01-11 Thread Tex via Unicode
Martin,

James is making the case there is demand or a user need and that the proof is 
that users are using inconsistent tactics to simulate a solution to their 
problem.

The response that:
"Almost by definition, styled text isn't plain text, even if it's simulated by 
something else." 
is a bit like Humpty Dumpty saying words mean what I want them to mean. 

Most of the emoji aren't plain text and Unicode has them in abundance. Ruby 
text is also not plain text. They were included because of the user need for 
consistency and interoperability. The original emoji had inconsistent 
encodings and were a problem for interchange as well as search and rendering. 
Their existence and popularity became their own problem, requiring further 
styling (e.g. coloring) and greatly expanded enumeration (foods, animals, 
etc.). Let's be honest and admit the actual demand for some of these latter 
objects in plain text is marginal and certainly is less than the prevalence 
of italics.

The response that:
"the simulation is highly limited, as the voicing examples and the fact that 
the math alphanumerics only cover basic Latin have shown." unless I 
misunderstand your meaning, is the argument that we encoded only these 
therefore the use case is limited to these.

In a different message you say:
"Also, in contrast to the issue discussed in the current thread, there's no 
consistent or widely deployed solution for such CJK variants in rich text 
scenarios such as HTML."
I don't see how a rich text solution has any bearing on plain text. We could 
take the point that if there was no need in HTML to solve the problem then 
there wasn't demand justifying the need in Unicode. :-)
I understand your actual intent was to say there was a need for CJK variants 
and there was no other solution. However, the fact that there is a rich text 
solution for italics isn't helpful to plain text users.
HTML had bidirectional isolates and after the fact Unicode encoded them as well.

The fact that there isn't a consistent way to represent stress or the other 
uses for italics (or obliques, and bold, etc.) does make certain searches 
across large numbers of plain texts problematic. In the same way it is 
sometimes important to distinguish capitalized text when searching (polish vs 
Polish) it would be helpful to do the same for italicized text. For example, if 
I am searching for the movie title "Contact" vs. all the places where texts 
reference a personal "Contact", distinguishing italicized titles would help. 
And to the extent that users are inserting non-standardized punctuation or 
other characters for "styling" it makes reliable searching difficult. As James 
mentioned it helps with interoperability as well.
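
The search point can be illustrated with a small sketch. It assumes the 
italic title is typed with Mathematical Italic letters, as in the simulation 
discussed in this thread; the sample sentences and the helper names 
find_exact and find_folded are made up for the illustration. An exact match 
keeps the italic title apart from the plain word, much like a case-sensitive 
search keeps "polish" apart from "Polish", while folding with NFKC merges 
the two again.

```python
import unicodedata

plain = "Contact"
# "Contact" spelled with Mathematical Italic letters (U+1D436, U+1D45C, ...).
italic = "\U0001D436\U0001D45C\U0001D45B\U0001D461\U0001D44E\U0001D450\U0001D461"

# Hypothetical sample texts, invented for this sketch.
texts = [
    f"I watched {italic} last night.",
    "Please update my Contact details.",
]


def find_exact(needle, haystack):
    """Code-point-exact search: styled and plain spellings stay distinct."""
    return [t for t in haystack if needle in t]


def find_folded(needle, haystack):
    """Search after NFKC folding: the math letters collapse to ASCII."""
    def fold(s):
        return unicodedata.normalize("NFKC", s)
    return [t for t in haystack if fold(needle) in fold(t)]


print(find_exact(italic, texts))   # only the movie title
print(find_exact(plain, texts))    # only the personal contact
print(find_folded(plain, texts))   # both, the distinction is gone
```
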

In the '90s it made sense to resist styling plain text. In the 2020s, with 
more than 100k characters, numerous pictures and character adornments, it seems 
anachronistic to be arguing against a handful of control characters that would 
standardize a common text requirement. Most rendering systems will handle it 
easily and any plain text editor or other software that supports a combining 
strikethrough character would easily adapt a combining italicize or a combining 
bold character.
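
As a rough sketch of the existing mechanism referred to here, assuming that 
U+0336 COMBINING LONG STROKE OVERLAY is the kind of "combining 
strikethrough" meant: the mark simply follows each base character in the 
plain text stream. The helper names strike and unstrike are illustrative 
only, and no combining italic or bold character exists in Unicode today, so 
none is shown.

```python
import unicodedata

COMBINING_LONG_STROKE_OVERLAY = "\u0336"


def strike(text):
    """Append the combining overlay after every character of the input."""
    return "".join(ch + COMBINING_LONG_STROKE_OVERLAY for ch in text)


def unstrike(text):
    """Remove the overlay again, recovering the unstyled text."""
    return text.replace(COMBINING_LONG_STROKE_OVERLAY, "")


styled = strike("plain")
print(styled)
print(unicodedata.name(styled[1]))
print(unstrike(styled))
```

Because the styling lives in separate code points, software that does not 
know about it can still strip or ignore the marks and keep the base letters.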

tex





Re: A last missing link for interoperable representation

2019-01-11 Thread Martin J . Dürst via Unicode
On 2019/01/11 16:13, James Kass via Unicode wrote:

> Styled Latin text is being simulated with math alphanumerics now, which 
> means that data is being interchanged and archived.  That's the user 
> demand illustrated.

Almost by definition, styled text isn't plain text, even if it's 
simulated by something else. And the simulation is highly limited, as 
the voicing examples and the fact that the math alphanumerics only cover 
basic Latin have shown.
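
The coverage limit can be seen in a minimal sketch of the simulation, 
assuming it is done by offsetting ASCII letters into the Mathematical Italic 
range; the function name to_math_italic is made up for the illustration. 
Only A-Z and a-z have counterparts (with italic small h living at U+210E), 
so an accented letter such as "é" passes through unstyled.

```python
import unicodedata

# Mathematical Italic letters: capitals start at U+1D434, small at U+1D44E.
ITALIC_CAPITAL_A = 0x1D434
ITALIC_SMALL_A = 0x1D44E
# U+1D455 is reserved; italic small h was encoded earlier as U+210E.
PLANCK_CONSTANT = "\u210E"


def to_math_italic(text):
    """Swap ASCII letters for Mathematical Italic ones; leave the rest alone."""
    out = []
    for ch in text:
        if ch == "h":
            out.append(PLANCK_CONSTANT)
        elif "A" <= ch <= "Z":
            out.append(chr(ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            # No italic counterpart: accented letters, digits, other scripts.
            out.append(ch)
    return "".join(out)


styled = to_math_italic("caf\u00e9")
for ch in styled:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```
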

Regards,   Martin.