How is meaning changed by context and typography - in art, emoji and language

2020-04-01 Thread wjgo_10...@btinternet.com via Unicode
I received a circulated email from MoMA, the Museum of Modern Art in New 
York. I am, at my request, on their mailing list.


There is a link to a web page.

https://www.moma.org/magazine/articles/257

There is a video embedded in the web page, 8 minutes.

I watched the video and found it interesting.

There is one part where two identical images each have a different 
title.


I noticed that both titles were in English.

In typography today it has become almost obligatory, when a new emoji 
character is proposed for encoding, for the emoji character to be 
suggested as having multiple possible meanings, possibly linked to 
context, or maybe just anyway.


The beginnings of this phenomenon and the problems of ambiguity of 
meaning of emoji characters were discussed in a talk at the Unicode 
conference in 2015.


https://www.youtube.com/watch?v=9ldSVbXbjl4

There was mention of the possibility of "precise emoji".

Yet these days imprecision of emoji meaning has become widespread. Yet 
has the possibility of QID emoji brought back the possibility of precise 
emoji? Decoding could be to an image, or to language-localized speech or 
language-localized text, or even all three at once. Yet only if QID 
emoji are allowed to flourish, perhaps after a few careful modifications 
to the original proposal so as to minimize, or at least limit, the 
possibility of encoding chaos.


I have long been fascinated by what I regard as subtle changes of 
meaning that setting a piece of text in different fonts produces, though 
some other people opine that the meaning is unchanged, regardless of the 
font.


Also, can some meanings not be expressed from one language to another? 
If so, is that due to the nature of the languages or the culture where 
the original text was produced, or some of each? Does the general shape 
of the way that a particular script has developed reflect, or influence, 
the original literature written in that script? Do words that rhyme in 
one language produce imagery that does not arise in a language where 
their translations do not rhyme? For example, boaco and erinaco rhyme in 
Esperanto, yet their translations in English, reindeer and hedgehog, do 
not rhyme.


The art works in the MoMA video also reminded me of something that was 
in this mailing list probably in the early 2000s.


The post was about translations linked to an art project.

It was an art project about some orange blocks and people were taking 
photographs of art works where one of the orange blocks was presented in 
some context.


Maybe it was a student project, I don't know.

I have looked on the web and thus far found nothing about it, not even 
the original post in this mailing list.


Since then technology has changed a lot, much more is now possible for 
more people. There are now widespread emoji, there is Google street 
view, and so on.


New art possibilities.

Does anyone else remember the orange blocks please? Maybe an interesting 
stepping stone in the history of art.


William Overington

Tuesday 31 March 2020


Base character plus tag sequences (from RE: Is the binaryness/textness of a data format a property?)

2020-03-23 Thread wjgo_10...@btinternet.com via Unicode


Doug Ewell wrote:

When 137,468 private-use characters aren't enough?
In my opinion, a base character plus tag sequence has the potential to 
be used for many large scale applications for the future.
A base character plus tag sequence encoding has the advantage over a 
Private Use Area encoding (except for a prompt experimental use or for 
some applications) that the encoding can be unique and thus 
interoperability is possible amongst people generally.
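
As a small check on the figure in the quoted line above, the sketch below counts the code points in the three Private Use ranges defined by Unicode (the Private Use Area in plane 0 and the two supplementary private-use planes). The variable names are chosen here purely for illustration.

```python
# The three private-use ranges defined by the Unicode Standard.
# range() excludes its end value, so each end is one past the last code point.
PRIVATE_USE_RANGES = [
    range(0xE000, 0xF8FF + 1),       # Private Use Area, plane 0
    range(0xF0000, 0xFFFFD + 1),     # Supplementary Private Use Area-A, plane 15
    range(0x100000, 0x10FFFD + 1),   # Supplementary Private Use Area-B, plane 16
]

sizes = [len(r) for r in PRIVATE_USE_RANGES]
total = sum(sizes)

print(sizes)   # size of each range
print(total)   # total number of private-use code points
```

The total agrees with the 137,468 mentioned in the quotation: 6,400 in plane 0 plus 65,534 in each of planes 15 and 16.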


QID emoji is just the very start of applications, some not even dreamed 
of yet, for which a base character sequence encoding could be used.


Once the restriction that the result of a specific encoding may only be 
a fixed image is removed, new information technology applications will 
be possible within text streams.


There is the QID Emoji Public Review and issues like this can be 
explored there so that they will be before the Unicode Technical 
Committee when it assesses the responses to the public review.


In my response of Monday 2 March 2020 I put forward an idea that could 
allow the idea of QID emoji to proceed yet without the disadvantages.


No comment after that has been published as of the time of sending this 
post.


https://www.unicode.org/review/pri408/

Whatever your view on whether such ideas should be allowed to flourish 
and become mainstream in the future I opine that it would be good for 
there to be more responses to the public review so that as wide a range 
of views as possible are before the Unicode Technical Committee when it 
assesses the responses to the public review, not just on QID emoji as 
such but on whether the underlying method of encoding of a base 
character and tag character sequence for large sets of items should be 
encouraged.


William Overington

Monday 23 March 2020






Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread wjgo_10...@btinternet.com via Unicode


Joel Kalvesmaki asks nine questions, six in the first block and three in 
the second block.
Numbering from 1 through to 9 in the order that they are asked, I do 
not, at present, understand the question for many of them and I can, at 
present, only answer question 7 definitively. Some questions may need an 
answer in two parts, one of the parts about my specific project, and the 
other part about the situation if one or more other people also decide 
to have his or her own encoding space in a similar manner.
I realize that not even understanding the question at this time may not 
sound very good to some of the people who do understand the 
question, but I am not someone who knowingly purports that he knows what 
he is talking about when he does not. I am a researcher, and as I am now 
aware of these questions I need to find out, so that in the future 
I can answer such questions with a sound background knowledge of the 
topic.
It might be that I know of some matters but that I am not aware of the 
parlance used to describe them in the post to which I am replying.

So now to my thoughts on some of the questions.
1 to 4. I do not at present understand the question.
5. Perhaps, independent of each other, you bind !123 to a character 
semantically identical to one I've bound to !234. What rules are in 
place to allow interchangeability?
I am not sure this is the best possible answer, but with care the 
problem should not happen in the first place. I am thinking that people 
could perhaps avoid it happening in the first place by using an informal 
discussion method similar to that used when proposing a new alt. group in 
the Usenet system that was in widespread use before the web was 
invented.

6. I do not at present understand the question.
7. Or maybe you're not so much concerned about interoperability as are 
you are with extending the PUA block beyond its current limits?
No, absolutely not. I have used the Private Use Areas on a number of 
occasions and found them extremely useful to have available. Yet any 
assignment is not unique and, except in very limited, specially 
prearranged circumstances, interoperability is not possible. My research 
project is very much about interoperability with provenance. 
Interoperability with provenance is central to what I am trying to 
achieve.

8. Something like SGML/XML entities?
Until mention in the post to which I am replying, I had never known of 
them.
9. Couldn't you simply capitalize on the rules that already exist for 
entities?
From what I have read about them today, well, I suppose that I could, 
but that is not my approach and I am not going to use them.
My items are not emoji, but emoji are either expressed by an atomic 
character or by a sequence of atomic characters, such sequences decoded 
upon reception to produce a glyph. My proposed system uses sequences of 
atomic characters such that such sequences could be decoded upon 
reception to produce localized output. A similar yet different process. 
I simply do not want, as a design choice, all that angled bracket stuff, 
it is just not what I am trying to do.


If anyone on this mailing list understands some or all of what I do 
not, your comments in this thread would be very welcome please.
The first three links on my webspace are relevant to my research 
project.

http://www.users.globalnet.co.uk/~ngo/
The website is safe to use. It is hosted on a server run these days by 
Plusnet PLC, a United Kingdom internet service provider. It is not 
hosted on my computer.

William Overington
Saturday 15 February 2020



-- Original Message --
From: "via Unicode" 
To: wjgo_10...@btinternet.com
Cc: unicode@unicode.org
Sent: Saturday, 2020 Feb 15 At 10:11
Subject: Re: What should or should not be encoded in Unicode? (from Re: 
Egyptian Hieroglyph Man with a Laptop)

Hi William,

I don't fully understand your proposed encoding scheme (e.g., Is there a 
namespace each encoding scheme is bound to? How do namespaces get 
encoded? How are syntax strictures encoded?), but even then, presuming 
it's sound, you've said in the message before that this encoding space 
will enhance interoperability. What mechanism is in place to make my 
encoding space interoperable with yours? Perhaps, independent of each 
other, you bind !123 to a character semantically identical to one I've 
bound to !234. What rules are in place to allow interchangeability? What 
about one-to-many or many-to-many or vague or ambiguous mappings across 
encoding schemes, or mappings that we might reasonably contest?


Or maybe you're not so much concerned about interoperability as are you 
are with extending the PUA block beyond its current limits? Something 
like SGML/XML entities? Couldn't you simply capitalize on the rules that 
already exist for entities?


Best wishes,

jk
--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net <http://textalign.net>

On 2020-02-14 15:52, wjgo_10...@btinternet.com v

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread wjgo_10...@btinternet.com via Unicode
The solution is to invent my own encoding space. This sits on top of 
Unicode, could be (perhaps?) called markup, but it works!


It may be perilous, because some software may enforce the strict 
official code point limits.


I have now realized that what I wrote before is ambiguous.

When I wrote "sits on top of Unicode" I was not meaning at some code 
points above U+10FFFF in the Unicode map, though I accept that it could 
quite reasonably be read as meaning that.


My encoding space sits on top of Unicode in the sense that it uses a 
sequence of regular Unicode characters for each code point in my 
encoding space.


For example

∫⑦⑧①

or

!781

or

a character sequence of a base character, followed by a tag exclamation 
mark followed by three tag digits and a cancel tag.


All three examples above have the same meaning.
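
The third form can be illustrated with a minimal sketch. It relies on the fact that the tag characters U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000 and that U+E007F is CANCEL TAG. The base character used here (U+2B1B) and the function names are assumptions made for the illustration only; the post does not say which base character would be used.

```python
TAG_OFFSET = 0xE0000        # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG terminates the sequence


def to_tag_sequence(base, code):
    """Return base character + tag-character copy of `code` + CANCEL TAG."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in code):
        raise ValueError("only printable ASCII has tag-character counterparts")
    return base + "".join(chr(TAG_OFFSET + ord(ch)) for ch in code) + CANCEL_TAG


def from_tag_sequence(sequence):
    """Recover the ASCII code from a base + tags + CANCEL TAG sequence."""
    tags = sequence[1:]  # assumes the base is a single code point
    if not tags.endswith(CANCEL_TAG):
        raise ValueError("sequence is not terminated by CANCEL TAG")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tags[:-1])


# Hypothetical base character U+2B1B, chosen only for this illustration.
encoded = to_tag_sequence("\u2B1B", "!781")
decoded = from_tag_sequence(encoded)
```

Here `encoded` is six code points long: the base, a tag exclamation mark, three tag digits and the cancel tag, and `decoded` gives back the plain form !781.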

The form ∫⑦⑧① is useful because it is less likely than a form such as 
!123 to occur in text for some other reason, though a form such as !123 
is easier to use and could be used in a GS1-128 barcode.


The tag sequence has the potential to become incorporated into Unicode 
for universal standardization of unambiguous interoperability 
everywhere. That is a long term goal for me.


The example above uses a three-digit code number. My encoding space 
allows for various numbers of digits, with a minimum of three digits and 
a much larger theoretical maximum. The most digits in use at present in 
my research project in any one code number is six.


William Overington

Friday 14 February 2020




What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-13 Thread wjgo_10...@btinternet.com via Unicode
Hans Åberg >>> From the point of view of Unicode, it is simpler: If the 
character is in use or have had use, it should be included somehow.


Shawn Steele >> That bar, to me, seems too low.  Many things are only 
used briefly or in a private context that doesn't really require 
encoding.


Hans Åberg > That is a private use area for more special use.

I have used the Private Use Area, quite a lot over many years.

I have a licence for a fontmaking program, FontCreator. A good feature 
of the Windows operating system is that all installed fonts can be used 
in most installed programs. Private Use Area code points are official 
Unicode code points. These three factors together allow me to design and 
produce TrueType fonts for new symbols each encoded at a Private Use 
Area code point (a different code point for each such novel symbol), 
install the fonts, and use them in various programs, including a desktop 
publishing program and thereby make PDF (Portable Document Format) 
documents that include both ordinary text and the novel symbols. These 
PDF documents are then suitable for placing on the web and for Legal 
Deposit with The British Library.


Yet a Private Use Area encoding at a particular code point is not 
unique. Thus, except with care amongst people who are aware of the 
particular encoding, there is no interoperability, such as with regular 
Unicode encoded characters.


However, faced with a need for interoperability for my research project, 
I have found a solution making use of the Glyph Substitution capability 
of an OpenType font.


The solution is to invent my own encoding space. This sits on top of 
Unicode, could be (perhaps?) called markup, but it works!


I am hoping that at some future time the results of my research will 
become encoded as an International Standard, and that my encoding space 
will then after that become integrated into Unicode, thus achieving 
fully standardized unique interoperable encoding as part of Unicode. 
Quite a dream, but the way to achieve such a fully standardized unique 
interoperable encoding as part of Unicode is, from a technological point 
of view, quite straightforward. There are details of this in the 
Accumulated Feedback on Public Review Issue #408.


https://www.unicode.org/review/pri408/

Yet having my encoding space in this manner is just something that I 
have done on my own initiative. Anybody can have his or her own encoding 
space if he or she so chooses. With a little care and consideration for 
others these encodings need not clash one with another and all could 
even coexist in one document.


Having my own encoding space has enabled me to make progress with my 
research project.


William Overington

Thursday 13 February 2020





RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-12 Thread wjgo_10...@btinternet.com via Unicode


Hi

At the time, I thought that my post yesterday concluded the thread. 
However, later something occurred to me as a result of something in the 
post by Sławomir Osipiuk.


The gentleman wrote as follows:

Sending multiples of the same message in different languages is really 
only applicable to broadcast/multicast scenarios, where you have a 
transmission going out live to multiple recipients who have different 
language demands. I can't immediately think of any examples where this 
is done with plain-text only, though I'd be glad to learn about them, 
if they exist.
Whilst I do not know of anything of where this is presently done, I 
realized that this would be a practical proposition for some of the 
things in the Internet of things.
I am reminded of the teletext system (with brand names such as Ceefax 
and Oracle) in the United Kingdom, which was a broadcasting technology 
introduced in the 1970s and which became very much a part of British 
culture during the 1980s and 1990s. A digital signal of a special 
purpose 7-bit character set was broadcast in the vertical blanking 
interval of a 625 line analogue television signal. Basically, most 
lines were used for the colour picture, but some lines were not used 
during the time allowed for the scan to go back to the top of the picture 
once it had reached the lower edge of the picture. So this digital 
information service got a free ride in the picture signal going out to 
receivers all over the country. The information was organised into pages 
and an end user could go to "text" and then wait for a selected page to 
come round again in the continuous cyclic broadcasting of pages. Pages 
could be arranged by the broadcaster so that, say, the news headlines 
page came around maybe four times in each, say, 20 second cycle and some 
pages only once. It was very effective as the special purpose 7-bit 
character set, while being basically ASCII, had control characters that 
were stateful and displayed each as a space yet some of them switched 
the colour of the following text until a new control character for a 
colour was received, if indeed one was received, or until the end 
of the 40 character line of the display. Each line started with white 
text, though if the first character of the line switched to a colour, 
the end user would not see any white text. The control code set also 
included switching to chunky graphics mode. There was also a facility to 
use the system for subtitles to the television programme, optional 
subtitles so that end users could have them on if desired yet other 
users were not thereby forced to have subtitles. It was good, as various 
participants in a discussion - whether news or drama - could each have a 
colour for their speaking, such as green, yellow, cyan, white. No return 
link was needed to send information from the end user to the central 
broadcasting computer.
A system with the same format of display was a viewdata system (brand 
name Prestel) but that was very different from teletext and used a 
two-way telephone line connection. In a viewdata system, the end user 
selected a page from a menu then a message requesting that page was sent 
to the central computer and just that page was sent to the end user. A 
fee for a page was often charged and the system never really took off. 
Teletext thrived because economy of scale brought the cost of 
teletext-capable electronics down and it was installed using a set of 
for-the-purpose integrated circuits during manufacture of most colour 
television sets in that era, and once installed then it was a free 
add-on with no ongoing cost apart from the ordinary television licence.
It seems to me that there could be, in the future, a type of thing that 
sends out a continuous signal over a wire of, say, a temperature reading 
at its location, all formatted in several languages. So, no passwords, 
no input from an end user, just a continuous feeding into The Internet 
of Things its output, with the numerical value in the messages changed 
as the temperature changes. This would allow the digits to be expressed 
in the digits used in the particular script of the particular language 
used in an individual message.
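
The last point, expressing the digits of the reading in the digits of the script of each language, can be sketched as follows. This is only an illustration: the language codes and the table of digit zeros are assumptions chosen here, and a real system would presumably draw on locale data rather than a hand-written table.

```python
ASCII_DIGITS = "0123456789"

# Code point of the digit zero in each script; the other nine digits follow it.
# The choice of languages is an assumption made for this illustration.
DIGIT_ZERO = {
    "en": 0x0030,  # ASCII digits
    "ar": 0x0660,  # Arabic-Indic digits
    "hi": 0x0966,  # Devanagari digits
}


def localize_digits(text, lang):
    """Replace ASCII digits in `text` with the digits used for `lang`."""
    zero = DIGIT_ZERO[lang]
    script_digits = "".join(chr(zero + i) for i in range(10))
    return text.translate(str.maketrans(ASCII_DIGITS, script_digits))


reading = "21"
for lang in DIGIT_ZERO:
    print(lang, localize_digits(reading, lang))
```

Only the digits change in this sketch; the surrounding words of each message would still need to be supplied in each language by the device.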

William Overington
Wednesday 12 February 2020




Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-11 Thread wjgo_10...@btinternet.com via Unicode

Hi

Thank you to everybody who replied to this thread, both online and 
offline.


Sławomir Osipiuk wrote:

As for "concatenation of such plain text sequences" where each 
sequence is in a different language, ...


Actually I was meaning the concatenation of a number of messages, one 
from each of a number of "things", where each message includes text in 
several languages. The result being a report in several languages, just 
by simple concatenation of the number of reports. That is, if there are 
seven sensors, the final report has seven uses of the language code for 
English, seven for French, seven for German, seven for Polish, and so 
on.


Mark E. Shoulson wrote:

So at least this particular application would be a solution to a 
problem that's already been solved.


Well, maybe it is now a solution that is out there and maybe some day a 
problem will arise for which this would be a solution worth considering. 
So for now it drifts into the archives.


Best regards,

William Overington

Tuesday 11 February 2020




Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread wjgo_10...@btinternet.com via Unicode

Hi

Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good 
reason why I ask


There is a German song, Lorelei, and I searched to find an English 
translation.


I found the following video.

https://www.youtube.com/watch?v=lJ3JhxOUbw0

The video is an instrumental version, and what is particularly 
interesting is that there are lyrics displayed in four languages, with 
two versions of the translation in English.


Being a native speaker of English and living in England I first watched 
the video viewing just the version labelled British:. Later I played the 
video again and I just viewed the version labelled U.S..


Remembering that I had some time ago heard a version in Esperanto, I 
searched and found the two following videos.


https://www.youtube.com/watch?v=reUpdGgdBsA

https://www.youtube.com/watch?v=7dHhTXDmP0k

They may be of the same recording. The first has in its notes the text 
of the lyrics.


The song in Esperanto has the rather expressive Esperanto word belega in 
it. This single word, an adjective, is composed from the Esperanto word 
bela which means beautiful augmented with the Esperanto word-building 
component -eg- that modifies the word to which it is an augmentation to 
indicate greatness. So the word belega expresses in one three-syllable 
Esperanto word the concept that is in English "greatly beautiful".


http://esperanto.davidgsimpson.com/eo-affixes.html

Thinking of the first video to which I linked, it occurred to me that if 
a plain text message were sent containing each of two or more versions 
of the same text, for whatever text, probably a short message in 
practice, each in a different language from the other or others, with 
the language of a particular version preceded by a tag sequence: then 
software at the receiving end could be set to a chosen language and only 
text in that language would be displayed.


Thinking around this idea I thought that this could be very useful in 
The Internet of Things for machine to human communication, whereby, if, 
say, an end user (human) is wanting to dialogue with a device (thing) 
then the technique could be used to send the message


Please enter the password

from the thing in a number of languages. The decoding software in the 
end user's computer could use the first message in the list as the 
default if the sequence sent by the thing does not have a version for 
the particular language set by the end user in his or her computer.
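
The idea described above can be sketched in a few lines. The sketch assumes that each version of the text is preceded by U+E0001 LANGUAGE TAG followed by tag characters spelling a language code, and that the receiving software falls back to the first version when the chosen language is absent. The function names and the French and German wordings are illustrative assumptions, not part of the proposal.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG (deprecated in Unicode)
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset


def tag_language(lang, text):
    """Prefix `text` with a language tag sequence for `lang` (e.g. 'en')."""
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang) + text


def split_versions(message):
    """Split a message into (language code, text) pairs, in order of arrival."""
    versions = []
    for chunk in message.split(LANGUAGE_TAG)[1:]:
        i = 0
        while i < len(chunk) and 0xE0020 <= ord(chunk[i]) <= 0xE007E:
            i += 1
        lang = "".join(chr(ord(c) - TAG_OFFSET) for c in chunk[:i])
        versions.append((lang, chunk[i:]))
    return versions


def select(message, preferred):
    """Return the text for `preferred`, else the first version as the default."""
    versions = split_versions(message)
    for lang, text in versions:
        if lang == preferred:
            return text
    return versions[0][1] if versions else message


message = (
    tag_language("en", "Please enter the password")
    + tag_language("fr", "Veuillez saisir le mot de passe")
    + tag_language("de", "Bitte geben Sie das Passwort ein")
)
print(select(message, "de"))  # German version
print(select(message, "pl"))  # no Polish version, so the first (English) is used
```

Because each version carries its own tag sequence, messages built this way from several devices can simply be joined end to end and still be split again by `split_versions`.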


The list of languages supported by a particular thing would not be 
specified by a universal standard, but could perhaps have English, 
French, German and one or more others depending upon the location and 
application of the thing. Any language expressible in Unicode could be 
included in the list.


Support for Unicode characters beyond plane 0 is much more obtainable in 
software these days.


I know that people have been urged to use a higher level protocol for 
indicating language in documents, but please consider the case where one 
is wanting to assemble automatically a status report by combining 
reports from each of a number of mutually independent sensors on the 
Internet of Things, 
each of relatively small size, located in a variety of physical 
locations perhaps miles apart. In such a case the concatenation of such 
plain text sequences would be straightforward.


Such an undeprecating of U+E0001 LANGUAGE TAG would, in my opinion, 
contribute to the development of The Internet of Things.


William Overington

Monday 10 February 2020



Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread wjgo_10...@btinternet.com via Unicode


The reply from Mr Verdy has indeed been helpful, as indeed has also been 
an offlist private reply from someone who has, thus far, not been a 
participant in this thread.


Mr Verdy wrote:

You seem to have never seen how translation packages work and are used 
in common projects (not just CLDR, but you could find them as well in 
Wikimedia projects, or translation packages for lot of open source 
packages).

What seems to be the case to Mr Verdy is in fact the actual situation.


I do not satisfy the second of the two conditions of the invitation to 
join the working group. I am, in fact, retired and I have never worked 
in the i18n/l10n industry. Also, from the explanations it is not as 
close to my research interests as I had thought, and indeed hoped. I 
just do what I can on my research project from time to time using a home 
computer, a personal webspace hosted by an internet service provider, 
some budget software, mainly High-Logic FontCreator, and Serif PagePlus 
desktop publishing package, together with the software bundled with 
Windows 10. Older people are often advised to try to keep the mind 
active, so my research activity at least does that. If the research 
itself has benefits more generally in making progress in the application 
of information technology then that is an additional benefit.


One thing of which you might like to take account, and specifically 
"build-out" in computer formatting, is a tendency that can occur in some 
computer systems software, and that also occurred in everyday 
transactions before computers became widespread, namely of not allowing 
a person to be recorded or listed with more than two initials before his or her 
surname, to the extent that some people even have a practice of not 
using more than two initials even when the document, such as a letter, 
or a form, before them specifically uses three or more initials. Common 
explanations are that "It's for the computer" and "Two initials is 
enough to identify someone" and "Someone could have many names". Yet the 
second is not true and the first is only because somewhere along the 
line someone has decided that that is how it is to be done: the third is 
true, but the fact that that is the person's name on his or her birth 
certificate is the legal fact of the matter and so needs to be properly 
accommodated in systems recording names. Also, the United Kingdom and 
United States format of a given name, one or more additional given 
names, then a surname is not suitable for some other cultures. I 
remember some registration forms for college courses that would ask for 
surname and forenames, with a panel for each, together with a printed 
note on every such form "If your name cannot be expressed in that 
format, please write your whole name in the box labelled 'surname'".


However, with localization there are other issues. I seem to remember 
somewhere that people whose name is correctly expressed in a script 
other than Latin script often have a transliterated "Romanized form" of 
their name as well for use on travel documents. So will your format 
system include provision for this please, such as by allowing both to be 
linked together in a document please?


Another feature is that I have known people from various countries who 
have, in everyday use, chosen to be known in everyday workplace 
situations by an English first name rather than their official given 
name, while using their original surname, perhaps transliterated. So it 
would be good if the name format accounts for that too please, in a 
manner that does not give the possible impression of that use being for 
some questionable purpose. Maybe a new term such as ChosenSocialName 
could be used for that please.


An interesting facet of transliteration is that the name of a famous 
mathematician whose name was properly written using Cyrillic characters, 
was transliterated into English as Chebyshev, whereas the set of 
polynomials named after him are each designated by including the letter 
T. The transliteration of the name of the mathematician into German 
starts with a T rather than the C used in English. There was a short 
thread that explored within it this topic in this mailing list around 
the year 2000, not necessarily in the year 2000 itself, but I have not 
been able to locate it.


William Overington

Tuesday 14 January 2020



Re: New Unicode Working Group: Message Formatting

2020-01-13 Thread wjgo_10...@btinternet.com via Unicode

I notice that in the web page

https://github.com/unicode-org/message-format-wg/issues/3

there is a request to add more features.

One of those requested features is as follows


Inflections (genders, articles, declensions, etc.)


So I am wondering quite what formats will be covered by the project and 
how those formats can be applied, in various contexts, not necessarily 
only those initially considered.


William Overington

Monday 13 January 2020



Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread wjgo_10...@btinternet.com via Unicode
A person in England, who knows no German, wants to send the parcel to a 
person in Germany, who knows no English.


The person in England wants to send a message about the delivery to the 
person in Germany.



English: “The package will arrive at {time} on {date}.”


The person wants to send the message by email.


German: “Das Paket wird am {date} um {time} geliefert.”


Where does the translation of the text take place please, and by whom or 
by which computer?


During the actual transmission from the computer in England to the 
computer in Germany, is the text of the string in English, or German, or 
in a language-independent form please?
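
To make the third possibility concrete, here is a minimal sketch of one arrangement in which the transmitted form is language-independent: only a message identifier and the values for {time} and {date} travel, and the receiving computer chooses the pattern. This is purely an illustration of the question and not a description of what the working group has decided. The identifier `package_arrival`, the JSON wire format and the function names are all assumptions.

```python
import json

# Patterns taken from the example above, keyed by message identifier and language.
PATTERNS = {
    "package_arrival": {
        "en": "The package will arrive at {time} on {date}.",
        "de": "Das Paket wird am {date} um {time} geliefert.",
    }
}


def encode(message_id, **values):
    """Sender side: produce a language-independent form for transmission."""
    return json.dumps({"id": message_id, "values": values})


def render(wire, lang):
    """Receiver side: fill in the pattern for the receiver's language."""
    data = json.loads(wire)
    return PATTERNS[data["id"]][lang].format(**data["values"])


wire = encode("package_arrival", time="14:30", date="2020-01-15")
print(wire)                 # what travels between the two computers
print(render(wire, "de"))   # what the person in Germany sees
```

In this arrangement neither person needs to know the other's language, provided both computers hold patterns for the same message identifier. Localizing the date and time values themselves is left out of the sketch.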




If the parcel were being sent from France to Germany by a person who 
knows only French, during the transmission of the message about the 
parcel, is the text of the string in French, or English, or German, or 
in a language-independent form please?


William Overington

Saturday 11 January 2020



Free emoji (from Re: Videos on YouTube)

2020-01-08 Thread wjgo_10...@btinternet.com via Unicode


Johannes Bergerhausen wrote:


West is located in the former US embassy, a brutalist building by 
Marcel Breuer (Bauhaus):

www.westdenhaag.nl 


On that web page is a link to the following web page.

http://www.westdenhaag.nl/exhibitions/20_02_Alphabetum_6

The title of the exhibition is

FREE EMOJI

There is some interesting text on the page.

How would such emoji be encoded?

I am wondering how this relates, if at all, to QID emoji.

The concept of QID emoji has been put forward, and it is far-reaching in 
its implications for the future.


However, it does mean that someone wanting a new emoji would need to go 
through the QID database process.


If, in the United Kingdom, someone writes a poem, or a novel, or indeed 
anything, and publishes it, whether in hardcopy or on the web, then no 
permission is needed to do so, though the content is subject to legal 
constraints. There are certain requirements relating to Legal Deposit.


https://www.bl.uk/legal-deposit

So what if such freedom were to apply to introducing a new emoji?

For example, if I produce an ebook and I want to include a reference 
code, I have the option, in the Serif PagePlus X7 desktop publishing 
software that I use, of using a UUID (Universally Unique Identifier), or 
an ISBN (International Standard Book Number), or something custom.


I have not produced any ebooks other than, as learning exercises, a few 
tests that I have not published.


I looked at UUID and it seems to me that a randomly generated UUID code 
is not unique at an absolute level. ISBN needs registration with payment 
being involved. Yet there is always custom.
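
The observation about randomly generated UUIDs can be put into numbers. A version 4 UUID has 122 random bits, so two independently generated codes can in principle coincide, though the chance is very small. The sketch below uses the standard birthday approximation; the function name is chosen here for illustration only.

```python
import math
import uuid

RANDOM_BITS = 122  # a version 4 UUID has 122 randomly chosen bits


def collision_probability(count):
    """Approximate chance that `count` random UUIDs contain at least one duplicate."""
    pairs = count * (count - 1) / 2
    return -math.expm1(-pairs / 2**RANDOM_BITS)


reference_code = uuid.uuid4()          # a randomly generated UUID
print(reference_code)
print(collision_probability(10**9))    # chance of a clash among a thousand million codes
```

So uniqueness is a matter of very high probability rather than a guarantee, which is consistent with the remark above.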


So if there were to be free emoji as mentioned in the text for that 
exhibition, how could they be encoded for interoperability? Does the 
exhibition address that issue?


Maybe publish a PDF and send it for legal deposit with a code of some 
sort and then that is regarded as a precedent? Or what?


What would a custom code be like? Maybe the author's initials followed 
by a serial number, with interchange then being by means of a tag 
sequence after a (new, not yet encoded?) base character, using the tag 
character version of each of the characters that are in the custom 
code? Lots of potential problems there too.


What are the options?

If someone on this list is visiting the exhibition, a write up posted in 
this mailing list would be welcome please, at least, by me, and maybe by 
some other participants too.


William Overington

Wednesday 8 January 2020



Re: emojis for mouse buttons?

2019-12-31 Thread wjgo_10...@btinternet.com via Unicode


How about the following.

Expand Philippe's idea of the theta shape to having a three by three 
grid of cells, rounded at the two lower outside corners to suggest the 
shape of a mouse unit. The three columns, left to right, refer to the 
left button, the centre button and the right button respectively.


For each button column, there is an upper cell, a middle cell and a 
lower cell.


A filled upper cell to mean click,

a filled upper cell and a filled middle cell to mean double click,

a filled lower cell to mean mouse down,

a filled middle cell to mean mouse up,

a filled lower cell and a filled middle cell to mean mouse down then 
drag.


So, at present, fifteen new emoji characters.
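
The count of fifteen (three buttons times five actions) can be checked 
with a small Python sketch. This is only an illustration: the names 
ACTIONS, BUTTONS and render are invented here, the grid is drawn with 
"#" and "." in plain text, and the rounded lower corners are not shown.

```python
BUTTONS = ("left", "centre", "right")
ROWS = ("upper", "middle", "lower")

# Which cells of a button's column are filled for each action.
ACTIONS = {
    "click": {"upper"},
    "double click": {"upper", "middle"},
    "mouse down": {"lower"},
    "mouse up": {"middle"},
    "mouse down then drag": {"lower", "middle"},
}


def render(button: str, action: str, filled: str = "#", empty: str = ".") -> str:
    """Draw the three by three grid for one button and one action."""
    column = BUTTONS.index(button)
    lines = []
    for row in ROWS:
        cells = [
            filled if (c == column and row in ACTIONS[action]) else empty
            for c in range(len(BUTTONS))
        ]
        lines.append("".join(cells))
    return "\n".join(lines)


ALL_SYMBOLS = [(b, a) for b in BUTTONS for a in ACTIONS]

if __name__ == "__main__":
    for button, action in ALL_SYMBOLS:
        print(button, action)
        print(render(button, action))
        print()
```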

In use, a mouse down then drag symbol would be followed later by a 
mouse up symbol.


The grid could be one colour and the cell fill another colour if 
desired, but the design would also be unambiguous in monochrome as a 
display default.


William Overington

Tuesday 31 December 2019


-- Original Message --
From: "Philippe Verdy via Unicode" 
To: "Shriramana Sharma" 
Cc: "unicode Unicode Discussion" 
Sent: Tuesday, 2019 Dec 31 At 15:49
Subject: Re: emojis for mouse buttons?

I say "emoji" because they would belong to the subsets of emojis, within 
characters, and existing mouse characters (but not button-specific) are 
already encoded as emojis (i.e. two styles: basic glyphs or color 
icons).


What is important is less the mouse than the identification of the 
button (left/center/right) for documenting keymaps in UI (the 
documentation usually indicates the default right-hand assignment; a 
user may still configure the mouse driver to swap the left/right 
buttons).



For now the alternative is to compose a localisable string like "L" or 
"R" or "C", followed by the generic mouse (when documenting keymaps, the 
surrounding square and shading may be done outside using styling; we 
just need the unique symbol in a more immediately readable way than just 
"click").



A generic click (1st button) is sometimes represented as an arrow cursor 
or hand with a pointing finger, and some radial strokes near the tip of 
the arrow, but it is not very distinctive when we need to explicitly 
distinguish the buttons, so I suggest a basic empty shape (rounded 
rectangle or ovoid like a narrow theta "Θ"), with the top part split in 
three cells by horizontal and vertical strokes, and one of the three 
cells filled (representing the wire or the wireless waves is not 
necessary).




On Tue, 31 Dec 2019 at 14:57, Shriramana Sharma wrote:


Why are these called "emojis" for mouse buttons rather than just 
"characters" for them?


On Tue, 31 Dec, 2019, 18:45 Philippe Verdy via Unicode 
<unicode@unicode.org> wrote:


A lot of applications need to document their keymap and want to display 
keys.


For now there are emojis for mouses (several variants: 1, 2 or 3 
buttons), independently of the button actually pressed.



However there's no simple emoji to represent the very common mouse click 
buttons used in a lot of UIs.




But it would be good to have emojis for the left, center, and right 
click (showing a mouse with the correct button filled in black), instead 
of writing "left click" in plain text.



Has it been proposed?


See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts 










Re: emojis for mouse buttons?

2019-12-31 Thread wjgo_10...@btinternet.com via Unicode


I read Philippe's post and I remembered the following thread that I 
started in the High-Logic forum.
https://forum.high-logic.com/viewtopic.php?f=10&t=3818


Are these of any interest as designs?
Best regards,
William Overington
Tuesday 31 December 2019

-- Original Message --
From: "Philippe Verdy via Unicode" 
To: "unicode Unicode Discussion" 
Sent: Tuesday, 2019 Dec 31 At 13:13
Subject: emojis for mouse buttons?

A lot of applications need to document their keymap and want to display 
keys.


For now there are emojis for mouses (several variants: 1, 2 or 3 
buttons), independently of the button actually pressed.



However there's no simple emoji to represent the very common mouse click 
buttons used in a lot of UIs.




But it would be good to have emojis for the left, center, and right 
click (showing a mouse with the correct button filled in black), instead 
of writing "left click" in plain text.



Has it been proposed?


See for example https://wiki.openstreetmap.org/wiki/ID/Shortcuts 








Videos on YouTube

2019-12-27 Thread wjgo_10...@btinternet.com via Unicode

I searched on YouTube for

Gutenberg Mainz

and filtered for

This week

and I found 12 videos uploaded 3 days ago about a symposium called 
Alphabetica 2019.


Apparently held in Amsterdam.

It seems that the videos were listed for that search as the notes 
include


"Presented in collaboration with the Institut Designlabor Gutenberg 
(Hochschule Mainz)," … [and several other organizations]


so both the words Gutenberg and Mainz were matched to the search.

So, a serendipitous discovery.

There is an interesting section in one video about Bliss and a new 
interesting development relating to the (possible) encoding of Bliss 
characters into Unicode.


https://www.youtube.com/watch?v=mwj2KilAXmo

Here are links to two videos of continuous walks through Mainz: each of 
them includes the Statue of Gutenberg and the outside of the Gutenberg 
Museum yet are otherwise almost non-overlapping in their routes.


https://www.youtube.com/watch?v=scjLxGh17rA

https://www.youtube.com/watch?v=izqBUQkfByw

William Overington

Friday 27 December 2019




Re: HEAVY EQUALS SIGN

2019-12-20 Thread wjgo_10...@btinternet.com via Unicode
On the matter of my document proposing using Variation Selector 14 for 
requesting an italic glyph for a letter, Unicode Inc. has also published 
a Notice of Non-Approval.


https://www.unicode.org/alloc/nonapprovals.html

It is indeed interesting that the Notice of Non-Approval itself uses 
italics for emphasis in two places.


That text, at the present time, cannot be expressed in Unicode plain 
text with the emphasis that the Notice of Non-Approval includes.


Readers of my two original documents on the topic may like to observe 
that I did not in any way suggest that the specialised italic characters 
for some mathematical uses are a precedent for the proposal that I 
submitted.


Here is a link to a PDF (Portable Document Format) document produced 
earlier today of a song that I wrote earlier this year that mentions 
italics.


http://www.users.globalnet.co.uk/~ngo/a_song_of_typography.pdf

I still consider that the proposal is a good idea, but the decision has 
been emphatically made, so I have moved on.


William Overington

Friday 20 December 2019



Re: New Public Review on QID emoji

2019-11-13 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote as follows.

Just because a select group of people engages in communication about 
the arcane details of a proposed specification it doesn't mean that 
the outcome will benefit some entirely different and larger group 
communicate better.


This is logically true. However the same could have been said about 
people discussing the details of the then proposed Unicode specification 
over a quarter of a century ago, wanting to use 16 bits for each 
character used in ordinary English instead of just 8 bits. Yet Unicode 
has benefitted many many people around the world who may not know much 
about the underlying theory and technology.


I looked up the word 'arcane' and I opine that the details of the QID 
emoji proposal are not arcane. They are clear and available free to 
view, without registration, on the internet.


https://www.lexico.com/en/definition/arcane

https://www.unicode.org/review/pri408/


There's too much of the "might possibly" about this; …


It provides an opportunity for progress.

... and it is quite different from the early days of Unicode itself, 
where there was a groundswell of pent-up demand for a solution to the 
fragmented character encoding landscape; the discussions quickly 
became about the best way to do that, and about how to ensure that the 
result would be supported.


Yes, fine, and a good job was done and has benefitted many many people 
around the world. That was then and that was how things happened then 
for that situation. Now is now and this is a different approach for a 
different situation.


The current effort starts from an unrelated problem (Unicode not 
wanting to administer emoji applications) and in my analysis, 
seriously puts the cart before the horse.


Well I was not aware of that purported reason, but then I am not part of 
the inner loop so you may well therefore have more information about the 
motivation than is accessible to me.


William Overington

Wednesday 13 November 2019



Re: New Public Review on QID emoji

2019-11-12 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote as follows.

If leading standardization was such a good thing in communication, why 
don't we see more "dictionaries of words not yet in use"? After all, 
it would be a huge benefit for people coining new terms to have their 
definitions already worked out. Nothing inherent in the technology of 
dictionaries has directly prevented overtures in that direction, but 
it overwhelmingly remains a path not taken.



One wonders why.


The comparison is not of like with like.

In 1974 I invented a new concept in broadcasting. I coined the word 
telesoftware to denote my invention. I was able to use the word 
immediately, because the format for introducing a new word into English 
was already established. In 1976 I sent a letter to the editor of a 
trade magazine using the word. A gentleman who read the letter replied 
and that reply was published in a later issue of the magazine. 
Eventually, some years later, the word was added into the Oxford English 
Dictionary. At first into a volume of the supplement to the first 
edition and then, when it was published, in the second edition of the 
Oxford English Dictionary.


If someone wants to coin a new word something to do with character 
encoding then he or she can do so and just start using it, perhaps in a 
thread in this mailing list, and maybe other people will start using the 
new word too. Yet if a new emoji or some other symbol is desired to be 
introduced then the symbol cannot just be included in plain text. QID 
emoji can provide the capability to get something encoded promptly and 
used in plain text. I appreciate that there is then a font provision 
issue, yet with the way to encode the emoji or symbol available an 
attempt can be made to provide font support. Such font support 
possibility may well depend upon the platform.


I remember that when emoji were introduced into Unicode Doug Ewell 
predicted that the supporting of emoji on platforms would have the 
effect of providing support for other characters encoded in plane 1, 
when such support might have been much slower if emoji had not been 
encoded. Doug was right. Also colour font technology was developed and 
implemented and can today be used with any character, not just emoji.


So introducing QID emoji could possibly lead to the introduction of 
advances for other things than emoji as well as for emoji.


 > Just because you can write something that is a very detailed 
specification doesn't mean that it is, or ever should be, a standard.


Yes, but that does not mean that it should necessarily not become a 
standard. For communication to take place one needs to start somewhere. 
The QID emoji proposal is a start. It has been considered at (at least) 
two Unicode Technical Committee meetings and now there is a public 
review taking place.


Everyone has an opportunity to contribute comments and ideas to the 
public review and maybe progress will be made.


William Overington

Tuesday 12 November 2019



Re: New Public Review on QID emoji

2019-11-12 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote as follows.

While I have a certain understanding for the underlying concerns, it 
still is the case that this proposal promises to be a bad example of 
"leading standardization": throwing out a spec in the hopes it may be 
taken up and take off, instead of something that meets an expressed 
need of the stakeholders and that they are eagerly awaiting.


I suppose that it could be called "leading standardization" but I think 
that that is a good thing. Unicode has traditionally been locked into 
the past. If a symbol could be found carved in stone years ago then that 
was fine but anything for the future that could possibly become useful 
was a huge insuperable problem.


Yet for me "could possibly become useful" is a good reason for encoding, 
and QID emoji opens up great futuristic possibilities. For me the big 
problem with the proposal at present is the set of restrictions upon 
which QID items are valid to become encoded as QID emoji. Anything 
abstract is locked out, yet abstract shapes are important in 
communication. That to me is an unnecessary restriction, and it could 
easily be removed.


I regard QID emoji as a research project. The specification may need 
some alterations, maybe it is just the start of a whole new path of 
exploration in communication, much wider than emoji. I am a researcher 
and I try to find what is good in an idea and focus on that and think 
where a new idea can lead, applying critical consideration of ideas, yet 
trying to move forward rather than seizing on problems found as a reason 
for dismissing the whole idea. So find the problems, try to think round 
them, try to go forward. Look for what could be done and if it is good, 
try to do it. Try to go forward rather than quash.


That, then, finally undermines Unicode's implied guarantee as being 
the medium for unambiguous interchange. Giving up that guarantee seems 
a bad bargain.


Many recent emoji encoding proposals seem to delight, as if required, in 
providing multiple meanings for each newly proposed character.


There was a talk at the Unicode and Internationalization Conference a 
few years ago on what are the meanings of emoji. I was not there but 
there is a video available on YouTube.


https://www.youtube.com/watch?v=9ldSVbXbjl4

William Overington

Tuesday 12 November 2019






RE: New Public Review on QID emoji

2019-11-12 Thread wjgo_10...@btinternet.com via Unicode
WJGO  >>Yet if QID emoji are implemented by Unicode Inc. without also 
being implemented by ISO/IEC 10646 then that could lead to future 
problems, ...


Peter Constable wrote as follows.


Neither Unicode Inc. or ISO/IEC 10646 would _implement_ QID emoji.


That is correct. I should have made clear that I was referring to the 
specification for QID emoji rather than QID emoji. It is somewhat 
difficult, without using the word 'implement', to express precisely and 
concisely the formal acceptance of the specification by Unicode Inc. so 
that it becomes a published Unicode Inc. document giving the go-ahead 
for implementation by anyone (not just software vendors).


Peter within his post also wrote as follows.

The PRI doc mentions the possibility of a registry for QID sequences; 
a key benefit of a registry is that it may mitigate against these 
non-interop risks. But the current proposal does not in fact provide 
any mitigations for these issues other than the possibility that a QID 
sequence might be at some point become an RGI sequence.


I put forward on Friday 8 November 2019 a suggestion that might help 
towards solving the problem.


https://www.unicode.org/review/pri408/

William Overington

Tuesday 12 November 2019



Re: New Public Review on QID emoji

2019-10-30 Thread wjgo_10...@btinternet.com via Unicode

Hello everyone

I have been reading about QID emoji and what is proposed.

At present I have a question to which I cannot find the answer.

Is the QID emoji format, if approved by the Unicode Technical Committee, 
going to be sent to the ISO/IEC 10646 committee for consideration by 
that committee?


As the QID emoji format is in a Unicode Technical Standard and does not 
include the encoding of any new _atomic_ characters, I am concerned that 
the answer to the above question may well be along the lines of "No" 
maybe with some reasoning as to why not.


Yet will a QID emoji essentially be _de facto_ a character even if not 
_de jure_ a character?


For a QID emoji will not just be "markup using existing characters from 
the ISO/IEC 10646 standard that is synchronized with Unicode", such as 
would be a markup that anyone could devise for use in his or her 
research and experimentation or indeed some public use, it will be a 
Unicode Inc. endorsed "whatever" that is very closely linked to The 
Unicode Standard even if not deemed to be part of it.


As I understand the situation, in some countries people take no (formal) 
notice as such of The Unicode Standard but rely solely on ISO/IEC 10646. 
Often this may well present no practical problems in information 
technology and its applications because the two standards are 
synchronized each with the other.


Yet if QID emoji are implemented by Unicode Inc. without also being 
implemented by ISO/IEC 10646 then that could lead to future problems, 
notwithstanding any _de jure_ situation that QID emoji are not 
characters, because they will be much more than Private Use characters 
yet less than characters that are in ISO/IEC 10646.


I am in favour of the encoding of the QID emoji mechanism and its 
practical application. However I wonder about what are the consequences 
for interoperability and communication if QID emoji become used - maybe 
quite widely - and yet the tag sequences are not discernable in meaning 
from ISO/IEC 10646 or any related ISO/IEC documents.


William Overington

Wednesday 30 October 2019








New Public Review on QID emoji

2019-10-29 Thread wjgo_10...@btinternet.com via Unicode

Hello everyone

I have recently learned that there is a new Public Review Issue on QID 
emoji.


https://www.unicode.org/review/pri408/

Also the closure date for PRI 405 has been given an extension.

http://www.unicode.org/review/pri405/

https://www.unicode.org/review/

William Overington

Tuesday 29 October 2019





QID emoji and screen readers

2019-09-25 Thread wjgo_10...@btinternet.com via Unicode

There is currently a Public Review, number 405.

http://www.unicode.org/review/pri405/

It is about the following document.

http://www.unicode.org/reports/tr51/tr51-17.html

The issue of screen readers is mentioned in the document.

I have thought of a possible solution.

However I am not expert on many of the details of what is allowed and 
what is not allowed in Unicode text, so I am posting the idea here so 
that depending upon any discussion that takes place, I might send in the 
idea as a formal response to the Public Review, or send in a modified 
form based on advice provided, or just abandon the idea as unworkable.


Here is the basic idea as I suggest it at the moment, please endorse, 
reject, discuss, improve the idea as you think best.


Decide what text, in any Unicode characters that you wish in any 
language you choose, is to be the text that the screen reader speaks.


Save that text as a UTF-8 byte sequence.

Encode that text in its UTF-8 form to produce a text string twice as 
long as that UTF-8 string such that, byte by byte, each UTF-8 byte is 
encoded as two hexadecimal "digits" each in the range 0..9, A..F and 
then use the tag version of each of those characters.


Add a U+0020 SPACE character at the front as the base character and add 
a cancel tag character at the end.


Include that string in the document after the QID emoji character.

With my limited knowledge of the intricacies of Unicode it seems to me 
that that might well solve the problem.


Screen reader software could decode the tag characters into a string and 
try to speak it out.


Other software would just ignore the tag characters and display the 
space character.
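
The steps above can be sketched in Python as follows. This is only a 
minimal illustration of the idea as posted, not a tested proposal: the 
function names encode_spoken_text and decode_spoken_text are invented 
for this example, and it assumes that the tag version of a character is 
the code point U+E0000 plus the ASCII value of that character.

```python
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII from U+E0000
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG
BASE = " "                  # U+0020 SPACE as the base character


def encode_spoken_text(text: str) -> str:
    """Turn the text to be spoken into SPACE + tag hex digits + CANCEL TAG."""
    hex_digits = text.encode("utf-8").hex().upper()  # two digits per byte
    tags = "".join(chr(TAG_OFFSET + ord(d)) for d in hex_digits)
    return BASE + tags + CANCEL_TAG


def decode_spoken_text(sequence: str) -> str:
    """Recover the text that a screen reader would try to speak."""
    if not (sequence.startswith(BASE) and sequence.endswith(CANCEL_TAG)):
        raise ValueError("expected SPACE, tag characters, then CANCEL TAG")
    hex_digits = "".join(chr(ord(c) - TAG_OFFSET) for c in sequence[1:-1])
    return bytes.fromhex(hex_digits).decode("utf-8")


if __name__ == "__main__":
    sequence = encode_spoken_text("white crested tiger heron")
    print(len(sequence))                 # 2 per UTF-8 byte, plus base and cancel
    print(decode_spoken_text(sequence))
```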


William Overington

Wednesday 25 September 2019




Re: QID Emoji and their applications

2019-05-23 Thread wjgo_10...@btinternet.com via Unicode
There has been a development in that the following document was 
published yesterday.


https://www.unicode.org/L2/L2019/19203-wd-uts51-17-draft.pdf

I refer to Annex C.2 of that document.

In that section the use of U+1F194 SQUARED ID is suggested as the base 
character for QID emoji.


I have thought of a mnemonic to help remember the code number - namely 1 
F then the number of letters in the phrase "a memorable code".


I have now produced a maquette font that uses that base character rather 
than the Private Use Area character that I used before.


Here is the substitution sequence that is within the new font.

sub u1F194 uE0051 uE0032 uE0031 uE0038 uE0035 uE0034 uE0033 uE007F -> 
glyph218543;


In order to try the experiment one needs to install the font.

So here is the sequence that one needs to enter in order to cause the 
display of the (stylized) glyph that represents the white crested tiger 
heron in these experiments.


u1F194 uE0051 uE0032 uE0031 uE0038 uE0035 uE0034 uE0033 uE007F

That is, the SQUARED ID character then tag characters Q 2 1 8 5 4 3 then 
the CANCEL TAG character.
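
For anyone who would like to generate such a sequence rather than type 
it, here is a short Python sketch. It simply follows the pattern shown 
above (base character, tag version of each character of the QID, CANCEL 
TAG); the function names are invented for this example and nothing here 
is taken from the draft document itself.

```python
import re

SQUARED_ID = "\U0001F194"   # U+1F194 SQUARED ID, the suggested base
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII from U+E0000


def qid_to_sequence(qid: str) -> str:
    """Build the character sequence for a QID such as 'Q218543'."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a QID: {qid!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in qid)
    return SQUARED_ID + tags + CANCEL_TAG


def sequence_to_qid(sequence: str) -> str:
    """Recover the QID from a sequence built by qid_to_sequence."""
    if not (sequence.startswith(SQUARED_ID) and sequence.endswith(CANCEL_TAG)):
        raise ValueError("expected SQUARED ID, tag characters, CANCEL TAG")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in sequence[1:-1])


def font_notation(sequence: str) -> str:
    """Write the sequence in the uXXXX style used in the font feature code."""
    return " ".join(f"u{ord(c):X}" for c in sequence)


if __name__ == "__main__":
    print(font_notation(qid_to_sequence("Q218543")))
```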


Please note that for the glyph substitution to work the OpenType liga 
feature needs to be on in whichever OpenType-aware application you use 
for the experiment.


The font fontQ218543maquette3 is in the file fontQ218543maquette3.otf, 
which is included in each of the following threads.


https://forum.affinity.serif.com/index.php?/topic/82885-can-you-find-the-white-crested-tiger-heron/

https://forum.high-logic.com/viewtopic.php?f=10&t=7941

You are welcome to download, install and use the font.



I noticed in the document published yesterday the following, on page 45 
of the PDF document.


quote

A subset of QIDs are associated with entities that would be valid for 
emoji. For example, risk
management (Q189447) and this (Q3109046) would not be valid. Of those 
that are valid,
Wikidata may not have associated images for the referenced entity, and 
such images would
rarely — if ever — be appropriate for use as images for emoji.

end quote

I have it in mind to suggest that there should not be that restriction 
and that all QID items should be valid for emoji and thus for 
interchange and interoperability in a plain text environment. Some may 
never be used, yet I am thinking that to state that some "would not be 
valid" would be a decision that could restrict progress and the 
implementation and beneficial application of new ideas in the future.


As it happens when we were discussing the possibility of abstract emoji 
some time ago in this mailing list I produced glyphs for "this" and for 
"that" as a gentleman had indirectly suggested the possibility. They are 
about 60% of the way down the following web page.


http://www.users.globalnet.co.uk/~ngo/abstract_emoji.htm

I accept that "this" as in "this and that" is not the same as "this" as 
used in some computer languages, yet maybe, just maybe, a glyph for 
"this" used in that context could be like my design for a glyph for 
"this" with a large round  dot, say in green, added in the lower right 
corner, so as to indicate a dot as used in listing the name of an object 
in some computer programming languages.


Restricting which QID items could be emoji also restricts the 
possibility of using the QID page data for text to speech. For example, 
risk management (Q189447) already has text in three languages. Encoding 
abstract items as QID items and thus as QID emoji could help 
communication through the language barrier, including possibly very 
helpfully in emergency situations.


I am thinking about a glyph for risk management.

I am wondering if a red jagged shape enclosed within a yellow rounded 
shape might work.


Shapes something like those in the following article.

https://en.wikipedia.org/wiki/Bouba/kiki_effect

William Overington

Thursday 23 May 2019



QID Emoji and their applications

2019-05-16 Thread wjgo_10...@btinternet.com via Unicode
There are two versions of a proposal document for QID emoji currently 
available, the original and a revised version.


https://www.unicode.org/L2/L2019/19082-qid-emoji.pdf

https://www.unicode.org/L2/L2019/19082r-qid-emoji.pdf

I sent in two comments about the original proposal and they are included 
in the following document.


https://www.unicode.org/L2/L2019/19124-pubrev.html

In the event, there is a response in the minutes of meeting #159 of the 
Unicode Technical Committee about my comments.


https://www.unicode.org/L2/L2019/19122.htm#159-A17

The response about the QID emoji proposal itself is listed in those 
minutes.


https://www.unicode.org/L2/L2019/19122.htm#159-A83

I tried some experimentation during early April 2019 and the three 
following threads may possibly be of interest to some readers. There is 
a font which readers are welcome to download and try if they so choose.


https://forum.affinity.serif.com/index.php?/topic/82885-can-you-find-the-white-crested-tiger-heron/

https://forum.high-logic.com/viewtopic.php?f=10&t=7941

https://forum.high-logic.com/viewtopic.php?f=3&t=7942

William Overington

Thursday 16 May 2019



Re: Spiral symbol

2019-02-19 Thread wjgo_10...@btinternet.com via Unicode
I seem to remember from reading a book many years ago, maybe around 
fifty years ago, something about one of the early chemists (Lavoisier?) 
having used two symbols, each a spiral, mirror images of each other, for 
two different things, maybe oxidation and reduction, in his manuscript 
but he had had to abandon the idea because when he wanted his text 
printed the printer did not have any metal sorts of a spiral and so he 
had to use some other format.


Does that ring a bell with anyone please?

I remember that where I read it there were two spiral motifs displayed 
in the text.


At the time I was already into Private Press Printing using letterpress 
on a handpress using metal type and producing designs using single type 
border sorts, so that is perhaps why that has remained in my memory.


William Overington
Tuesday 19 February 2019


Re: Encoding colour (from Re: Encoding italic)

2019-02-13 Thread wjgo_10...@btinternet.com via Unicode

Philippe Verdy replied to my post, including quoting me.

WJGO >> Thinking about this further, for this application copies of the 
glyphs could be redesigned so as to be square and could be emoji-style 
and the meanings of the characters specifying which colour component is 
to be set could be changed so that they refer to the number previously 
entered using one or more of the special digit characters. Thus the 
setting of colour components could be done in the same reverse notation 
way that the FORTH computer language works.


PV > FORTH is not relevant to this discussion.

I just mentioned FORTH because of the way that numbers are entered 
before the operators that act upon them. I have no intention to use a 
stack-based system: what I have in mind at present is much simpler than 
such a format.


Suppose that there are sixteen new characters, which are in plane 1 or 
maybe plane 14, but which for this mailing list post I will express 
using the digits 0 .. 9, Z, R, G, B, A, F.


There would be a virtual machine to set the colour, that would have 
registers h, r, g, b, a and a system service 
Set_Foreground_Colour(r,g,b,a).


Then the sixteen new characters would each have a default glyph, which 
could be displayed emoji-style, and, in an application environment that 
has the virtual machine available and switched on, would have the 
following effects in the virtual machine and their glyphs would not then 
be displayed. The virtual machine would be sandboxed.


Z h:=0;
0 h:=10*h ;
1 h:=10*h + 1;
2 h:=10*h + 2;
3 h:=10*h + 3;
4 h:=10*h + 4;
5 h:=10*h + 5;
6 h:=10*h + 6;
7 h:=10*h + 7;
8 h:=10*h + 8;
9 h:=10*h + 9;
R r:=h; h:=0;
G g:=h; h:=0;
B b:=h; h:=0;
A a:=h; h:=0;
F Set_Foreground_Colour(r,g,b,a);

For example, remembering that these ordinary characters are just being 
used here for explanation in this post, and that the actual characters 
if encoded would probably be in plane 1 or plane 14, the sequence 
Z128R160G248B255AF could be used to set the foreground colour to an 
opaque blue colour.


It may be that upon investigation a feature of the system service 
Set_Foreground_Colour(r,g,b,a) could be specified such that "if a=0 then 
a:=255;" so that total opacity of the colour is presumed unless 
otherwise set.
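
To show that the table above is enough to define the behaviour, here is 
a small Python sketch of such a virtual machine. It is only an 
illustration: it works on the ordinary stand-in characters used in this 
post rather than on any encoded characters, the class name ColourMachine 
and the flag default_opaque are invented for the example, and the system 
service is modelled simply by storing the colour in an attribute.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ColourMachine:
    h: int = 0  # holding register for the number being entered
    r: int = 0
    g: int = 0
    b: int = 0
    a: int = 0
    default_opaque: bool = False  # the optional "if a=0 then a:=255" feature
    foreground: Optional[Tuple[int, int, int, int]] = None

    def feed(self, text: str) -> None:
        """Process stand-in characters one at a time, as in the table."""
        for ch in text:
            if ch == "Z":
                self.h = 0
            elif ch in "0123456789":
                self.h = 10 * self.h + int(ch)
            elif ch in "RGBA":
                setattr(self, ch.lower(), self.h)  # e.g. R does r:=h
                self.h = 0
            elif ch == "F":
                alpha = 255 if (self.default_opaque and self.a == 0) else self.a
                # Stands in for Set_Foreground_Colour(r,g,b,a).
                self.foreground = (self.r, self.g, self.b, alpha)
            else:
                raise ValueError(f"not a colour control character: {ch!r}")


if __name__ == "__main__":
    machine = ColourMachine()
    machine.feed("Z128R160G248B255AF")
    print(machine.foreground)
```

There is no stack anywhere in this: each digit only updates the single 
register h, which is why the comparison with FORTH is limited to the 
order in which numbers and operators are entered.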


PV > You may create your "proof of concept" (tested on limited 
configurations) but it will just be private


Yes.

PV > [And so it should use PUA for full compatibility ...

Yes, I have in mind to use U+EA60 through to U+EA69 for the digits, as 
U+EA60 is 60000 in decimal and so can be entered as Alt 60000, which 
makes it easier if some of the people who want to experiment want to 
enter characters using the Alt method.


William Overington
Monday 11 February 2019


Re: Vendor-assigned emoji (was: Encoding italic)

2019-02-13 Thread wjgo_10...@btinternet.com via Unicode

James Kass wrote:

Nobody disagreed and I think it’s a splendid suggestion.  If anyone is 
discussing drafting a proposal to accomplish this, please include me 
in the “cc”.


I too would like to receive copies of any discussions please.

In relation to the proposal, I opine that the facility should not allow 
a glyph that has been assigned to be changed at a later date.


Given that discussion is about a whole plane of code points being 
assigned, then even if the code points are assigned at fifty every month 
that would take over one hundred years to fill a whole plane. Certainly 
early months might have more than fifty allocations.
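
The arithmetic behind "over one hundred years" can be checked with a 
few lines of Python. A plane has 65,536 code points (two of which are 
noncharacters, which makes no practical difference here); the function 
name years_to_fill is just for this illustration.

```python
PLANE_SIZE = 0x10000  # 65,536 code points in one plane


def years_to_fill(per_month: int, capacity: int = PLANE_SIZE) -> float:
    """Years needed to use up `capacity` code points at a steady monthly rate."""
    return capacity / per_month / 12


if __name__ == "__main__":
    print(round(years_to_fill(50), 1))   # fifty assignments every month
    print(round(years_to_fill(500), 1))  # ten times that rate
```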


It is important to have stability as otherwise archived messages could 
have their meaning retrospectively changed with no easy way to find out 
the original meaning.


William Overington
Tuesday 12 February 2019




Re: Encoding italic

2019-02-11 Thread wjgo_10...@btinternet.com via Unicode

Doug Ewell wrote:


…, just as next to nobody is using the proposed VS14 mechanism …


Well, of course not, because the use of VS14 in a plain text document to 
record a request for an italic glyph version is not at the present time 
an official part of Unicode. The next scheduled Unicode Technical 
Committee meeting is due to start on 30 April 2019.


Here is a link to the proposal document.

https://www.unicode.org/L2/L2019/19063-italic-vs.pdf

VS14 is used to indicate a request for an italic glyph version in my 
VS14 Maquette font but that is clearly just a maquette font for 
experimental use to test the concept and show that it works. An 
application program that supports OpenType and that has the liga table 
switched on is needed in order to use the VS14 Maquette font to 
demonstrate that the use of VS14 in this way works.


https://forum.high-logic.com/viewtopic.php?f=10&t=7831

William Overington

Monday 11 February 2019



Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode

Egmont Koblinger wrote:


Should this scheme be extended for colors, too? What to do with the
legacy 8/16 as well as the 256-color extensions wrt. the color
palette? Should Unicode go into the business of defining a fixed set
of colors, or allow to alter the palette colors using the OSC 4 and
friends escape sequences which are supported by about half of the
terminal emulators out there?

Encoding colour is already a topic in relation to emoji and maybe could 
be extended to other characters.


A stateful method, though one which might be useful for plain text streams 
in some applications, would be to encode as characters some of the 
glyphs for indicating colours and the digit characters to go with them 
from page 5 and from page 3 of the following publication.


http://www.users.globalnet.co.uk/~ngo/locse027.pdf

What to do with things that Unicode might also want to have, but 
don't exist in terminal emulators due to their nature, such as
switching to a different font size?

Well, if people were to want to do it, there could be a character 
encoded in the Specials section and then use that character as a base 
character and follow it with a sequence of tag characters.


William Overington

Saturday 9 February 2019


Re: Encoding colour (from Re: Encoding italic)

2019-02-09 Thread wjgo_10...@btinternet.com via Unicode

Previously I wrote:

A stateful method, though one which might be useful for plain text streams 
in some applications, would be to encode as characters some of the 
glyphs for indicating colours and the digit characters to go with them 
from page 5 and from page 3 of the following publication.



http://www.users.globalnet.co.uk/~ngo/locse027.pdf


Thinking about this further, for this application copies of the glyphs 
could be redesigned so as to be square and could be emoji-style and the 
meanings of the characters specifying which colour component is to be 
set could be changed so that they refer to the number previously entered 
using one or more of the special digit characters. Thus the setting of 
colour components could be done in the same reverse notation way that 
the FORTH computer language works. Yet although the colour components 
thus set would be stateful until changed there would be no Escape 
sequence and if an application did not support interpretation of the 
characters as setting colours, they would just be displayed as glyphs, 
each either as a particular glyph or as a .notdef glyph.


William Overington
Saturday 9 February 2019



Re: Encoding italic

2019-02-08 Thread wjgo_10...@btinternet.com via Unicode

Andrew West wrote:


Just reminding you that "The initial character in a variation sequence
is never a nonspacing combining mark (gc=Mn) or a canonical
decomposable character" (The Unicode Standard 11.0 §23.4). This means
that a variation sequence cannot be defined for any precomposed
letters and diacritics, so for example you could not italicize the
word "fête" by simply adding VS14 after each letter because "ê" (in
NFC form) cannot act as the base for a variation sequence. You would
have to first convert any text to be italicized to NFD, then apply
VS14 to each non-combining character. This alone would make a VS
solution unacceptable in my opinion.

As it happens I was not aware of that before, and in fact I had already 
produced a PDF document for submission to the Unicode Technical 
Committee when I read your post.


https://www.unicode.org/L2/L2019/19063-italic-vs.pdf

So, it is an issue that needs to be resolved.
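
To make the issue concrete, here is a minimal sketch of the workaround that Andrew describes: convert to NFD first, then place VS14 (U+FE0D) after each non-combining character. The function name is hypothetical, skipping spaces is my own assumption, and this is only an illustration of the constraint, not a proposed implementation.

```python
import unicodedata

VS14 = "\uFE0D"  # VARIATION SELECTOR-14

def italicize_with_vs14(text: str) -> str:
    """Decompose to NFD, then add VS14 after each base character.

    Combining marks (general category M*) and white space are left
    without a selector; leaving out spaces is an assumption made here.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        out.append(ch)
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark and not ch.isspace():
            out.append(VS14)
    return "".join(out)

# "fête" written with the precomposed U+00EA (NFC form).
marked = italicize_with_vs14("f\u00EAte")
print([f"U+{ord(c):04X}" for c in marked])
```

In the result the circumflex follows the selector, so the sequence is e, VS14, U+0302 rather than a single precomposed letter, which is exactly the extra step that Andrew objects to.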

I am a researcher and I am looking for the best way to do this so as to 
get a good result that people can use; I am not trying to assert that my 
suggestion is necessarily the best way to do it. For example, I accepted 
the suggestion that James made.  The meeting of the Unicode Technical 
Committee is not due until April and hopefully some other people will 
send in documents and comments on the topic.


Hopefully the issue that Andrew mentions can be resolved in some way.

William Overington
Friday 8 February 2019




Re: Encoding italic

2019-02-05 Thread wjgo_10...@btinternet.com via Unicode

James Kass wrote:

William’s suggestion of floating a proposal for handling italics with 
VS14 might be an example of the old saying about “putting the cart 
before the horse”.


Well, a proposal just about using VS14 to indicate a request for an 
italic version of a glyph in plain text, including a suggestion of to 
which characters it could apply, would test whether such a proposal 
would be accepted to go into the Document Register for the Unicode 
Technical Committee to consider or just be deemed out of scope and 
rejected and not considered by the Unicode Technical Committee.


If the proposal were allowed to become included in the Document Register 
of the Unicode Technical Committee then if other people wish to submit 
comments and other proposals then that would be possible as it would 
have become established that such a topic is deemed acceptable for 
placing into the Document Register of the Unicode Technical Committee.


William Overington
Tuesday 5 February 2019





Re: Encoding italic

2019-01-31 Thread wjgo_10...@btinternet.com via Unicode
Is the way to try to resolve this for a proposal document to be produced 
for using Variation Selector 14 in order to produce italics and for the 
proposal document to be submitted to the Unicode Technical Committee?


If the proposal is allowed to go to the committee rather than being 
ruled out of scope, then we can know whether the Unicode Technical 
Committee will allow the encoding.


William Overington

Thursday 31 January 2019



Re: Encoding italic

2019-01-25 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote:

Other schemes, like a VS per code point, also suffer from being 
different in philosophy from "standard" rich text approaches. Best 
would be as standard extension to all the messaging systems (e.g. a 
common markdown language, supported by UI). A./


Yet the approach claimed to be best would be stateful, and statefulness 
is the very thing that Unicode seeks to avoid.


Plain text is the basic system and a Variation Selector mechanism after 
each character that is to become italicized is not stateful and can be 
implemented using existing OpenType technology.


If an organization chooses to develop and use a rich text format then 
that is a matter for that organization and any changing of formatting of 
how italics are done when converting between plain text and rich text is 
the responsibility of the organization that introduces its rich text 
format.


Twitter was just an example that someone introduced along the way; it 
was not the original request.


Also this is not only about messaging. Of primary importance is the 
conservation of texts in plain text format, for example, where a printed 
book has one word italicized in a sentence and the text is being 
transcribed into a computer.


William Overington
Friday 25 January 2019



Re: Encoding italic (was: A last missing link)

2019-01-24 Thread wjgo_10...@btinternet.com via Unicode

Andrew West wrote as follows:

… (note that the colored characters do not change the color of the 
emoji they are attached to [before or after, depending upon whether 
you are speaking French or English dialect of emoji], they are just 
intended as a visual indication of what colour you wish the emoji 
was).


I thought that the idea was that they could possibly be used for glyph 
substitution with an appropriate font, so that there could be, for 
example, a glyph of a polar bear.


I produced a proposal for some characters specifically intended each as 
a colour modifier character.


http://www.unicode.org/L2/L2018/18198-colour-mod-chars.pdf

I know that the document was once on the agenda for a UTC meeting but 
was not mentioned in the minutes, so I do not know whether consideration 
of the best plain text way to express a request for a particular colour 
for an emoji is still taking place and my document is just one of 
several possibilities being considered.


William Overington
Thursday 24 January 2019


-- Original Message --
From: "Andrew West via Unicode" 
To: "Mark E. Shoulson" 
Cc: "Unicode Discussion" 
Sent: Thursday, 2019 Jan 24 At 11:50
Subject: Re: Encoding italic (was: A last missing link)

On Thu, 24 Jan 2019 at 02:10, Mark E. Shoulson via Unicode
 wrote:


Unicode isn't here to encode cool new ideas that would be cool and
new.  It's here for writing what people already do.


http://www.unicode.org/L2/L2018/18141r2-emoji-colors.pdf

"Add 14 colored emoji characters for decorative and/or descriptive
uses. These may be used to indicate that an emoji has a different
color."

No evidence has been provided that anybody is currently using colored
blobs for this purpose (in fact emoji users have explicitly rejected
this method for indicating emoji color:
http://www.unicode.org/L2/L2018/18208-white-wine-rgi.pdf), just an
assertion that it would be a good idea if emoji users could add a
colored swatch to an existing emoji to indicate what color they want
it to represent (note that the colored characters do not change the
color of the emoji they are attached to [before or after, depending
upon whether you are speaking French or English dialect of emoji],
they are just intended as a visual indication of what colour you wish
the emoji was).

This proposal to add 14 additional colored circles, squares and hearts
is a perfect example of a cool new idea for something that the authors
think would be really useful, but for which there is no evidence of
existing use. The UTC should have rejected it as out of scope, but we
all know that rules and procedures do not apply to the Emoji
Subcommittee, so in fact this cool new idea will be included in
Unicode 12 in March.

Andrew



Re: Encoding italic (was: A last missing link)

2019-01-24 Thread wjgo_10...@btinternet.com via Unicode

Mark E. Shoulson wrote:


 It doesn't just take someone saying "out of scope."


It depends who it is. The theory is that people post in the mailing list 
as individuals, yet some people have very great influence.



 It also has to *be* out of scope!


Maybe, it depends who says what.

If someone chants the incantation, but I can persuasively argue that 
no, it IS in scope, then the spell fails.


Well, that may work for you; it does not work for me. Decision is by an 
unnamed gatekeeper and the Unicode Technical Committee does not get to 
discuss it, and discussing whether it is in scope or not is not allowed 
on the mailing list, because discussion of the topic is permanently 
banned.


Requesting the scope of Unicode be widened is not like other 
discussions being had here, so it makes sense that it should be 
treated differently, if treated at all.


Well, it does not make sense to me. If benefit could be produced by 
widening the scope of Unicode in some way, then it seems that it should 
be allowed to be discussed in the mailing list. And even if rejected at 
some time then still be allowed to be discussed at some future time as 
things may have changed.


There were discussions and agreements made as to the scope of Unicode, 
long ago.


Yes. Yet surely decisions made long ago should not lock out all progress 
as new ideas come along.


And just like you can't petition to change a character name, no matter 
how wrong it is, asking the Unicode consortium to redefine itself on 
your say-so is not going to be taken seriously either.


Well, to me it is not like that. Yes, "a character name, no matter how 
wrong it is," is part of the stability guarantee and cannot be changed. 
Adding U+FFF7 as a base character for a tag digit sequence to uniquely 
and interoperably and stably define a code for a specific meaning for a 
localizable sentence would not, as far as I am aware, break any 
stability guarantees for Unicode. That might widen the scope of Unicode 
or it might be within the present scope, yet either way if it would be 
of benefit to end users then it would be reasonable to consider the idea 
and not block its discussion: and it is not a matter of my say-so at 
all, putting forward an idea for fair consideration is not at all the 
same as dictating that something should be done on someone's say-so. Was 
the scope of Unicode widened for emoji? First of all emoji were encoded 
for compatibility, but the Unicorn Face changed all that and now it is an 
annual "could be useful" exercise of generating new characters based on 
people's ideas. For the avoidance of doubt I am not against that at all, 
it is fun and hopefully will continue.


I appreciate that the particular tag sequences to follow U+FFF7 might 
not be encoded by Unicode Inc., they might be encoded by an ISO 
committee, such as ISO/TC 37. Yet encoding U+FFF7 as the base character 
would allow a link as interoperable plain text rather than needing to 
use what amounts to a markup system.
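
A minimal sketch of what such a sequence could look like in plain text follows. It assumes U+FFF7 as the base character (it is not an encoded character; it is only the code point suggested above) followed by the existing tag digits U+E0030 to U+E0039. The function names are hypothetical, and no terminating character is assumed because none has been specified.

```python
BASE = "\uFFF7"        # hypothetical base character, as suggested above
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at U+E0000 + ASCII value

def encode_sentence_code(code: str) -> str:
    """Return the base character followed by one tag digit per decimal digit."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must consist of ASCII digits only")
    return BASE + "".join(chr(TAG_OFFSET + ord(d)) for d in code)

def decode_sentence_code(seq: str) -> str:
    """Recover the decimal digits from a base-plus-tag-digit sequence."""
    if not seq.startswith(BASE):
        raise ValueError("sequence does not start with the base character")
    digits = []
    for tag in seq[1:]:
        value = ord(tag) - TAG_OFFSET
        if not 0x30 <= value <= 0x39:
            raise ValueError("not a tag digit")
        digits.append(chr(value))
    return "".join(digits)

encoded = encode_sentence_code("812")
print([f"U+{ord(c):04X}" for c in encoded])
print(decode_sentence_code(encoded))
```

An application that does not support the mechanism would simply see one unknown character followed by default-ignorable tag characters.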


Yet please remember that Unicode Inc. has defined and published base 
character plus tag sequences for some flags, including the Welsh 
flag and the Scottish flag. Recently I was informed that they are not 
part of The Unicode Standard nor part of ISO/IEC 10646.


It appears that a Unicode Technical Note is being prepared with 
recommendations of how to express teletext control characters using 
Unicode characters, possibly using Escape sequences.


So a Unicode Inc. publication listing numbers and meanings together with 
a context guide for each to help translation of meanings for a 
localization file of code numbers and sentences into a target language 
seems not unreasonable.


As an example, a vertical line is used as the field separator rather 
than a comma, because a comma might be used within the sentence itself.


812|Would you like to go to the day room?
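
A minimal sketch of reading one such line is below. The function name is hypothetical, and splitting only at the first vertical line is my assumption about how a localization file of this kind would be read; the second sentence is invented purely to show a comma surviving.

```python
def parse_entry(line: str) -> tuple[str, str]:
    """Split 'code|sentence' at the first vertical line only."""
    code, sentence = line.rstrip("\n").split("|", 1)
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"code is not numeric: {code!r}")
    return code, sentence

entry = parse_entry("812|Would you like to go to the day room?")
print(entry)

# An invented sentence containing a comma, which stays intact.
print(parse_entry("812|Yes, please."))
```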

Not all codes would be three digits; some would be longer. Codes where 
each of the first three digits differs from the other two are three 
digits long. Codes where the first and third digit are the same 
have a length of 3 plus the value of the third digit. So, for example, 
codes starting 313 are six digits long and are a set of localizable 
sentences intended primarily for seeking information through the 
language barrier about relatives and friends after a disaster. The third 
digit being zero allows for even longer code numbers.
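
The length rule can be sketched as follows. This is only my reading of the two rules stated above; the function name is hypothetical, and any pattern that the rules do not cover (including a third digit of zero matching the first digit, reserved above for longer codes) returns None rather than a guess.

```python
from typing import Optional

def code_length(prefix: str) -> Optional[int]:
    """Return the total code length implied by the first three digits.

    Returns None where the rules above do not define a length.
    """
    if len(prefix) < 3 or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("need at least three ASCII digits")
    a, b, c = prefix[0], prefix[1], prefix[2]
    if a != b and b != c and a != c:
        return 3                      # all three digits differ
    if a == c and c != "0":
        return 3 + int(c)             # e.g. 313 -> six digits
    return None                       # not specified in the text

for sample in ("812", "313", "010"):
    print(sample, code_length(sample))
```

A decoder reading a stream of digits could therefore tell after three digits how many more to expect.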


Discussing how to change the scope so that whatever-it-is IS in scope 
is a very large undertaking, …


Not necessarily. If the Unicode Technical Committee were to consider a 
proposal and, after consideration and discussion were to agree to 
proceed, it could all be done within a short discussion at a Unicode 
Technical Committee meeting and then the recommendation sent to the ISO 
committee.


I am not saying that it should be or that it will be, I am just trying 
to say that it is not necessarily a very large undertaking. The 

Re: Encoding italic (was: A last missing link)

2019-01-22 Thread wjgo_10...@btinternet.com via Unicode

Doug Ewell wrote:


And indeed, the forthcoming Unicode Technical Note we are going to be
writing to supplement the introduction of the characters in L2/19-025,
whether next year or later, will recommend ISO 6429 sequences like this
to implement features like background and foreground colors, inverse
video, and more, which are not available as plain-text characters.

Back in the late 1980s I had the opportunity for some time, from time to 
time, to use a colour terminal that was attached to a mainframe computer 
as if it were just another basic terminal attached to a mainframe. So it 
could be used just as a basic terminal attached to a mainframe, and it 
was often used in that manner.


Yet it also responded to Escape sequences which enabled it to do colour 
graphics, with, as best I remember now, commands to choose a colour and 
draw lines and so on.


I note with interest Doug's suggestion to use Escape sequences.

However, these days systems tend to be more complicated at the 
underlying platform level and there is often communication between 
systems and so on and I wonder whether using Escape codes as such might 
be prone to strange problems in some circumstances before getting to the 
emulator software. With various platforms in common use I am wondering 
whether there might be problems in some cases. Maybe there is no issue 
and everything would be fine, yet I opine that the possibility of 
problems needs to be looked at.


I wonder if a new character, say U+FFF6, in the Specials section, could 
be defined that could be regarded as just an ordinary printing character 
in many circumstances yet as having exactly the same meaning as the 
Escape character in some circumstances, such as in an emulator.


If that were done then the desired result could be achieved in a 
carefully structured manner rather than risk clashes over effectively 
sometimes trying to use the Escape character in two ways at the same 
time, perhaps with one of the ways being deep in the operating system 
and one in the terminal emulator with the way deep in the operating 
system usually winning.


William Overington

Tuesday 22 January 2019



Re: Encoding italic (was: A last missing link)

2019-01-19 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote:

 This is an effort that's out of scope for Unicode to implement, or, I 
should say, if the Consortium were to take it on, it would be a 
separate technical standard from The Unicode Standard.


I note what you say, but what concerns me is that there seem to be an 
increasing number of matters where things are being done and neither The 
Unicode Standard nor ISO/IEC 10646 include them but they are in 
side-documents just at the Unicode website.


My understanding is that in some countries they will only use ISO/IEC 
10646 and not relate (is that the word?) to Unicode.


There are already issues over emoji ZWJ sequences that produce new 
meanings such as man ZWJ rocket producing the new meaning of astronaut 
and the 'base character plus tag characters' sequences to indicate a 
Welsh flag and a Scottish flag and if something is now done for italics 
(depending upon what it is that is done) the divergence between the two 
'groups of documents' widens even if at a precise 'definition of scope' 
meaning ISO/IEC and The Unicode Standard do not diverge.


PS: I really hate the creeping expansion of pseudo-encoding via VS 
characters.


Well, a variation selector character is being used for requesting emoji 
display (is that a control code?), so it seems there is no lack of 
precedent to use one for italics. It seems that someone only has to say 
'out of scope' and then that is the veto for any consideration of a new 
idea for ISO/IEC 10646 or The Unicode Standard. There seems to be no way 
for a request to the committee to consider a widening of the scope to 
even be put before the committee if such a request is from someone 
outside the inner circle.



The only worse thing is adding novel control functions.


For example? Would you be including things like changing the colour of 
the jacket that an emojiperson is wearing?


It seems to me that it would be useful to have some codes that are 
ordinary characters in some contexts yet are control codes in others, 
for example for drawing simple line graphic diagrams within a document, 
such that they are just ordinary characters in a text document but, say, 
draw an image when included within a PDF (Portable Document Format) 
document. Their use would be optional so that people who did not want to 
use them could just ignore them and applications that did not use them 
as control codes could just display a glyph for each character. Yet 
there could be great possibilities for them if the chance to get them 
into ISO/IEC 10646 and The Unicode Standard were possible.


William Overington
Saturday 19 January 2019




Re: Encoding italic (was: A last missing link)

2019-01-18 Thread wjgo_10...@btinternet.com via Unicode

Mark E. Shoulson wrote:


…, since italic fonts generally are narrower than roman).


I remember reading years ago that that was why italic type was invented 
in the first place in the fifteenth century, so that more text could be 
got into small format books that could conveniently be carried around. 
That is, used for all of the text of a book. So not invented for 
expressing emphasis.


The only modern use of all italics text that I can remember seeing in 
printed books is when poems are typeset in italics.


William Overington
Friday 18 January 2019




Re: Encoding italic (was: A last missing link)

2019-01-15 Thread wjgo_10...@btinternet.com via Unicode

Hi

You are the gentleman who kindly made the Gentium typeface open source.

 Thank you for your generous gift to the world.

 > Use of variation selectors, a single character modifier, or 
combining characters also seem to be less useful options, as they act at 
the individual character level and are highly impractical. They also 
violate the key concept that italics are a way of marking a span of text 
as 'special' - not individual letters. Matched punctuation works the 
same way and is a good fit for italic.


Italics works differently from matched punctuation marks in that with 
italics there is a change to each glyph whereas with matched punctuation 
there is no change to the glyphs between the matched punctuation marks.


That difference leads to the significant difficulty that there are thus 
two competing forces here.


One of those forces is what you have stated about the nature of italics. 
The other of those forces is that Unicode is not stateful.


Years ago I encoded some Private Use Area codes for such features as 
italics, with a start character and an end character to surround a span 
of text that would then be rendered in italics.  As a result of 
discussion and advice I learned that such characters are not acceptable 
for encoding into regular Unicode because the effect would be stateful. 
So yes, the method that I suggested and for which James Kass suggested 
an enhancement is peculiar when viewed against the theory of the way 
that italics are used, but neither the method nor the enhanced method is 
stateful and that is an important feature of them.


Now it would be possible for a software application program to have a 
feature for composing plain text where a span of text may be highlighted 
by a user of the software application program and every character 
(except perhaps spaces?) within that span of text has, at the click of a 
button, a VS14 character inserted after it.


I remember that when handsetting metal type the same space sorts were 
used with italics as with roman.


There could also be a button that could remove all VS14 characters, if 
any, from within a highlighted span of text.
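
The two buttons could be sketched like this. The function names are hypothetical, the sketch works per code point exactly as described above, and it makes no attempt to treat combining marks specially.

```python
VS14 = "\uFE0D"  # VARIATION SELECTOR-14

def remove_vs14(span: str) -> str:
    """Second button: remove every VS14 from the highlighted span."""
    return span.replace(VS14, "")

def add_vs14(span: str) -> str:
    """First button: put VS14 after every character except white space.

    Existing selectors are removed first so that pressing the button
    twice does not double them.
    """
    clean = remove_vs14(span)
    return "".join(ch if ch.isspace() else ch + VS14 for ch in clean)

marked = add_vs14("one word")
print(len("one word"), len(marked))
print(remove_vs14(marked))
```

Because each selector applies only to the character before it, a program can process any part of the text without knowing what came earlier.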


So, for someone typesetting plain text and viewing plain text the effect 
could look to be in accordance with how you consider italics should be 
encoded, though for plain text interchange the encoding would still be 
by using a VS14 character after each character that one wishes to become 
displayed italicized.


William Overington
Tuesday 15 January 2019



Re: A last missing link for interoperable representation

2019-01-15 Thread wjgo_10...@btinternet.com via Unicode

Martin J. Dürst wrote:

So rich text technology is already way ahead when it comes to styled 
text. Do we want to encode background-color variant selectors in 
Unicode? If yes, how many?


Yes.

You would only need one.

Background colour was a feature of teletext in the United Kingdom from 
1976. It was very effective in its application.


In teletext, there were seven choices of foreground colour (red, green, 
yellow, blue, magenta, cyan, white); the default background was black.


The New Background control character caused the background colour to 
become the same as the current foreground colour in which text was being 
displayed. One could then change the foreground colour.


There was also a Black Background control code. This was necessary 
because neither text nor graphics could be black in teletext.


In teletext those control codes were stateful and applied until a change 
or to the end of the line of text, whichever came first.
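
The behaviour described above can be sketched as a small state machine. This is a simplified model and not a teletext decoder: the control codes are represented by made-up tuples rather than real code values, the cell that a control code occupies on screen is ignored, and the white foreground at the start of each line is my assumption.

```python
FOREGROUND_COLOURS = ("red", "green", "yellow", "blue", "magenta", "cyan", "white")

def render_line(tokens):
    """Return (character, foreground, background) for each character of one line.

    Tokens are either text strings or control tuples:
    ("fg", colour), ("new_bg",) or ("black_bg",).
    State is local to the call, so it ends at the end of the line.
    """
    fg, bg = "white", "black"          # assumed state at the start of a line
    cells = []
    for tok in tokens:
        if isinstance(tok, str):
            cells.extend((ch, fg, bg) for ch in tok)
        elif tok[0] == "fg":
            if tok[1] not in FOREGROUND_COLOURS:
                raise ValueError(f"not a teletext foreground colour: {tok[1]}")
            fg = tok[1]
        elif tok[0] == "new_bg":
            bg = fg                    # background takes the current foreground
        elif tok[0] == "black_bg":
            bg = "black"
        else:
            raise ValueError(f"unknown control: {tok!r}")
    return cells

# Yellow text on a blue background: choose blue, copy it to the
# background, then choose yellow for the text itself.
line = render_line([("fg", "blue"), ("new_bg",), ("fg", "yellow"), "Hi"])
print(line)
```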


So, given that Unicode is starting to encode colour choices for emoji 
and black is in the set of colours - and that might possibly extend to 
choosing colour for text - if Unicode were to encode CHANGE BACKGROUND 
COLOUR then the background colour could become the current foreground 
colour, even if that chosen foreground colour had just been selected and 
not actually used to colour text.


The implementation in Unicode need not be stateful.


[Hint: The last two questions are rhetorical.]


Maybe that was the intention, but the questions were asked and the 
concept is an interesting possibility for implementation.


William Overington

Tuesday 15 January 2019




Re: A last missing link for interoperable representation

2019-01-12 Thread wjgo_10...@btinternet.com via Unicode

James Kass wrote:

For the V.S. option there should be a provision for consistency and 
open-endedness to keep it simple.  Start with VS14 and work backwards 
for italic, …


I have now made, tested and published a font, VS14 Maquette, that uses 
VS14 to indicate italic.


https://forum.high-logic.com/viewtopic.php?f=10=7831=37561#p37561

William Overington
Saturday 12 January 2019



-- Original Message --
From: "James Kass via Unicode" 
To: unicode@unicode.org
Sent: Friday, 2019 Jan 11 At 01:48
Subject: Re: A last missing link for interoperable representation


Richard Wordingham responded,


... simply using an existing variation
selector character to do the job.


Actually, this might be a superior option.


For the V.S. option there should be a provision for consistency and 
open-endedness to keep it simple.  Start with VS14 and work backwards 
for italic, fraktur, antiqua...  (whatever the preferred order works out 
to be).  Or (better yet) start at VS17 and move forward (and change the 
rule that seventeen and up is only for CJK).


Is it true that many of the CJK variants now covered were previously 
considered by the Consortium to be merely stylistic variants?





Re: A last missing link for interoperable representation

2019-01-10 Thread wjgo_10...@btinternet.com via Unicode

Yesterday I wrote as follows.

I suggest that a solution to the problem would be to encode a 
COMBINING ITALICIZER character, such that it only applies to the 
character that it immediately follows. So, for example, to make the 
word apricot become displayed in italics one would use seven COMBINING 
ITALICIZER characters, one after each letter of the word apricot.


I have now made a test font. I used a Private Use Area code point and a 
visible glyph for this test. It works well.


https://forum.high-logic.com/viewtopic.php?f=10=7831

Would it be a good idea to encode such a character into Unicode?

William Overington
Thursday 10 January 2019



Re: A last missing link for interoperable representation

2019-01-09 Thread wjgo_10...@btinternet.com via Unicode
I suggest that a solution to the problem would be to encode a COMBINING 
ITALICIZER character, such that it only applies to the character that it 
immediately follows. So, for example, to make the word apricot become 
displayed in italics one would use seven COMBINING ITALICIZER 
characters, one after each letter of the word apricot. The display could 
be sorted out using an OpenType font by treating each pair of a letter 
and a COMBINING ITALICIZER as a ligature. If, say, the glyph name of 
COMBINING ITALICIZER were italic then the glyph for c italic could be 
c_italic and so plain text might well be copyable from a PDF (Portable 
Document Format) document and pasted to WordPad as plain text retaining 
the COMBINING ITALICIZER character, depending upon which application 
program is used to produce the PDF document and which PDF reader is in 
use.
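
A minimal sketch of the ligature rules follows, generated as text in OpenType feature-file syntax. The glyph names italic and c_italic are the ones suggested above; the function name is hypothetical, and the assumption is that each lowercase letter's glyph is named after the letter itself.

```python
def liga_rules(letters: str, selector: str = "italic") -> str:
    """Build a liga feature mapping '<letter> <selector>' to '<letter>_<selector>'."""
    lines = ["feature liga {"]
    for glyph in letters:
        lines.append(f"    sub {glyph} {selector} by {glyph}_{selector};")
    lines.append("} liga;")
    return "\n".join(lines)

# One rule per distinct letter of the word apricot.
print(liga_rules("apricot"))
```

A real font would also need the italic glyphs themselves; the rules only tell the renderer when to use them.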


This would seem a workable solution. Many years ago I suggested having 
characters that would have been used in plain text in a way comparable to 
how italics are switched on and off in HTML (Hypertext Markup Language) 
yet was advised that such an encoding would make plain text stateful and 
thus would not be agreed for encoding. That objection might well still 
be the case today. So using a COMBINING ITALICIZER character would avoid 
that objection and would also provide a solution that could be 
straightforwardly implemented using existing OpenType technology.


William Overington
Wednesday 9 January 2019