Re: Encoding of old compatibility characters

2017-04-05 Thread Asmus Freytag



I have got MS Word 2002 and MS Excel 2000.
Maybe later versions bring an amended version of Arial Unicode MS.


Maybe.

A./

Re: Encoding of old compatibility characters

2017-04-05 Thread Otto Stolz

Hello,

Am 31.03.2017 um 09:57 schrieb Eli Zaretskii:

Arial Unicode MS supports that character [U+23E8], FWIW.


From: Otto Stolz
Date: Tue, 4 Apr 2017 15:21:02 +0200

Not on my good ole Windows XP SP3 system.


On 4/4/2017 7:58 AM, Eli Zaretskii wrote:

This here is also XP SP3.  Maybe some package I have installed updated
the font?


Am 04.04.2017 um 18:51 schrieb Asmus Freytag:

AFAIK, this font is / was installed by MS Office.


I have got MS Word 2002 and MS Excel 2000.
Maybe later versions bring an amended version of Arial Unicode MS.

Cheers,
   otto





Re: Encoding of old compatibility characters

2017-04-04 Thread Asmus Freytag

On 4/4/2017 7:58 AM, Eli Zaretskii wrote:

From: Otto Stolz
Date: Tue, 4 Apr 2017 15:21:02 +0200

Am 31.03.2017 um 09:57 schrieb Eli Zaretskii:

Arial Unicode MS supports that character [U+23E8], FWIW.

Not on my good ole Windows XP SP3 system.

This here is also XP SP3.  Maybe some package I have installed updated
the font?

AFAIK, this font is / was installed by MS Office.
A./



Re: Encoding of old compatibility characters

2017-04-04 Thread Eli Zaretskii
> From: Otto Stolz 
> Date: Tue, 4 Apr 2017 15:21:02 +0200
> 
> Am 31.03.2017 um 09:57 schrieb Eli Zaretskii:
> > Arial Unicode MS supports that character [U+23E8], FWIW.
> 
> Not on my good ole Windows XP SP3 system.

This here is also XP SP3.  Maybe some package I have installed updated
the font?
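
One way to check what a given copy of the font actually covers is to look
at its cmap table; here is a minimal sketch using the fontTools library
(assuming fontTools is available and that Arial Unicode MS sits under its
usual file name):

from fontTools.ttLib import TTFont

def covers(font_path, cp=0x23E8):
    # True if the font's preferred cmap maps the code point to a glyph.
    return cp in TTFont(font_path)['cmap'].getBestCmap()

# Hypothetical path; adjust to wherever the font actually lives.
print(covers(r'C:\Windows\Fonts\ARIALUNI.TTF'))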


Re: Encoding of old compatibility characters

2017-04-04 Thread Otto Stolz

Am 31.03.2017 um 09:57 schrieb Eli Zaretskii:

Arial Unicode MS supports that character [U+23E8], FWIW.


Not on my good ole Windows XP SP3 system.

Best wishes,
   Otto


Re: Encoding of old compatibility characters

2017-03-30 Thread Philippe Verdy
Probably you've installed the Noto collection on your Windows XP, or
installed some software that added fonts to the system (possibly with
updates to the Uniscribe library, such as an old version of Office).
Anyway, I would no longer trust XP to render many scripts correctly, even
with Uniscribe (which is not needed anyway for this simple character
mapped in the BMP). Minimal support for XP now comes essentially from
third-party software providers. Most have given up, except Mozilla and
some security suites that try to fill the gaps abandoned by Microsoft
(which still maintains XP to some degree because various banks still use
it, for example in their ATMs: you can tell when you frequently see an ATM
rebooting, or unusable because it has crashed with a "BSOD" displayed).


2017-03-30 22:17 GMT+02:00 António Martins-Tuválkin :

> On 2017.03.29 05:41, Leo Broukhis asked:
>
> Are you still using Windows 7 or RedHat 5, or something equally old?
>> Newer systems have ⏨ out of the box.
>>
>
> I’m using Windows XP and "⏨" renders perfectly as "₁₀". Maybe fonts can
> be installed without “upgrading” the whole operating system? Who knew?!
>
> --  .
> António MARTINS-Tuválkin   |  ()|
>    ||
> PT-1500-239 LisboaNão me invejo de quem tem |
> PT-2695-010 Bobadela LRS  carros, parelhas e montes |
> +351 934 821 700, +351 212 463 477só me invejo de quem bebe |
> facebook.com/profile.php?id=744658416 a água em todas as fontes |
> -
> De sable uma fonte e bordadura escaqueada de jalde e goles por timbre
> bandeira por mote o 1º verso acima e por grito de guerra "Mi rajtas!"
> -
>
>


Re: Encoding of old compatibility characters

2017-03-30 Thread António Martins-Tuválkin

On 2017.03.29 05:41, Leo Broukhis asked:


Are you still using Windows 7 or RedHat 5, or something equally old?
Newer systems have ⏨ out of the box.


I’m using Windows XP and "⏨" renders perfectly as "₁₀". Maybe fonts can
be installed without “upgrading” the whole operating system? Who knew?!

--  .
António MARTINS-Tuválkin   |  ()|
   ||
PT-1500-239 LisboaNão me invejo de quem tem |
PT-2695-010 Bobadela LRS  carros, parelhas e montes |
+351 934 821 700, +351 212 463 477só me invejo de quem bebe |
facebook.com/profile.php?id=744658416 a água em todas as fontes |
-
De sable uma fonte e bordadura escaqueada de jalde e goles por timbre
bandeira por mote o 1º verso acima e por grito de guerra "Mi rajtas!"
-



Re: Encoding of old compatibility characters

2017-03-28 Thread Leo Broukhis
On Tue, Mar 28, 2017 at 6:09 AM, Asmus Freytag  wrote:

> On 3/28/2017 4:00 AM, Ian Clifton wrote:
>
> I’ve used ⏨ a couple of times, without explanation, in my own
> emails—without, as far as I’m aware, causing any misunderstanding.
>
> Works especially well, whenever it renders as a box with 23E8 inscribed!
>
Are you still using Windows 7 or RedHat 5, or something equally old?
Newer systems have ⏨ out of the box.

Leo


Re: Encoding of old compatibility characters

2017-03-28 Thread Mark E. Shoulson

On 03/28/2017 09:09 AM, Asmus Freytag wrote:

On 3/28/2017 4:00 AM, Ian Clifton wrote:

I’ve used ⏨ a couple of times, without explanation, in my own
emails—without, as far as I’m aware, causing any misunderstanding.


Works especially well, whenever it renders as a box with 23E8 inscribed!

A./


I ⬚ Unicode.

~mark



Re: Encoding of old compatibility characters

2017-03-28 Thread Mark E. Shoulson
I don't think I want my text renderer to be *that* smart.  If I want ⏨, 
I'll put ⏨.  If I want a multiplication sign or something, I'll put 
that.  Without the multiplication sign, it's still quite understandable, 
more so than just "e".


It is valid for a text rendering engine to render "g" with one loop or 
two.  I don't think it's valid for it to render "g" as "xg" or "-g" or 
anything else.  The ⏨ character looks like it does.  You don't get to 
add multiplication signs to it because you THINK you know what I'm 
saying with it.  And using 20⏨ to mean "twenty base ten" sounds 
perfectly reasonable to me also.


~mark

On 03/28/2017 05:33 AM, Philippe Verdy wrote:
Ideally a smart text renderer could as well display that glyph with a 
leading multiplication sign (a mathematical middle dot) and implicitly 
convert the following digits (and sign) as real superscript/exponent 
(using contextual substitution/positioning like for Eastern 
Arabic/Urdu), without necessarily writing the 10 base with smaller 
digits.
Without it, people will want to use 20⏨ to mean it is the decimal 
number twenty and not hexadecimal number thirty two.


2017-03-28 11:18 GMT+02:00 Frédéric Grosshans:


Le 28/03/2017 à 02:22, Mark E. Shoulson a écrit :

Aw, but ⏨ is awesome!  It's much cooler-looking and more
visually understandable than "e" for exponent notation. In
some code I've been playing around with I support it as a
valid alternative to "e".


I Agree 1⏨3 times with you on this !

Frédéric

Re: Encoding of old compatibility characters

2017-03-28 Thread Asmus Freytag

On 3/28/2017 4:00 AM, Ian Clifton wrote:

I’ve used ⏨ a couple of times, without explanation, in my own
emails—without, as far as I’m aware, causing any misunderstanding.

Works especially well, whenever it renders as a box with 23E8 inscribed!

A./



Re: Encoding of old compatibility characters

2017-03-28 Thread Ian Clifton
Philippe Verdy  writes:

> Ideally a smart text renderer could as well display that glyph with a
> leading multiplication sign (a mathematical middle dot) and implicitly
> convert the following digits (and sign) as real superscript/exponent
> (using contextual substitution/positioning like for Eastern
> Arabic/Urdu), without necessarily writing the 10 base with smaller
> digits.

Actually, I would see this as putting unnecessary clutter back in! I
would say the advantage of the ⏨ notation, introduced with Algol 60, is
that it subsumes and makes implicit the multiplication and
exponentiation operators, resulting in a visually compact denotation of
a real number in “scientific notation”, and it does so with a single
symbol that hints at its own meaning.

I’ve used ⏨ a couple of times, without explanation, in my own
emails—without, as far as I’m aware, causing any misunderstanding.

> Without it, people will want to use 20⏨ to mean it is the decimal
> number twenty and not hexadecimal number thirty two.

Yes, this ambiguity is a drawback. Hopefully, the use cases should be
sufficiently different that real confusion would be unlikely (and of
course, normally, U+23E8 should never be used to denote decimal number
base).
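
As a rough illustration of that compactness, here is one way one might
format a real number with ⏨ in place of “e” (a sketch of my own, assuming
nothing beyond U+23E8 itself):

def format_real(x):
    # Write x in scientific notation with U+23E8 instead of "e",
    # e.g. 6.022e+23 -> "6.022⏨23".  Rounding and field widths are
    # arbitrary choices in this sketch.
    mantissa, exp = f'{x:e}'.split('e')
    return f'{float(mantissa):g}\u23e8{int(exp)}'

print(format_real(6.022e23))   # 6.022⏨23
print(format_real(0.001))      # 1⏨-3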

-- 
Ian Clifton ⚗ ℡: +44 1865 275677
Chemistry Research Laboratory ℻: +44 1865 285002
Oxford University : ian.clif...@chem.ox.ac.uk
Mansfield Road   Oxford OX1 3TA   UK




Re: Encoding of old compatibility characters

2017-03-28 Thread Philippe Verdy
Ideally a smart text renderer could as well display that glyph with a
leading multiplication sign (a mathematical middle dot) and implicitly
convert the following digits (and sign) as real superscript/exponent (using
contextual substitution/positioning like for Eastern Arabic/Urdu), without
necessarily writing the 10 base with smaller digits.
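
A crude text-level simulation of that idea (an illustration only: a real
implementation would live in the font or shaping engine as contextual
substitution, not rewrite the underlying text):

import re

SUPERSCRIPTS = str.maketrans('0123456789+-', '⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻')

def simulate(s):
    # "6.022⏨23" -> "6.022·10²³": leading multiplication dot, full-size 10,
    # and the exponent digits turned into superscripts.
    return re.sub('\u23e8([+-]?[0-9]+)',
                  lambda m: '\u00b710' + m.group(1).translate(SUPERSCRIPTS),
                  s)

print(simulate('6.022\u23e823'))   # 6.022·10²³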
Without it, people will want to use 20⏨ to mean it is the decimal number
twenty and not hexadecimal number thirty two.

2017-03-28 11:18 GMT+02:00 Frédéric Grosshans:

> Le 28/03/2017 à 02:22, Mark E. Shoulson a écrit :
>
>> Aw, but ⏨ is awesome!  It's much cooler-looking and more visually
>> understandable than "e" for exponent notation.  In some code I've been
>> playing around with I support it as a valid alternative to "e".
>>
>
> I Agree 1⏨3 times with you on this !
>
> Frédéric
>
>


Re: Encoding of old compatibility characters

2017-03-28 Thread Frédéric Grosshans

Le 28/03/2017 à 02:22, Mark E. Shoulson a écrit :
Aw, but ⏨ is awesome!  It's much cooler-looking and more visually 
understandable than "e" for exponent notation.  In some code I've been 
playing around with I support it as a valid alternative to "e". 


I Agree 1⏨3 times with you on this !

Frédéric



Re: Encoding of old compatibility characters

2017-03-27 Thread Mark E. Shoulson

On 03/27/2017 05:46 PM, Frédéric Grosshans wrote:
An example of a legacy character successfully  encoded recently is ⏨ 
U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
It came from the Soviet standard GOST 10859-64 and the German standard 
ALCOR. And was proposed by Leo Broukhis in this proposal 
http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . It follows a 
discussion on this mailing list here 
http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html, where 
Ken Whistler was already sceptical about the usefulness of this encoding. 
Aw, but ⏨ is awesome!  It's much cooler-looking and more visually 
understandable than "e" for exponent notation.  In some code I've been 
playing around with I support it as a valid alternative to "e".
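
A minimal sketch of that idea, accepting ⏨ alongside "e" when parsing
numbers (an illustration only, not the actual code referred to above):

def parse_real(s):
    # Treat U+23E8 DECIMAL EXPONENT SYMBOL as equivalent to "e",
    # so "6.022⏨23" parses the same as "6.022e23".
    return float(s.replace('\u23e8', 'e'))

print(parse_real('6.022\u23e823'))   # 6.022e+23
print(parse_real('1\u23e83'))        # 1000.0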


~mark


RE: Encoding of old compatibility characters

2017-03-27 Thread Jonathan Rosenne
GROUP MARK

Best Regards,

Jonathan Rosenne
-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Frédéric 
Grosshans
Sent: Tuesday, March 28, 2017 1:05 AM
To: unicode
Subject: Re: Encoding of old compatibility characters

Another example, about to be encoded, is the GROUP MARK, used on old IBM 
computers (proposal: ML threads: 
http://www.unicode.org/mail-arch/unicode-ml/y2015-m01/0040.html , and 
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0367.html )

Le 27/03/2017 à 23:46, Frédéric Grosshans a écrit :
> An example of a legacy character successfully  encoded recently is ⏨
> U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
> It came from the Soviet standard GOST 10859-64 and the German standard 
> ALCOR. And was proposed by Leo Broukhis in this proposal 
> http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . It follows a 
> discussion on this mailing list here 
> http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html, where 
> Ken Whistler was already sceptical about the usefulness of this encoding.
>
>
> Le 27/03/2017 à 16:44, Charlotte Buff a écrit :
>> I’ve recently developed an interest in old legacy text encodings and 
>> noticed that there are various characters in several sets that don’t 
>> have a Unicode equivalent. I had already started research into these 
>> encodings to eventually prepare a proposal until I realised I should 
>> probably ask on the mailing list first whether it is likely the UTC 
>> will be interested in those characters before I waste my time on a 
>> project that won’t achieve anything in the end.
>>
>> The character sets in question are ATASCII, PETSCII, the ZX80 set, 
>> the Atari ST set, and the TI calculator sets. So far I’ve only 
>> analyzed the ZX80 set in great detail, revealing 32 characters not in 
>> the UCS. Most characters are pseudo-graphics, simple pictographs or 
>> inverted variants of other characters.
>>
>> Now, one of Unicode’s declared goals is to enable round-trip 
>> compatibility with legacy encodings. We’ve accumulated a lot of weird 
>> stuff over the years in the pursuit of this goal. So it would be 
>> natural to assume that the unencoded characters from the mentioned 
>> sets would also be eligible for inclusion in the UCS. On the other 
>> hand, those encodings are for the most part older than Unicode and so 
>> far there seems to have been little interest in them from the UTC or 
>> WG2, or any of their contributors. Something tells me that if these 
>> character sets were important enough to consider for inclusion, they 
>> would have been encoded a long time ago along with all the other 
>> stuff in Block Elements, Box Drawings, Miscellaneous Symbols etc.
>>
>> Obviously the character sets in question don’t receive much use 
>> nowadays (and some weren’t even that relevant in their time, either), 
>> which leads me to wonder whether putting further work into this 
>> proposal would be worth it.
>
>




Re: Encoding of old compatibility characters

2017-03-27 Thread Frédéric Grosshans
Another example, about to be encoded, is the GROUP MARK, used on old IBM 
computers (proposal: ML threads: 
http://www.unicode.org/mail-arch/unicode-ml/y2015-m01/0040.html , and 
http://unicode.org/mail-arch/unicode-ml/y2007-m05/0367.html )


Le 27/03/2017 à 23:46, Frédéric Grosshans a écrit :
An example of a legacy character successfully  encoded recently is ⏨ 
U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2.
It came from the Soviet standard GOST 10859-64 and the German standard 
ALCOR. And was proposed by Leo Broukhis in this proposal 
http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . It follows a 
discussion on this mailing list here 
http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html, where 
Ken Whistler was already sceptical about the usefulness of this encoding.



Le 27/03/2017 à 16:44, Charlotte Buff a écrit :
I’ve recently developed an interest in old legacy text encodings and 
noticed that there are various characters in several sets that don’t 
have a Unicode equivalent. I had already started research into these 
encodings to eventually prepare a proposal until I realised I should 
probably ask on the mailing list first whether it is likely the UTC 
will be interested in those characters before I waste my time on a 
project that won’t achieve anything in the end.


The character sets in question are ATASCII, PETSCII, the ZX80 set, 
the Atari ST set, and the TI calculator sets. So far I’ve only 
analyzed the ZX80 set in great detail, revealing 32 characters not in 
the UCS. Most characters are pseudo-graphics, simple pictographs or 
inverted variants of other characters.


Now, one of Unicode’s declared goals is to enable round-trip 
compatibility with legacy encodings. We’ve accumulated a lot of weird 
stuff over the years in the pursuit of this goal. So it would be 
natural to assume that the unencoded characters from the mentioned 
sets would also be eligible for inclusion in the UCS. On the other 
hand, those encodings are for the most part older than Unicode and so 
far there seems to have been little interest in them from the UTC or 
WG2, or any of their contributors. Something tells me that if these 
character sets were important enough to consider for inclusion, they 
would have been encoded a long time ago along with all the other 
stuff in Block Elements, Box Drawings, Miscellaneous Symbols etc.


Obviously the character sets in question don’t receive much use 
nowadays (and some weren’t even that relevant in their time, either), 
which leads me to wonder whether putting further work into this 
proposal would be worth it.


Re: Encoding of old compatibility characters

2017-03-27 Thread Frédéric Grosshans
An example of a legacy character successfully encoded recently is ⏨ 
U+23E8 DECIMAL EXPONENT SYMBOL, encoded in Unicode 5.2. It came from the 
Soviet standard GOST 10859-64 and the German standard ALCOR, and was 
proposed by Leo Broukhis in this proposal: 
http://www.unicode.org/L2/L2008/08030r-subscript10.pdf . It followed a 
discussion on this mailing list here: 
http://www.unicode.org/mail-arch/unicode-ml/y2008-m01/0123.html, where 
Ken Whistler was already sceptical about the usefulness of this encoding.



Le 27/03/2017 à 16:44, Charlotte Buff a écrit :
I’ve recently developed an interest in old legacy text encodings and 
noticed that there are various characters in several sets that don’t 
have a Unicode equivalent. I had already started research into these 
encodings to eventually prepare a proposal until I realised I should 
probably ask on the mailing list first whether it is likely the UTC 
will be interested in those characters before I waste my time on a 
project that won’t achieve anything in the end.


The character sets in question are ATASCII, PETSCII, the ZX80 set, the 
Atari ST set, and the TI calculator sets. So far I’ve only analyzed 
the ZX80 set in great detail, revealing 32 characters not in the UCS. 
Most characters are pseudo-graphics, simple pictographs or inverted 
variants of other characters.


Now, one of Unicode’s declared goals is to enable round-trip 
compatibility with legacy encodings. We’ve accumulated a lot of weird 
stuff over the years in the pursuit of this goal. So it would be 
natural to assume that the unencoded characters from the mentioned 
sets would also be eligible for inclusion in the UCS. On the other 
hand, those encodings are for the most part older than Unicode and so 
far there seems to have been little interest in them from the UTC or 
WG2, or any of their contributors. Something tells me that if these 
character sets were important enough to consider for inclusion, they 
would have been encoded a long time ago along with all the other stuff 
in Block Elements, Box Drawings, Miscellaneous Symbols etc.


Obviously the character sets in question don’t receive much use 
nowadays (and some weren’t even that relevant in their time, either), 
which leads me to wonder whether putting further work into this 
proposal would be worth it.





Re: Encoding of old compatibility characters

2017-03-27 Thread Philippe Verdy
TI calculators are not antique tools, and when I see how most calculators
for Android or Windows 10 are now, they are not as usable as the scientific
calculators we had in the past.

I know at least one excellent calculator that works on Android and
Windows and finally has the real look and feel of a true calculator, and
that displays correct labels and excellent formulas (with the conventional
2D layout): my favorite is now "HyperCalc" (it has a free version and a
paid version). The Android version is a bit more advanced. The paid version
has only a few additional features that are not really needed (such as
themes). The interface is clear, and there are several input modes for
expressions. By contrast, the default Calculator of Windows 10 is worse
than it has ever been (it was much better in Windows 7 and before, even if
it had many limitations).

Also entering expressions in Excel is really antique, and many functions
have stupid limitations (in addition, spreadsheets are not even portable
across versions of Office or don't render the same, and sometimes
unexpectedly produce different results).

But this is not at all a problem of character encoding: we don't need
Unicode at all to create a convenient UI in such applications. Even with a
web-based interface, you can do a lot with HTML canvas and SVG and have a
scalable UI without having to use dirty text tricks or PUA fonts.


2017-03-27 19:18 GMT+02:00 Ken Whistler :

>
> On 3/27/2017 7:44 AM, Charlotte Buff wrote:
>
>> Now, one of Unicode’s declared goals is to enable round-trip
>> compatibility with legacy encodings. We’ve accumulated a lot of weird stuff
>> over the years in the pursuit of this goal. So it would be natural to
>> assume that the unencoded characters from the mentioned sets [ATASCII,
>> PETSCII, the ZX80 set, the Atari ST set, and the TI calculator sets] would
>> also be eligible for inclusion in the UCS.
>>
>
> Actually, it wouldn't be.
>
> The original goal was to ensure round-trip compatibility with *important*
> legacy character encodings, *for which there was a need to convert legacy
> data, and/or an ongoing need to representation of text for interchange*.
>
> From Unicode 1.0: "The Unicode standard includes the character content of
> all major International Standards approved and published before December
> 31, 1990... [long list ensues] ... and from various industry standards in
> common use (such as code pages and character sets from Adobe, Apple, IBM,
> Lotus, Microsoft, WordPerfect, Xerox and others)."
>
> Even as long ago as 1990, artifacts such as the Atari ST set were
> considered obsolete antiquities, and did not rise to the level of the kind
> of character listings that we considered when pulling together the original
> repertoire.
>
> And there are several observations to be made about the "weird stuff" we
> have accumulated over the years in the pursuit of compatibility. A lot of
> stuff that was made up out of whole cloth, rather than being justified by
> existing, implemented character sets used in information interchange at the
> time, came from the 1991/1992 merger process between the Unicode Standard
> and the ISO/IEC 10646 drafts. That's how Unicode acquired blocks full of
> Arabic ligatures, for example.
>
> Other, subsequent additions of small (or even largish) sets of oddball
> "characters" that don't fit the prototypical sets of characters for scripts
> and/or well-behaved punctuation and symbols, typically have come in with
> argued cases for the continued need in current text interchange, for
> complete coverage. For example, that is how we ended up filling out Zapf
> dingbats with some glyph pieces that had been omitted in the initial
> repertoire for that block. More recently, of course, the continued
> importance of Wingdings and Webdings font encodings on the Windows platform
> led the UTC to filling out the set of graphical dingbats to cover those
> sets. And of course, we first started down the emoji track because of the
> need to interchange text originating from widely deployed Japanese carrier
> sets implemented as extensions to Shift-JIS.
>
> I don't think the early calculator character sets, or sets for the Atari
> ST and similar early consumer computer electronics fit the bill, precisely
> because there isn't a real text data interchange case to be made for
> character encoding. Many of the elements you have mentioned, for example,
> like the inverse/negative squared versions of letters and symbols, are
> simply idiosyncratic aspects of the UI for the devices, in an era when font
> generators were hard coded and very primitive indeed.
>
> Documenting these early uses, and pointing out parts of the UI and
> character usage that aren't part of the character repertoire in the Unicode
> Standard seems an interesting pursuit to me. But absent a true textual data
> interchange issue for these long-gone, obsolete devices, I don't really see
> a case to be made for spending time in the UTC defining a bunch of
> compatibility characters to encode for them.

Re: Encoding of old compatibility characters

2017-03-27 Thread Michael Everson
On 27 Mar 2017, at 17:49, Markus Scherer  wrote:
> 
> I think the interest has been low because very few documents survive in these 
> encodings, and even fewer documents using not-already-encoded symbols.

That doesn’t mean that the few people who may need the characters now or in the 
centuries to come shouldn’t have them. If we’ve encoded some characters like 
these for compatibility, it’s only fair to be thorough. 

> In my opinion, this is a good use of the Private Use Area among a very small 
> group of people.

I’d say not, since they’d be using some encoded characters and having to 
augment it with some PUA characters.

> See also https://en.wikipedia.org/wiki/ConScript_Unicode_Registry

That’s not for this sort of thing at all at all. The UCS is for this sort of 
thing.

Michael Everson

> ​PS: I had a ZX 81, then a Commodore 64, then an Atari ST, and at school used 
> a Commodore PET...

Lucky man. :-)


Re: Encoding of old compatibility characters

2017-03-27 Thread Ken Whistler


On 3/27/2017 7:44 AM, Charlotte Buff wrote:
Now, one of Unicode’s declared goals is to enable round-trip 
compatibility with legacy encodings. We’ve accumulated a lot of weird 
stuff over the years in the pursuit of this goal. So it would be 
natural to assume that the unencoded characters from the mentioned 
sets [ATASCII, PETSCII, the ZX80 set, the Atari ST set, and the TI 
calculator sets] would also be eligible for inclusion in the UCS.


Actually, it wouldn't be.

The original goal was to ensure round-trip compatibility with 
*important* legacy character encodings, *for which there was a need to 
convert legacy data, and/or an ongoing need to represent text 
for interchange*.


From Unicode 1.0: "The Unicode standard includes the character content 
of all major International Standards approved and published before 
December 31, 1990... [long list ensues] ... and from various industry 
standards in common use (such as code pages and character sets from 
Adobe, Apple, IBM, Lotus, Microsoft, WordPerfect, Xerox and others)."


Even as long ago as 1990, artifacts such as the Atari ST set were 
considered obsolete antiquities, and did not rise to the level of the 
kind of character listings that we considered when pulling together the 
original repertoire.


And there are several observations to be made about the "weird stuff" we 
have accumulated over the years in the pursuit of compatibility. A lot 
of stuff that was made up out of whole cloth, rather than being 
justified by existing, implemented character sets used in information 
interchange at the time, came from the 1991/1992 merger process between 
the Unicode Standard and the ISO/IEC 10646 drafts. That's how Unicode 
acquired blocks full of Arabic ligatures, for example.


Other, subsequent additions of small (or even largish) sets of oddball 
"characters" that don't fit the prototypical sets of characters for 
scripts and/or well-behaved punctuation and symbols, typically have come 
in with argued cases for the continued need in current text interchange, 
for complete coverage. For example, that is how we ended up filling out 
Zapf dingbats with some glyph pieces that had been omitted in the 
initial repertoire for that block. More recently, of course, the 
continued importance of Wingdings and Webdings font encodings on the 
Windows platform led the UTC to filling out the set of graphical 
dingbats to cover those sets. And of course, we first started down the 
emoji track because of the need to interchange text originating from 
widely deployed Japanese carrier sets implemented as extensions to 
Shift-JIS.


I don't think the early calculator character sets, or sets for the Atari 
ST and similar early consumer computer electronics fit the bill, 
precisely because there isn't a real text data interchange case to be 
made for character encoding. Many of the elements you have mentioned, 
for example, like the inverse/negative squared versions of letters and 
symbols, are simply idiosyncratic aspects of the UI for the devices, in 
an era when font generators were hard coded and very primitive indeed.


Documenting these early uses, and pointing out parts of the UI and 
character usage that aren't part of the character repertoire in the 
Unicode Standard seems an interesting pursuit to me. But absent a true 
textual data interchange issue for these long-gone, obsolete devices, I 
don't really see a case to be made for spending time in the UTC defining 
a bunch of compatibility characters to encode for them.


--Ken



Re: Encoding of old compatibility characters

2017-03-27 Thread Michael Everson
On 27 Mar 2017, at 18:08, Garth Wallace  wrote:
> 
> Apple IIs also had inverse-video letters, and some had "MouseText" 
> pseudographics used to simulate a Mac-like GUI in text mode.
> 
> I know that a couple of fonts from Kreative put these in the PUA and 
> Nishiki-Teki follows their lead.

I think it’s better to be inclusive rather than exclusive. PUA isn’t stable, 
and marginal as this stuff may be, we have encoded stuff that is far more 
marginal… there is nothing more frustrating than expecting something and 
finding it missing. 

Michael Everson


Re: Encoding of old compatibility characters

2017-03-27 Thread Garth Wallace
Apple IIs also had inverse-video letters, and some had "MouseText"
pseudographics used to simulate a Mac-like GUI in text mode.

I know that a couple of fonts from Kreative put these in the PUA and
Nishiki-Teki follows their lead.

On Mon, Mar 27, 2017 at 9:25 AM Charlotte Buff <
irgendeinbenutzern...@gmail.com> wrote:

> > It’s hard to say without knowing what the characters are.
>
> For the ZX80, the missing characters include five block elements (top and
> bottom halves of MEDIUM SHADE, as well as their inverse counterparts), and
> inverse/negative squared variants of European digits and the following
> symbols: " £ $ : ? ( ) - + * / = < > ; , .
> Negative squared digits may be unifiable with negative circled digits.
>
> ATASCII includes inverse variants of box drawing characters. I have to
> check whether some other pictographs are unifiable with existing characters.
>
> PETSCII includes some box drawings and vertical scan lines that are
> probably not unifiable.
>
> Atari ST includes two simple pictographs that were used as graphical UI
> elements. They look like a negative, low diagonal stroke and a negative
> diamond respectively. It also has six characters that together form logos
> which I wasn’t going to propose.
>
> TI calculators include a single character for a superscript minus 1. I
> don’t have a lot of information available about this set at the moment.
>


Re: Encoding of old compatibility characters

2017-03-27 Thread Markus Scherer
I think the interest has been low because very few documents survive in
these encodings, and even fewer documents using not-already-encoded symbols.

In my opinion, this is a good use of the Private Use Area among a very
small group of people.
See also https://en.wikipedia.org/wiki/ConScript_Unicode_Registry

Best regards,
markus
PS: I had a ZX 81, then a Commodore 64, then an Atari ST, and at school
used a Commodore PET...


Re: Encoding of old compatibility characters

2017-03-27 Thread Charlotte Buff
> It’s hard to say without knowing what the characters are.

For the ZX80, the missing characters include five block elements (top and
bottom halves of MEDIUM SHADE, as well as their inverse counterparts), and
inverse/negative squared variants of European digits and the following
symbols: " £ $ : ? ( ) - + * / = < > ; , .
Negative squared digits may be unifiable with negative circled digits.

ATASCII includes inverse variants of box drawing characters. I have to
check whether some other pictographs are unifiable with existing characters.

PETSCII includes some box drawings and vertical scan lines that are
probably not unifiable.

Atari ST includes two simple pictographs that were used as graphical UI
elements. They look like a negative, low diagonal stroke and a negative
diamond respectively. It also has six characters that together form logos
which I wasn’t going to propose.

TI calculators include a single character for a superscript minus 1. I
don’t have a lot of information available about this set at the moment.
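
To make the round-trip question concrete, here is a sketch of what a
ZX80-to-Unicode mapping has to look like today; all code values and PUA
assignments below are illustrative placeholders, not actual ZX80 code
points or agreed PUA allocations:

# Illustrative only: real ZX80 code values differ, and the PUA points are
# arbitrary stand-ins for the characters that have no UCS equivalent yet.
ZX80_TO_UNICODE = {
    0x00: ' ',        # space
    0x08: '\u2592',   # MEDIUM SHADE (already encoded)
    0x09: '\uE000',   # top half of MEDIUM SHADE: no UCS character, PUA stand-in
    0x0A: '\uE001',   # bottom half of MEDIUM SHADE: no UCS character, PUA stand-in
}

def zx80_decode(data: bytes) -> str:
    # Bytes with no mapping stay visible as U+FFFD rather than being dropped.
    return ''.join(ZX80_TO_UNICODE.get(b, '\uFFFD') for b in data)

print(zx80_decode(bytes([0x00, 0x08, 0x09, 0x0A])))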


Re: Encoding of old compatibility characters

2017-03-27 Thread Michael Everson
On 27 Mar 2017, at 15:44, Charlotte Buff  
wrote:
> 
> I’ve recently developed an interest in old legacy text encodings and noticed 
> that there are various characters in several sets that don’t have a Unicode 
> equivalent. I had already started research into these encodings to eventually 
> prepare a proposal until I realised I should probably ask on the mailing list 
> first whether it is likely the UTC will be interested in those characters 
> before I waste my time on a project that won’t achieve anything in the end.

It’s hard to say without knowing what the characters are. 

Michael Everson