Re: Another take on the English Apostrophe in Unicode

Marcel Schneider Wed, 17 Jun 2015 08:51:04 -0700

On Tue, Jun 16, 2015, Philippe Verdy  wrote:

> When ISO 8859-1 was designed (in fact in an early version by Digital for its 
> own version of Unix), allowing a bijective compatibility with 8-bit EBCDIC 
> and its C1 controls was still a priority.


> Microsoft abandoned its own development of Unix to develop DOS and extend it 
> with Windows, in parallel with its work with IBM, which had wanted DOS to be a 
> very lightweight version of CP/M, but without a scheduler, in order to run 
> software on personal computers that could be used in small organisations 
> that could not buy its mainframes but had to prepare documents and data that 
> could be reused on IBM mainframes...

 

Thank you, Philippe, for the information. It was a very good idea to build a 
system that does not need the C1 controls and to remap the two ranges to the 
missing characters, which are indispensable, notably in French, starting with 
the single quotes.
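
The remapping of the two ranges can be checked with a short Python sketch. 
It assumes that Python's built-in "cp1252" codec reflects the published 
Windows-1252 table (other implementations may map the unassigned bytes to C1 
controls instead), and the function name cp1252_c1_layout is only illustrative.

```python
import unicodedata


def cp1252_c1_layout():
    """Return (assigned, empty) for bytes 0x80..0x9F in Python's cp1252 codec.

    `assigned` maps each byte value to the character it decodes to;
    `empty` lists the byte values the codec refuses to decode.
    """
    assigned = {}
    empty = []
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's codec treats this position as undefined.
            empty.append(byte)
        else:
            assigned[byte] = char
    return assigned, empty


assigned, empty = cp1252_c1_layout()
for byte, char in assigned.items():
    # Print the Unicode name rather than the glyph, so the output stays ASCII.
    print(f"0x{byte:02X}  U+{ord(char):04X}  {unicodedata.name(char)}")
print("unassigned:", ", ".join(f"0x{b:02X}" for b in empty))
```

The listing should show the single quotes at 0x91 and 0x92, the French 
characters Œ, œ and Ÿ at 0x8C, 0x9C and 0x9F, and the positions the codec 
leaves unassigned.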


 

Marcel 

> Message of 16/06/15 21:08
> From: "Philippe Verdy" 
> To: "Marcel Schneider" 
> Cc: "Doug Ewell" , "Unicode Mailing List" 
> Subject: Re: Another take on the English Apostrophe in Unicode
> 
>
> 2015-06-16 19:02 GMT+02:00 Marcel Schneider :
>

> On Mon, Jun 15, 2015, 17:12, Doug Ewell  wrote:
> 
> > Marcel Schneider wrote:
> [...]
> >> Microsoft’s choice of mashing up apostrophe and close-quote to end up
> >> with an unprocessable hybrid was wrong. Very wrong.
> 
> > Windows-1252 and the other Windows code pages were developed during the
> > 1980s, before Unicode, when almost all non-Asian character sets were
> > limited to 256 code points. The distinctions between apostrophe and
> > right-single-quote, weighed against the confusion caused by encoding two
> > identical-looking characters, would never have been sufficient back then
> > to justify separate encoding in this limited space.
> 
> I replied:
> 
> > The problem is not about code pages [...]
> 
> I thank you for your answers, and I'll come back to some of them below. 
> There is a new fact to bring up first. 

> I concede that my last reply yesterday evening was incorrect. 

> In addition to Microsoftʼs action in the late nineties urging Unicode to 
> give up its useful apostrophe recommendation (U+02BC), the design of code 
> page Windows-1252 is indeed within my scope.
> 
> Since I learned that there are very good, compelling reasons to use U+02BC in 
> English, and that Unicodeʼs corresponding recommendation was withdrawn in 
> deference to a widespread practice founded on CP Windows-1252, I soon 
> suspected there would have been a way to get the apostrophe into this code 
> page. Here I need to recall that I always liked Windows-1252 for completing 
> the ISO 8859-1 charset (which was so useless* it had to be replaced with ISO 
> 8859-15).
> * Please read this paper (in French):
> http://cahiers.gutenberg.eu.org/cg-bin/article/CG_1996___25_65_0.pdf
> 
> Now that I have closely examined CP1252ʼs layout, I found five empty code 
> points, five code points left unassigned, in the C1 ranges that Microsoft 
> allocated to complete ISO 8859-1. Further, in this range, I found two MODIFIER 
> LETTERS, CIRCUMFLEX ACCENT (136, 0x88, later U+02C6) and SMALL TILDE (152, 
> 0x98, U+02DC). Obviously these two were added to disambiguate the extensively 
> used spacing characters ^ (94, 0x5E) and ~ (126, 0x7E) on one side, and the 
> diacritics on the other side. It must be said that when Windows was first 
> released, the left and right single quotes were the only printable characters 
> in these two ranges. All other characters, plus × and ÷, came later. However, 
> CP1252 has remained stable since Windows 98, for which € and the žŽ pair were 
> added. And five places were left empty.
> 
> From this I became convinced that it would have been very easy to place the 
> letter apostrophe, for example, at code point 144 (0x90), next to the single 
> turned comma quotation mark 0x91 and the single comma quotation mark 
> (right-single-quote) 0x92, which Microsoft recommended for use as apostrophe.
> 
> About the “confusion” everybody refers to, it must be said that the only way 
> to get people confused is to do things and not explain anything to 
> anybody. 
> 
> The core problem would have been that code pages were designed with 
> glyph-based *character* encoding in mind, not semantics-based *text* 
> encoding. 
> 
> I repeat that others had done even worse. Others, that is, some of the 
> so-called expert members of the ISO WG designing 8859-1, two of whom did not 
> even aim at encoding all needed characters, deliberately refusing to 
> encode the lower- and uppercase Œ digraph, and even the uppercase Ÿ. 
> Microsoftʼs big merit has been to produce a ready remedy to this bungling, 
> which, as far as the OE digraph is concerned, was meant to accommodate 
> defective peripherals.
> 
> Unfortunately, Microsoft evidently didnʼt finish this job, aiming at 
> encoding characters only, and thus not allocating more than one code point to 
> that squiggle, while several places were left.
> 
> Well, all those are errors of the past. If I donʼt see a need, I wonʼt meet 
> it. By leaving œ and Œ off the charset, they got × and ÷ in, at least. Where 
> things went really wrong was when Unicode was in, and the Procrustean beds of 
> code pages were out. At least, they should have been. Whence, then, the 
> survival of the CP1252-based confusion?
> 
> Briefly, todayʼs text processing is suffering from the apostrophe-close-quote 
> confusion. This confusion is firstly out of date, and secondly it was 
> unnecessary from the very beginning. Avoiding confusion at a trivial level 
> (by sparing users the need to choose between two similar squiggles) merely 
> shifts it to the processing level, where the damage it causes is far bigger. 
> Trust me, users who find themselves unable to tell the apostrophes apart when 
> theyʼre going to replace single quotes wonʼt bless Microsoft for the input 
> simplicity! Ted Clancyʼs blog post is there to prove it.
> https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
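
The damage at the processing level can be illustrated with a minimal Python 
sketch. It relies only on the standard re and unicodedata modules; the helper 
names words and single_to_double_quotes are made up for the illustration, and 
the naive quote replacement is an assumption about what a user might try, not 
a description of any particular word processor.

```python
import re
import unicodedata

RIGHT_SINGLE_QUOTE = "\u2019"   # general category Pf (final quote punctuation)
MODIFIER_APOSTROPHE = "\u02BC"  # general category Lm (modifier letter)


def words(text):
    """Split text into words using the regex notion of a word character."""
    return re.findall(r"\w+", text)


def single_to_double_quotes(text):
    """Naively turn each U+2018 ... U+2019 pair into U+201C ... U+201D."""
    return re.sub("\u2018(.*?)\u2019", "\u201C" + r"\1" + "\u201D", text)


# The same quoted word, once with U+2019 and once with U+02BC as apostrophe.
with_2019 = "\u2018don\u2019t\u2019"
with_02bc = "\u2018don\u02BCt\u2019"

for label, sample in (("U+2019", with_2019), ("U+02BC", with_02bc)):
    apostrophe = sample[4]
    print(label, unicodedata.category(apostrophe),
          [w.encode("unicode_escape").decode() for w in words(sample)],
          single_to_double_quotes(sample).encode("unicode_escape").decode())
```

With U+2019 the word is split in two and the replacement closes the quotation 
after "don"; with U+02BC the word stays whole and the quotes pair up as 
intended.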
> 
> 
> It was time to get rid of that confusion when Unicode recommended U+02BC for 
> apostrophe. Microsoftʼs choice not to comply was wrong again. Very wrong.

>  

> Let's come back to some of your replies.
> 

>  

> On Mon, Jun 15, 2015, 20:14, Doug Ewell  wrote:

> 
> > I'd guess there are very few users who consciously see the use of U+2019
> > as both apostrophe and right-single-quote as a vestige of code pages, or
> > as a conscious effort by Evil Microsoft™ to force them into anything.

>  

> Quite sure. These are habits, not constraints. I do not share such views 
> about a battle between Google and Microsoft or about ethical prefixes to 
> assign to companies. The problem is that when the result proves to be bad, 
> the idea was bad, too.
> 

>  

> The mismatch between apostrophe and close-quote is now part of our culture. 
> We must become pragmatic again and weigh the advantages and disadvantages of 
> each option (ambiguating, disambiguating), not say "I believe there are no 
> disadvantages in ambiguating" or "there is no reason to disambiguate" or 
> "people will get confused, let them alone" or the like. These are all mere 
> assertions. We must look at real people and listen to what they say to us. 
> Ted Clancy is one of them. When he's worried about that malfunctioning of 
> text processing, who will keep smiling and keep saying "There's no problem, 
> there's no reason to fix that, it's all OK like it is"? 

> That is to despise people; that is to spit in their faces.
> 

>  

> > Perhaps a UTC member can confirm whether this is fact or speculation.
> > Markus Kuhn's comment from 1999 about "couldn't Unicode follow
> > Microsoft...?" doesn't prove that Unicode was in fact strong-armed by
> > Microsoft.

>  

> Yes, please let us know.
> 

> 
> 
> 
> 
> Marcel Schneider



