On Tue, Jun 16, 2015, Philippe Verdy wrote:
> When ISO 8859-1 was designed (in fact in an early version by Digital for its
> own version of Unix), allowing a bijective compatibility with 8-bit EBCDIC
> and its C1 controls was still a priority.
> Microsoft abandoned its own development of Unix to develop DOS and extend it
> with Windows in parallel with its work with IBM that had wanted DOS to be a
> very lightweight version of CP/M, but without a scheduler in order to run
> software on personal computers that could be used in small organisations
> that could not buy its mainframes, but had to prepare documents and data that
> could be reused on IBM mainframes...

Thank you, Philippe, for the information. It was a very good idea to build a
system without need of C1 and to remap the two ranges to the additional
characters that are indispensable, notably in French, starting with the single
quotes.

Marcel

> Message of 16/06/15 21:08
> From: "Philippe Verdy"
> To: "Marcel Schneider"
> Cc: "Doug Ewell", "Unicode Mailing List"
> Subject: Re: Another take on the English Apostrophe in Unicode
>
> 2015-06-16 19:02 GMT+02:00 Marcel Schneider:
>
> On Mon, Jun 15, 2015, 17:12, Doug Ewell wrote:
>
> > Marcel Schneider wrote:
> [...]
> >> Microsoft’s choice of mashing up apostrophe and close-quote to end up
> >> with an unprocessable hybrid was wrong. Very wrong.
>
> > Windows-1252 and the other Windows code pages were developed during the
> > 1980s, before Unicode, when almost all non-Asian character sets were
> > limited to 256 code points. The distinctions between apostrophe and
> > right-single-quote, weighed against the confusion caused by encoding two
> > identical-looking characters, would never have been sufficient back then
> > to justify separate encoding in this limited space.
>
> I replied:
>
> > The problem is not about code pages [...]
>
> I thank you for your answers and will come back to some of them below.
> There is a new fact to bring up first.
> I concede that my last reply yesterday evening was incorrect.
> In addition to Microsoftʼs action in the late nineties urging Unicode to
> give up its useful apostrophe recommendation (U+02BC), the design of code
> page Windows-1252 is indeed within my scope.
>
> Since I learned that there are very good, overriding reasons to use U+02BC
> in English, and that Unicodeʼs corresponding recommendation was withdrawn in
> deference to a widespread practice founded on code page Windows-1252, I soon
> suspected that there would have been a way to get the apostrophe into this
> code page. Here I need to recall that I always liked Windows-1252 for
> completing the ISO 8859-1 charset (which was so useless* that it had to be
> replaced with ISO 8859-15).
> * Please read this paper (in French):
> http://cahiers.gutenberg.eu.org/cg-bin/article/CG_1996___25_65_0.pdf
>
> Now that I have examined CP1252ʼs layout closely, I found five empty code
> points, five code points left unassigned, in the C1 ranges that Microsoft
> allocated to complete ISO 8859-1. Further, in these ranges, I found two
> spacing modifier characters, MODIFIER LETTER CIRCUMFLEX ACCENT (136, 0x88,
> later U+02C6) and SMALL TILDE (152, 0x98, later U+02DC). Obviously these two
> were added to distinguish the extensively used spacing characters ^ (94,
> 0x5E) and ~ (126, 0x7E) on the one hand from the diacritics on the other. It
> should be said that when Windows was first released, the left and right
> single quotes were the only printable characters in these two ranges. All
> other characters, plus × and ÷, came later. However, CP1252 has remained
> stable since Windows 98, for which € and the žŽ pair were added. And five
> places were left empty.
>
> From this I became convinced that it would have been very easy to place the
> letter apostrophe, for example, at code point 144 (0x90), next to the single
> turned comma quotation mark at 0x91 and the single comma quotation mark
> (right-single-quote) at 0x92, which Microsoft recommended for use as
> apostrophe.
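The layout described above can be checked with a short script. This is only a minimal sketch: it assumes that Python's built-in "cp1252" codec reflects the layout in question (that codec treats the five empty positions as undefined, whereas some Windows conversion routines map them to C1 controls), and the helper name cp1252_c1_layout is made up for this illustration.

```python
import unicodedata


def cp1252_c1_layout():
    """Split bytes 0x80-0x9F into assigned and unassigned CP1252 positions.

    Hypothetical helper for illustration; it relies only on Python's
    built-in "cp1252" codec, which raises UnicodeDecodeError for the
    positions it regards as undefined.
    """
    assigned, unassigned = {}, []
    for byte in range(0x80, 0xA0):
        try:
            assigned[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            unassigned.append(byte)
    return assigned, unassigned


assigned, unassigned = cp1252_c1_layout()

# The five positions left empty, including 144 (0x90) next to the quotes.
print([f"0x{b:02X}" for b in unassigned])
# -> ['0x81', '0x8D', '0x8F', '0x90', '0x9D']

# The two modifier characters and the two single quotation marks.
for byte in (0x88, 0x98, 0x91, 0x92):
    ch = assigned[byte]
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Position 0x90 appears in the unassigned list, directly before 0x91 and 0x92, which is the slot suggested above for a letter apostrophe.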
>
> About the “confusion” everybody refers to, it must be said that the only way
> to confuse people is to do things without explaining anything to anybody.
>
> The core problem would have been that code pages were designed with
> glyph-based *character* encoding in mind, not semantics-based *text*
> encoding.
>
> I repeat that others had done even worse. Others, that is, some of the
> so-called expert members of the ISO WG designing 8859-1: two of them did not
> even aim at encoding all the needed characters, deliberately refusing to
> encode the lower- and uppercase Œ digraph, and even the uppercase Ÿ.
> Microsoftʼs big merit has been to produce a ready remedy to this bungling,
> which, as far as the OE digraph is concerned, was meant to accommodate
> defective peripherals.
>
> Unfortunately, Microsoft visibly didnʼt finish the job: it aimed at encoding
> characters only, and thus allocated no more than one code point to that
> squiggle, although several places were left.
>
> Well, all those are errors of the past. If I donʼt see a need, I wonʼt meet
> it. By leaving œ and Œ off the charset, they got × and ÷ in, at least. Where
> things went really wrong was when Unicode arrived and the Procrustean beds
> of code pages were out. At least, they should have been. Why, then, does the
> CP1252-based confusion survive?
>
> In short, todayʼs text processing is suffering from the apostrophe-close-quote
> confusion. This confusion is firstly out of date, and secondly it was
> unnecessary from the very beginning. Avoiding this confusion at a trivial
> level (by sparing users from having to use two similar squiggles) shifts it
> to the processing level, where the damage it causes is far greater. Trust me,
> users who find themselves unable to tell the apostrophes apart when they
> want to replace single quotes wonʼt bless Microsoft for the input
> simplicity! Ted Clancyʼs blog post is there to prove it.
> https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
>
> It was time to get rid of that confusion when Unicode recommended U+02BC for
> the apostrophe. Microsoftʼs choice not to comply was wrong again. Very wrong.
>
> Let's come back to some of your replies.
>
> On Mon, Jun 15, 2015, 20:14, Doug Ewell wrote:
>
> > I'd guess there are very few users who consciously see the use of U+2019
> > as both apostrophe and right-single-quote as a vestige of code pages, or
> > as a conscious effort by Evil Microsoft™ to force them into anything.
>
> Quite sure. These are habits, not constraints. I do not share such views
> about a battle between Google and Microsoft, or about ethical labels to
> attach to companies. The problem is that when the result proves to be bad,
> the idea was bad, too.
>
> The mismatch between apostrophe and close-quote is now part of our culture.
> We must become pragmatic again and weigh the advantages and disadvantages of
> each option (ambiguating, disambiguating), not say "I believe there are no
> disadvantages in ambiguating" or "there is no reason to disambiguate" or
> "people will get confused, let them alone" or the like. These are all mere
> assertions. We must look at real people and listen to what they say to us.
> Ted Clancy is one of them. When he is worried about this malfunctioning of
> text processing, who will keep smiling and go on saying "There's no problem,
> there's no reason to fix that, it's all OK like it is"?
> That is to despise people; that is to spit in their face.
>
> > Perhaps a UTC member can confirm whether this is fact or speculation.
> > Markus Kuhn's comment from 1999 about "couldn't Unicode follow
> > Microsoft...?" doesn't prove that Unicode was in fact strong-armed by
> > Microsoft.
>
> Yes, please let us know.
>
> Marcel Schneider

