On 27 Jan 2019, at 05:21, Richard Wordingham <richard.wording...@ntlworld.com> 
wrote:

>>>> I’ll be publishing a translation of Alice into Ancient Greek in due
>>>> course. I will absolutely only use U+2019 for the apostrophe. It
>>>> would be wrong for lots of reasons to use U+02BC for this.  
>>> 
>>> Please list them.  
>> 
>> The Greek use is of an apostrophe, often a mark of elision (as here);
>> that’s what 2019 is for.
>> 
>> 02BC is a letter. Usually a glottal stop. 
> 
> So it would seem that the 'lots of reasons' is just that it goes against the 
> *recommendation* of TUS.

I have no idea what TUS says about this. I did not look it up. I know a lot 
about characters, though. 

> Incidentally, I believe the principal use of U+2019 RIGHT SINGLE QUOTATION 
> MARK is as a quotation mark.

You can believe what you like, but that isn’t likely true. In books which 
prefer “this kind” of quotation marks for primary quotations and ‘this kind’ 
for nested quotations, 2019 is primarily used for the apostrophe in words like 
I’m, can’t, isn’t, don’t etc. In books which prefer ‘this kind’ for primary 
quotations, the statistics for 2019 will be different. But 2019 is still the 
correct character for both.

> As you have noted in the text left in below, U+02BC started out as the 
> apostrophe.

Lead-type typesetters used that sort, yes. And that sort was used for both 
apostrophe and single quotation marks. 

> The closing single inverted comma has a different origin to the apostrophe.

No, it doesn’t, but you are welcome to try to prove your assertion. 

> My argument for U+02BC is that this apostrophe is an integral part of the 
> word.

It is a letter. In “can’t” the apostrophe isn’t a letter. It’s a mark of 
elision.  I can double-click on the three words in this paragraph which have 
the apostrophe in them, and they are all whole-word selected. 
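
For anyone who wants to check the letter/punctuation distinction, here is a minimal sketch, assuming Python's standard unicodedata module. The helper name describe is only illustrative. It prints the name and General Category of the four characters under discussion; the two modifier letters report as Lm (a letter category), the two quotation marks as Pi and Pf (punctuation).

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, General Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The two quotation marks and the two modifier letters discussed in this thread.
for ch in ("\u2018", "\u2019", "\u02bb", "\u02bc"):
    print(describe(ch))
```

The General Category is only one input to how software treats a character; selection behaviour also depends on the word-breaking rules each application implements.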

> The main constituents of a prototypical word are letters and their attendant 
> marks. Now, the word-breaking algorithm in TR27 allows for various generally 
> overloaded elements to join elements of a word. However, this apostrophe does 
> not mark the boundary of constituents. Accordingly it makes sense to treat it 
> as a letter.

The behaviour of 2019 is not broken. I use it every day. I’ve typeset many, many 
books in English and Cornish and Irish, all of which use single quotation marks 
and double quotation marks and lots and lots of apostrophes, and I have no 
trouble with them. 2019 has for decades been treated correctly in software that 
I use. 

> Treating the Greek apostrophe as a letter (U+02BC) gives better word-breaking.

Why do you claim this? I did not read the beginning of this thread and I am not 
going to try to find it. What is the problem you claim to have? In what 
software? On what platform?

> I don't see any downside in treating it like a Polynesian glottal stop.

I do. And to try to replace the apostrophe in English can’t and don’t and all 
is doomed to fail. Doomed. 

Moreover there are good practical reasons to change the glyph for the 
Polynesian letter.

When I typeset Greek, I will use 2019 for the apostrophe. 

> Is someone going to tell me there is an advantage in treating "men's" as one 
> word but "dogs'" as two?  As I've said, the argument for encoding English 
> apostrophes as U+2019 is that even with adequate keyboards, users cannot be 
> relied upon to distinguish U+02BC and U+2019 - especially with no feedback. A 
> writing system should choose one and stick with it.  User unreliability 
> forces a compromise.

Polynesian users need 02BC to be visually distinguished from 2019. European 
users don’t need the apostrophe to be visually distinguished from 2019. The 
edge case of “dogs’” doesn’t convince me. In all my years of typesetting I have 
never once noticed this, much less considered it a problem that needed fixing.

> Now, if text processors were to enable a difference, then the arguments would 
> change.  I for one find it helpful that Microsoft Word is willing to display 
> visible symbols for spaces and tab characters so that I know what white space 
> is composed of.

Most word-processing and typesetting programs will do this. Quark and InDesign do. 
Word and LibreOffice and Apple Pages do. 

>> I didn’t follow the beginning of this. Evidently it has something to do with 
>> word selection of d’ + a space + what follows. If that’s so, then there’s no 
>> argument at all for 02BC. It’s a question of the space, and that’s got 
>> nothing to do with the identity of the apostrophe.
> 
> The word selection issue is that except before a letter, the standard 
> word-breaking algorithm says that there is a word boundary between the delta 
> and apostrophe.

Well, that’s the expected behaviour for a character which is polyvalent. If you 
have problems double-clicking “d’ Artagnan” you should probably just write 
“d’Artagnan”. 
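
The behaviour described in the quoted paragraph can be approximated in a few lines. This is a deliberately simplified sketch, not the standard algorithm (the full rules are in the Unicode text segmentation annex, and real implementations handle many more cases). It assumes only two things: a mark in the hypothetical set MID joins a word only when a letter stands on both sides of it, and anything for which Python's str.isalpha() is true counts as a letter. The names MID and words are invented for this illustration.

```python
# Simplified stand-in for marks that may join two letters (assumption for this sketch).
MID = {"\u2019", "'"}

def words(text):
    """Split text into words; a MID mark is kept only between two letters."""
    out, cur = [], ""
    for i, ch in enumerate(text):
        if ch.isalpha():
            # Letters (including modifier letters such as U+02BC) extend the word.
            cur += ch
        elif ch in MID and cur and i + 1 < len(text) and text[i + 1].isalpha():
            # Letter + mark + letter: the mark stays inside the word.
            cur += ch
        else:
            # Anything else ends the current word.
            if cur:
                out.append(cur)
            cur = ""
    if cur:
        out.append(cur)
    return out

print(words("men\u2019s"))                 # mark between letters
print(words("dogs\u2019"))                 # mark at end of word
print(words("\u03b4\u2019 \u1f04\u03bd"))  # delta + U+2019 + space + word
print(words("\u03b4\u02bc \u1f04\u03bd"))  # delta + U+02BC + space + word
```

Under these assumptions a trailing U+2019 falls outside the word while a trailing U+02BC stays inside it, which is the difference being argued over; the sketch shows the mechanics only and does not settle which encoding is right.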

> 
>>> Will your coding decision be machine readable for the readership?  
>> 
>> I don’t know what you mean by “readable”.
> 
> Will the difference between U+02BC and U+2019 be discernible by the readers?

It should be, in Polynesian languages. Otherwise the text isn't easily 
legible. 

> If one could copy a phrase to a general application and select a word by 
> double-clicking, then the difference would be visible.

If you know what the behaviour is then you can take it into account when you 
are copying a word. You can’t fix this by character encoding. Certainly not by 
screwing with 02BC.

> If the result of the publishing is simply a printed book, then your choice of 
> U+2019 or U+02BC will depend only on font differences.

That non-argument can be applied to everything. 

> Not that it makes much difference to the issue,  but isn't the correct 
> encoding for the ʻokina U+02BB MODIFIER LETTER TURNED COMMA? 

Yes, but both 02BB and 02BC are used in linguistic transcriptions and in 
Polynesian languages, and the graphic identity with 2018 and 2019 is 
problematic and unnecessary.

Using 02BC for the apostrophe is a mistake, in my view.

Michael Everson
