On Tue, Feb 5, 2019 at 12:23 AM James Kass via Unicode
wrote:
> Text a man has JOINED together, let not algorithm put asunder.
>
I was hoping so much that ὃ οὖν ὁ θεὸς συνέζευξεν ἄνθρωπος μὴ χωριζέτω
would have an apostrophe but alas no.
On 2019-01-28 8:58 PM, Richard Wordingham wrote:
> On Mon, 28 Jan 2019 03:48:52 +
> James Kass via Unicode wrote:
>
>> It’s been said that the text segmentation rules seem over-complicated
>> and are probably non-trivial to implement properly. I tried your
>> suggestion of WORD JOINER U+20
On Mon, 28 Jan 2019 20:55:39 -0500
"Mark E. Shoulson via Unicode" wrote:
> On 1/28/19 2:31 AM, Mark Davis ☕️ via Unicode wrote:
> >
> > But the question is how important those are in daily life. I'm not
> > sure why the double-click selection behavior is so much more of a
> > problem for Ancien
On Mon, 28 Jan 2019 21:10:19 -0500
"Mark E. Shoulson via Unicode" wrote:
> On 1/28/19 3:58 PM, Richard Wordingham via Unicode wrote:
> > Interestingly, bringing this word breaker into line with TUS in the
> > UK may well be in breach of the Equality Act 2010.
> >
> > Richard.
>
> OK, I've got
On Mon, Jan 28, 2019 at 10:58 PM James Kass via Unicode
wrote:
>
> On 2019-01-29 1:55 AM, Mark E. Shoulson via Unicode wrote:
> > I guess "Suck it up and deal with it." And that may indeed be the
> answer.
>
> It would certainly make for shorter and simpler FAQ pages, anyway.
>
Except people wi
On 2019-01-29 1:55 AM, Mark E. Shoulson via Unicode wrote:
I guess "Suck it up and deal with it." And that may indeed be the answer.
It would certainly make for shorter and simpler FAQ pages, anyway.
On 1/28/19 3:58 PM, Richard Wordingham via Unicode wrote:
Interestingly, bringing this word breaker into line with TUS in the UK
may well be in breach of the Equality Act 2010.
Richard.
OK, I've got to ask: how would that be? How would this impinge on
anyone's equality on the basis of "age,
On 1/28/19 2:31 AM, Mark Davis ☕️ via Unicode wrote:
But the question is how important those are in daily life. I'm not
sure why the double-click selection behavior is so much more of a
problem for Ancient Greek users than it is for the somewhat larger
community of English users. Word selecti
On 1/27/19 4:30 PM, Philippe Verdy via Unicode wrote:
For Volapük, it looks much more like U+02BE (right half ring modifier
letter)
than like U+02BC (apostrophe "modifier" letter).
according to the PDF on
https://archive.org/details/cu31924027111453/page/n12
No, I don't think it's 02BE (espe
On Mon, 28 Jan 2019 08:31:40 +0100
Mark Davis ☕️ via Unicode wrote:
> But the question is how important those are in daily life. I'm not
> sure why the double-click selection behavior is so much more of a
> problem for Ancient Greek users than it is for the somewhat larger
> community of English u
On Mon, 28 Jan 2019 03:48:52 +
James Kass via Unicode wrote:
> It’s been said that the text segmentation rules seem over-complicated
> and are probably non-trivial to implement properly. I tried your
> suggestion of WORD JOINER U+2060 after tau ( γένοιτ’ ἄν ), but it
> only added yet anot
: Kalvesmaki, Joel
Cc: Mark Davis ☕️; unicode@unicode.org; Richard Wordingham
Subject: Re: Ancient Greek apostrophe marking elision
On Mon, Jan 28, 2019 at 10:21 AM Kalvesmaki, Joel
wrote:
In publishing critical editions of ancient/medieval Greek texts, I reg
On Mon, Jan 28, 2019 at 10:21 AM Kalvesmaki, Joel
wrote:
> In publishing critical editions of ancient/medieval Greek texts, I
> regularly deal with editions that mix elision and closing single-quotation
> marks.
>
You have my sympathies :-)
But you use U+2019 for both, right? (just checking as
Mark Davis ☕️ via
Unicode
Sent: Monday, January 28, 2019 3:37:54 AM
To: James Tauber
Cc: Richard Wordingham; Unicode Mailing List
Subject: Re: Ancient Greek apostrophe marking elision
It would certainly be possible (and relatively simple) to change ’ into a word
character for languages that
> On Jan 28, 2019, at 1:51 AM, James Tauber via Unicode
> wrote:
>
> when I'm entering U+2019 in a Greek context (via option-n) the keyboard is
> fully aware I'm in that Greek context.
Could you explain what you mean by the keyboard being “aware” of the Greek
context?
The hell I do, Julian.
http://evertype.com/polynesian.html
> On 27 Jan 2019, at 21:00, Julian Bradfield via Unicode
> wrote:
>
> You have a very low opinion of Polynesian users.
On Mon, Jan 28, 2019 at 2:54 AM James Kass via Unicode
wrote:
> at the keyboard driver level. It's a presumption that Greek classicists
> are already specifying fonts and using dedicated keyboard drivers.
> Based on the description provided by James Tauber, it should be
> relatively simple to ma
On Mon, Jan 28, 2019 at 3:38 AM Mark Davis ☕️ wrote:
> So does modern Greek use ’ in trailing environments where people
> wouldn't expect it to be included in word selection?
>
>
Unfortunately, I can't speak for Modern Greek at all.
James
That is a fair point; if you could get everyone to use keyboards that
inserted such a character, and also get people with current data (e.g.
Thesaurus Linguae Graecae) to process their text, then it would behave as
expected.
Mark
On Mon, Jan 28, 2019 at 8:55 AM James Kass via Unicode
wrote:
>
>
It would certainly be possible (and relatively simple) to change ’ into a
word character for languages that don't use ’ for any other purpose. And if
no languages using a particular script use ’ for another purpose, then it
is particularly easy. (If you depend on language tagging, then any software
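A minimal sketch of the kind of tailoring described above, in Python. It is not the UAX #29 algorithm and not any real implementation's API: the regular expression is a rough stand-in for default word selection, tailored_words and is_greek are hypothetical names, and testing for Greek via the character name is an assumption made because the standard library does not expose the Script property.

```python
import re
import unicodedata

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Rough stand-in for word selection: letters, optionally joined by a medial
# apostrophe, optionally followed by one trailing U+2019.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*\u2019?")

def is_greek(ch):
    # Assumption: approximate "Greek script" by the Unicode character name.
    return unicodedata.name(ch, "").startswith("GREEK")

def tailored_words(text):
    """Keep a trailing U+2019 with its word only after a Greek letter."""
    words = []
    for match in WORD.finditer(text):
        word = match.group()
        if word.endswith(APOSTROPHE) and not is_greek(word[-2]):
            word = word[:-1]  # elsewhere, treat it as a closing quote
        words.append(word)
    return words

greek = "\u03b3\u03ad\u03bd\u03bf\u03b9\u03c4\u2019 \u1f04\u03bd"  # γένοιτ’ ἄν
print(tailored_words(greek))
print(tailored_words("the boys\u2019 books"))
```

Under these assumptions the elided Greek word keeps its apostrophe while the English plural possessive does not. A closing single quotation mark directly after a Greek word would also be absorbed, which is the case that only works if the script does not use ’ for any other purpose.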
On 2019-01-28 7:31 AM, Mark Davis ☕️ via Unicode wrote:
Expecting people to type in hard-to-find invisible characters just to
correct double-click is not a realistic expectation.
True, which is why such entries, when consistent, are properly handled
at the keyboard driver level. It's a pres
On Mon, Jan 28, 2019 at 2:31 AM Mark Davis ☕️ wrote:
> But the question is how important those are in daily life. I'm not sure
> why the double-click selection behavior is so much more of a problem for
> Ancient Greek users than it is for the somewhat larger community of English
> users. Word sel
Note that this is no different than the reasonably common cases in English
such as «the boys’ books».
(you can try various combinations in
http://unicode.org/cldr/utility/list-unicodeset.jsp)
There are certainly cases that are suboptimal in word selection. As another
example, «re-iterate» seems li
On 2019-01-27 11:38 PM, Richard Wordingham via Unicode wrote:
On Sun, 27 Jan 2019 19:57:37 +
James Kass via Unicode wrote:
On 2019-01-27 7:09 PM, James Tauber via Unicode wrote:
In my original post, I asked if a language-specific tailoring of
the text segmentation algorithm was the solu
On Sun, 27 Jan 2019 19:57:37 +
James Kass via Unicode wrote:
> On 2019-01-27 7:09 PM, James Tauber via Unicode wrote:
> > In my original post, I asked if a language-specific tailoring of
> > the text segmentation algorithm was the solution but no one here
> > has agreed so far.
> If there a
On Sun, 27 Jan 2019 14:09:31 -0500
James Tauber via Unicode wrote:
> On Sun, Jan 27, 2019 at 1:22 PM Richard Wordingham via Unicode <
> unicode@unicode.org> wrote:
> > However LibreOffice treats "don't" as a single word for U+0027,
> > U+02BC and U+2019, but "dogs'" as a single word only for U
For Volapük, it looks much more like U+02BE (right half ring modifier
letter)
than like U+02BC (apostrophe "modifier" letter).
according to the PDF on
https://archive.org/details/cu31924027111453/page/n12
The half ring makes a clear distinction with the regular apostrophe (for
elisions) or quotati
On 2019-01-27, Michael Everson via Unicode wrote:
> On 27 Jan 2019, at 05:21, Richard Wordingham
> wrote:
>> The closing single inverted comma has a different origin to the apostrophe.
> No, it doesn’t, but you are welcome to try to prove your assertion.
As far as I can tell from the easily ac
> On Jan 27, 2019, at 12:09 PM, James Tauber via Unicode
> wrote:
>
> γένοιτ’ ἄν
>
> Double-clicking on the first word should select the U+2019 as well.
> Interestingly on macOS Mojave it does in Pages[1] but not in Notes
On my ipad/iphone, Word does it correctly but Pages and Notes do not.
On 2019-01-27 7:09 PM, James Tauber via Unicode wrote:
In my original post, I asked if a language-specific tailoring of the
text segmentation algorithm was the solution but no one here has
agreed so far.
If there are likely to be many languages requiring exceptions to the
segmentation algorit
On Sun, Jan 27, 2019 at 1:22 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> Except the Unicode-compliant processes aren't required to follow the
> scheme of TR27 Unicode Text Segmentation. However, it is only required
> to select the whole word because the U+2019 is followed by
On Sun, 27 Jan 2019 16:11:12 +
Michael Everson via Unicode wrote:
> Yes, yes. It doesn’t matter. The discussion applies to both the two
> quotation marks and the two modifier letters.
Actually, there is a difference. As the ʻokina doesnʹt occur at the
end of a word in Hawaiian, one only str
On Sun, 27 Jan 2019 12:38:39 -0500
"Mark E. Shoulson via Unicode" wrote:
> On 1/27/19 11:08 AM, Michael Everson via Unicode wrote:
> > It is a letter. In “can’t” the apostrophe isn’t a letter. It’s a
> > mark of elision. I can double-click on the three words in this
> > paragraph which have the
On 1/27/19 11:08 AM, Michael Everson via Unicode wrote:
It is a letter. In “can’t” the apostrophe isn’t a letter. It’s a mark of
elision. I can double-click on the three words in this paragraph which have
the apostrophe in them, and they are all whole-word selected.
That doesn't work when I
Well, sure; some languages work better with some fonts. There's nothing
wrong with saying that 02BC might look the same as 2019... but it's
nice, when writing Hawaiian (or Klingon for that matter) to use a bigger
glyph. That's why they pay typesetters the big bucks (you wish): to make
things l
Yes, yes. It doesn’t matter. The discussion applies to both the two quotation
marks and the two modifier letters.
> On 27 Jan 2019, at 15:08, Tom Gewecke via Unicode wrote:
>
>
>> On Jan 26, 2019, at 11:08 PM, Richard Wordingham via Unicode
>> wrote:
>>
>> It may be a matter of literacy in
On 27 Jan 2019, at 05:21, Richard Wordingham
wrote:
>>> I’ll be publishing a translation of Alice into Ancient Greek in due
course. I will absolutely only use U+2019 for the apostrophe. It
would be wrong for lots of reasons to use U+02BC for this.
>>>
>>> Please list them.
>>
>>
On 2019-01-27 3:08 PM, Tom Gewecke via Unicode wrote:
I think the Unicode Hawaiian ʻokina is supposed to be U+02BB (instead
of U+02BC).
notes for U+02BB
* typographical alternate for 02BD or 02BF
* used in Hawai'ian orthography as 'okina (glottal stop)
> On Jan 26, 2019, at 11:08 PM, Richard Wordingham via Unicode
> wrote:
>
> It may be a matter of literacy in Hawaiian. If the test readership
> doesn't use ʼokina,
I think the Unicode Hawaiian ʻokina is supposed to be U+02BB (instead of
U+02BC).
On Sunday, 27 January 2019, Asmus Freytag via Unicode
wrote:
>
> Choice of quotation marks is language-based and for novels, many times
> there are
> additional conventions that may differ by publisher.
>
> Wonder why the publisher is forcing single quotes on them
>
In theory quotation marks are
On 1/26/2019 10:08 PM, Richard
Wordingham via Unicode wrote:
On Sat, 26 Jan 2019 21:11:36 -0800
Asmus Freytag via Unicode wrote:
On 1/26/2019 5:43 PM, Richard Wordingham via Unicode wrote:
That appears to c
On Sat, 26 Jan 2019 21:11:36 -0800
Asmus Freytag via Unicode wrote:
> On 1/26/2019 5:43 PM, Richard Wordingham via Unicode wrote:
>> That appears to contradict Michael Everson's remark about a
>> Polynesian
>> need to distinguish the two visually.
> Why do you need to distinguish them? To code
On 1/26/2019 7:53 PM, Richard
Wordingham via Unicode wrote:
On Sun, 27 Jan 2019 01:55:29 +
James Kass via Unicode wrote:
Richard Wordingham replied to Asmus Freytag,
>> To make matters worse, users for languages that "should" use
>> U+02BC a
On 1/26/2019 6:25 PM, Michael Everson
via Unicode wrote:
On 27 Jan 2019, at 01:37, Richard Wordingham via Unicode wrote:
I’ll be publishing a translation of Alice into Ancient Greek in due
course. I will absolutely only use U+20
On 1/26/2019 5:43 PM, Richard
Wordingham via Unicode wrote:
On Sat, 26 Jan 2019 17:11:49 -0800
Asmus Freytag via Unicode wrote:
To make matters worse, users for languages that "should" use U+02BC
aren't actually consistent; much data uses U+2019 or
On Sun, 27 Jan 2019 01:55:29 +
James Kass via Unicode wrote:
> Richard Wordingham replied to Asmus Freytag,
>
> >> To make matters worse, users for languages that "should" use
> >> U+02BC aren't actually consistent; much data uses U+2019 or
> >> U+0027. Ordinary users can't tell the diffe
Fair enough, but I didn’t wait.
> On 27 Jan 2019, at 01:59, James Kass via Unicode wrote:
>
>
> Richard Wordingham responded to Michael Everson,
>
> >> I’ll be publishing a translation of Alice into Ancient Greek in due
> >> course. I will absolutely only use U+2019 for the apostrophe. It
> >>
On 27 Jan 2019, at 01:37, Richard Wordingham via Unicode
wrote:
>
>> I’ll be publishing a translation of Alice into Ancient Greek in due
>> course. I will absolutely only use U+2019 for the apostrophe. It
>> would be wrong for lots of reasons to use U+02BC for this.
>
> Please list them.
The G
Polynesians are using 0027 as a fallback, and this has to do with education,
keyboarding, and training.
The typography of the fallback is of no consequence. It’s a fallback.
> On 27 Jan 2019, at 01:43, Richard Wordingham via Unicode
> wrote:
>
> On Sat, 26 Jan 2019 17:11:49 -0800
> Asmus Frey
Richard Wordingham responded to Michael Everson,
>> I’ll be publishing a translation of Alice into Ancient Greek in due
>> course. I will absolutely only use U+2019 for the apostrophe. It
>> would be wrong for lots of reasons to use U+02BC for this.
>
> Please list them.
Let's see the list of
Richard Wordingham replied to Asmus Freytag,
>> To make matters worse, users for languages that "should" use U+02BC
>> aren't actually consistent; much data uses U+2019 or U+0027. Ordinary
>> users can't tell the difference (and spell checkers seem not
>> successful in enforcing the practice).
On Sat, 26 Jan 2019 17:11:49 -0800
Asmus Freytag via Unicode wrote:
> To make matters worse, users for languages that "should" use U+02BC
> aren't actually consistent; much data uses U+2019 or U+0027. Ordinary
> users can't tell the difference (and spell checkers seem not
> successful in enforcin
On Sun, 27 Jan 2019 00:32:43 +
Michael Everson via Unicode wrote:
> I’ll be publishing a translation of Alice into Ancient Greek in due
> course. I will absolutely only use U+2019 for the apostrophe. It
> would be wrong for lots of reasons to use U+02BC for this.
Please list them.
Will your
On Sat, 26 Jan 2019 15:45:54 +
James Kass via Unicode wrote:
> Perhaps I'm not understanding, but if the desired behavior is to
> prohibit both line and word breaks in the example string, then...
>
> In Notepad, replacing U+0020 with U+00A0 removes the line-break.
I believe the problem is
On 1/26/2019 3:02 AM, Mark Davis ☕️ via
Unicode wrote:
> breaking
selection for "d'Artagnan" or "can't" into two is overly
fussy.
True, and that is not what U+2019
I’ll be publishing a translation of Alice into Ancient Greek in due course. I
will absolutely only use U+2019 for the apostrophe. It would be wrong for lots
of reasons to use U+02BC for this.
Moreover, implementations of U+02BC need to be revised. In the context of
Polynesian languages, it is i
Well, *my* desire is simply to know whether to tell people doing digital
editions of Ancient Greek texts whether to use U+2019 or U+02BC for the
apostrophe marking elision (or at least accurately describe the trade-offs
of each).
On Sat, Jan 26, 2019 at 10:50 AM James Kass via Unicode
wrote:
>
Perhaps I'm not understanding, but if the desired behavior is to
prohibit both line and word breaks in the example string, then...
In Notepad, replacing U+0020 with U+00A0 removes the line-break.
U+0020 ( δ’ αρχαια )
U+00A0 ( δ’ αρχαια )
U+202F ( δ’ αρχαια )
It also changes the advancement of
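The substitution tried above can be sketched in Python as follows. This only rewrites the text; it says nothing about how Notepad or any other application then breaks lines or selects words. The helper name glue_elision is hypothetical.

```python
import re
import unicodedata

def glue_elision(text, space="\u00a0"):
    """Replace the U+0020 that follows U+2019 with the given space character."""
    return re.sub("\u2019\u0020", "\u2019" + space, text)

sample = "\u03b4\u2019 \u03b1\u03c1\u03c7\u03b1\u03b9\u03b1"  # δ’ αρχαια

# Show the three variants from the message: plain space, NBSP, narrow NBSP.
for space in ("\u0020", "\u00a0", "\u202f"):
    glued = glue_elision(sample, space)
    print(f"U+{ord(space):04X}", unicodedata.name(space), ascii(glued))
```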
Mark Davis responded to Asmus Freytag,
>> breaking selection for "d'Artagnan" or "can't" into two is overly fussy.
>
> True, and that is not what U+2019 does; it does not break medially.
Mark Davis earlier posted this example,
> So something like "δ’ αρχαια" (picking a phrase at random) would
> breaking selection for "d'Artagnan" or "can't" into two is overly fussy.
True, and that is not what U+2019 does; it does not break medially.
Mark
On Fri, Jan 25, 2019 at 11:07 PM Asmus Freytag via Unicode <
unicode@unicode.org> wrote:
> On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:
>
On 2019-01-25 10:06 PM, Asmus Freytag via Unicode wrote:
James, by now it's unclear whether your ' is 2019 or 02BC.
The example word "aren't" in previous message used U+2019. Sorry if I
was unclear.
On Fri, Jan 25, 2019 at 9:41 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:
> To quote TUS:
>
> "A few may modify the following letter, and some may serve as
> independent letters".
>
> Bear in mind that one of the uses of U+02BC is the scholarly
> representation of a glottal st
On Fri, 25 Jan 2019 17:02:25 -0500
James Tauber via Unicode wrote:
> I guess U+02BC is category Lm not Mn, but doesn't that still mean it
> modifies the previous character (i.e. is really part of the same
> grapheme cluster) and so isn't appropriate as either a vowel or an
> indication of an omit
On 1/25/2019 10:05 AM, James Kass via
Unicode wrote:
For U+2019, there's a note saying 'this is the preferred character
to use for apostrophe'.
Mark Davis wrote,
> When it is between letters it doesn't cause a wor
On 1/25/2019 9:39 AM, James Tauber via
Unicode wrote:
Thank you, although the word break does still
affect things like double-clicking to select.
And people do seem to want to use U+02BC for this reason
(and I'm try
I guess U+02BC is category Lm not Mn, but doesn't that still mean it
modifies the previous character (i.e. is really part of the same grapheme
cluster) and so isn't appropriate as either a vowel or an indication of an
omitted vowel?
On Fri, Jan 25, 2019 at 4:30 PM Richard Wordingham via Unicode
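The category question raised above can be checked with Python's standard unicodedata module. The sketch prints the general category and canonical combining class of the candidate characters; U+0301 is included only as an example of a true Mn character for contrast and is not taken from the thread. The results depend on the Unicode data bundled with the Python build, and the module does not report grapheme-cluster properties.

```python
import unicodedata

CANDIDATES = (
    0x0027,  # APOSTROPHE
    0x2019,  # RIGHT SINGLE QUOTATION MARK
    0x02BC,  # MODIFIER LETTER APOSTROPHE
    0x0301,  # COMBINING ACUTE ACCENT (illustrative Mn character)
)

for cp in CANDIDATES:
    ch = chr(cp)
    # category: Po, Pf, Lm and Mn respectively; combining class is 0 for
    # everything except the combining accent.
    print(f"U+{cp:04X}", unicodedata.category(ch),
          unicodedata.combining(ch), unicodedata.name(ch))
```

Lm is a letter category, not a mark category, so U+02BC carries combining class 0 like an ordinary letter, unlike the Mn accent.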
On Fri, 25 Jan 2019 12:39:47 -0500
James Tauber via Unicode wrote:
> Thank you, although the word break does still affect things like
> double-clicking to select.
>
> And people do seem to want to use U+02BC for this reason (and I'm
> trying to articulate why that isn't what U+02BC is meant for)
For U+2019, there's a note saying 'this is the preferred character to
use for apostrophe'.
Mark Davis wrote,
> When it is between letters it doesn't cause a word break, ...
Some applications don't seem to get that. For instance, the
spellchecker for Mozilla Thunderbird flags the string "a
Thank you, although the word break does still affect things like
double-clicking to select.
And people do seem to want to use U+02BC for this reason (and I'm trying to
articulate why that isn't what U+02BC is meant for).
James
On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ wrote:
> U+2019 is n
U+2019 is normally the character used, except where the ’ is considered a
letter. When it is between letters it doesn't cause a word break, but
because it is also a right single quote, at the end of words there is a
break. Thus in a phrase like «tryin’ to go» there is a word break after the
n, beca
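The behaviour described above can be approximated with a short Python sketch. The regular expression is a deliberate simplification, not the full UAX #29 rule set: it keeps an apostrophe (U+0027 or U+2019) inside a word only when there is a word character on both sides. The name default_words is hypothetical.

```python
import re

# Simplified stand-in for default word selection: an apostrophe stays in the
# word only when it sits between two word characters.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*")

def default_words(text):
    """Return the runs a double-click would plausibly select."""
    return WORD.findall(text)

print(default_words("can\u2019t"))           # ['can’t']
print(default_words("tryin\u2019 to go"))    # ['tryin', 'to', 'go']
greek = "\u03b3\u03ad\u03bd\u03bf\u03b9\u03c4\u2019 \u1f04\u03bd"  # γένοιτ’ ἄν
print(default_words(greek))                  # trailing ’ is left out
```

In this approximation the medial apostrophe survives, while the final one in «tryin’» and in the elided Greek word is dropped from the selection.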
There seems some debate amongst digital classicists in whether to use
U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking
elision. (e.g. δ’ for δέ preceding a word starting with a vowel).
It seems to me that U+2019 is the technically correct choice per the
Unicode Standard b
71 matches