Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread James Kass via Unicode



On 2019-01-25 10:06 PM, Asmus Freytag via Unicode wrote:

> James, by now it's unclear whether your ' is 2019 or 02BC.

The example word "aren't" in my previous message used U+2019.  Sorry if I
was unclear.


Re: Encoding italic

2019-01-25 Thread James Kass via Unicode



On 2019-01-26 12:18 AM, Asmus Freytag (c) responded:

> On 1/25/2019 3:49 PM, Andrew Cunningham wrote:
>> Assuming some mechanism for italics is added to Unicode, when
>> converting between the new plain text and HTML there is insufficient
>> information to correctly convert to HTML. Many elements may have
>> italic styling and there would be no meta information in Unicode to
>> indicate the appropriate HTML element.
>
> So, we would be creating an interoperability issue.

What happens now when we convert plain-text to HTML?


Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread James Tauber via Unicode
On Fri, Jan 25, 2019 at 9:41 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> To quote TUS:
>
> "A few may modify the following letter, and some may serve as
> independent letters".
>
> Bear in mind that one of the uses of U+02BC is the scholarly
> representation of a glottal stop, especially in Arabic names.
>

Okay, so this legitimises the use of U+02BC (with its better
word-breaking properties) for the apostrophe marking elision in Ancient
Greek even though U+2019 is stated as the preferred character _in
general_ for the apostrophe.

On balance, this would seem to suggest U+02BC can (and perhaps
should) be used for the specific purpose in Ancient Greek.

(Of course, the other character that comes up is U+1FBD, but there
the consensus seems strong that this is just plain wrong.)

Thank you all.

James


Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Richard Wordingham via Unicode
On Fri, 25 Jan 2019 17:02:25 -0500
James Tauber via Unicode  wrote:

> I guess U+02BC is category Lm not Mn, but doesn't that still mean it
> modifies the previous character (i.e. is really part of the same
> grapheme cluster) and so isn't appropriate as either a vowel or an
> indication of an omitted vowel?

To quote TUS:

"A few may modify the following letter, and some may serve as
independent letters".

Bear in mind that one of the uses of U+02BC is the scholarly
representation of a glottal stop, especially in Arabic names.

Richard.


Re: Encoding italic

2019-01-25 Thread Asmus Freytag (c) via Unicode

On 1/25/2019 3:49 PM, Andrew Cunningham wrote:
Assuming some mechanism for italics is added to Unicode,  when 
converting between the new plain text and HTML there is insufficient 
information to correctly convert to HTML. Many elements may have
italic styling and there would be no meta information in Unicode to
indicate the appropriate HTML element.




So, we would be creating an interoperability issue.

A./





On Friday, 25 January 2019, wjgo_10...@btinternet.com via Unicode wrote:


Asmus Freytag wrote;

Other schemes, like a VS per code point, also suffer from
being different in philosophy from "standard" rich text
approaches. Best would be as standard extension to all the
messaging systems (e.g. a common markdown language, supported
by UI).     A./


Yet that claim of what would be best would be stateful and
statefulness is the very thing that Unicode seeks to avoid.

Plain text is the basic system and a Variation Selector mechanism
after each character that is to become italicized is not stateful
and can be implemented using existing OpenType technology.

If an organization chooses to develop and use a rich text format
then that is a matter for that organization and any changing of
formatting of how italics are done when converting between plain
text and rich text is the responsibility of the organization that
introduces its rich text format.

Twitter was just an example that someone introduced along the way,
it was not the original request.

Also this is not only about messaging. Of primary importance is
the conservation of texts in plain text format, for example, where
a printed book has one word italicized in a sentence and the text
is being transcribed into a computer.

William Overington
Friday 25 January 2019



--
Andrew Cunningham
lang.supp...@gmail.com 

Re: Encoding italic

2019-01-25 Thread Andrew Cunningham via Unicode
Assuming some mechanism for italics is added to Unicode, when converting
between the new plain text and HTML there is insufficient information to
correctly convert to HTML. Many elements may have italic styling and there
would be no meta information in Unicode to indicate the appropriate HTML
element.
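
The conversion problem described above can be made concrete with a minimal sketch. Everything named here is an assumption for illustration: no italic variation sequence exists in Unicode, so U+FE0D (VS14) is used purely as a hypothetical stand-in marker, and `parse`, `italicize`, and `to_html` are made-up helper names. The point is only that the plain text records "italic" and nothing else, so the converter has to be told which HTML element to emit.

```python
import html
from itertools import groupby

# HYPOTHETICAL: Unicode defines no italic variation sequence.
# U+FE0D is borrowed here only to have some marker to work with.
ITALIC_VS = "\uFE0D"

def italicize(word):
    """Append the hypothetical marker after every character of word."""
    return "".join(ch + ITALIC_VS for ch in word)

def parse(text):
    """Return a list of (character, is_italic) pairs; stray markers are dropped."""
    out = []
    for ch in text:
        if ch != ITALIC_VS:
            out.append((ch, False))
        elif out:
            out[-1] = (out[-1][0], True)
    return out

def to_html(text, tag="i"):
    """Wrap each italic run in <tag>. The tag is the caller's guess: the
    plain text cannot say whether <i>, <em>, <cite>, or <var> was meant."""
    parts = []
    for italic, run in groupby(parse(text), key=lambda pair: pair[1]):
        chunk = html.escape("".join(ch for ch, _ in run))
        parts.append(f"<{tag}>{chunk}</{tag}>" if italic else chunk)
    return "".join(parts)

sample = "see " + italicize("Footfall")
print(to_html(sample))              # converter guesses <i>
print(to_html(sample, tag="cite"))  # equally defensible for a book title
```

Both calls consume identical input; the difference between the two outputs comes entirely from outside the text.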




On Friday, 25 January 2019, wjgo_10...@btinternet.com via Unicode <
unicode@unicode.org> wrote:

> Asmus Freytag wrote;
>
> Other schemes, like a VS per code point, also suffer from being different
>> in philosophy from "standard" rich text approaches. Best would be as
>> standard extension to all the messaging systems (e.g. a common markdown
>> language, supported by UI). A./
>>
>
> Yet that claim of what would be best would be stateful and statefulness is
> the very thing that Unicode seeks to avoid.
>
> Plain text is the basic system and a Variation Selector mechanism after
> each character that is to become italicized is not stateful and can be
> implemented using existing OpenType technology.
>
> If an organization chooses to develop and use a rich text format then that
> is a matter for that organization and any changing of formatting of how
> italics are done when converting between plain text and rich text is the
> responsibility of the organization that introduces its rich text format.
>
> Twitter was just an example that someone introduced along the way, it was
> not the original request.
>
> Also this is not only about messaging. Of primary importance is the
> conservation of texts in plain text format, for example, where a printed
> book has one word italicized in a sentence and the text is being
> transcribed into a computer.
>
> William Overington
> Friday 25 January 2019
>
>

-- 
Andrew Cunningham
lang.supp...@gmail.com


Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Asmus Freytag via Unicode

  
  
On 1/25/2019 10:05 AM, James Kass via Unicode wrote:

> For U+2019, there's a note saying 'this is the preferred character
> to use for apostrophe'.
>
> Mark Davis wrote,
>
>> When it is between letters it doesn't cause a word break, ...
>
> Some applications don't seem to get that.  For instance, the
> spellchecker for Mozilla Thunderbird flags the string "aren" for
> correction in the word "aren’t", which suggests that users trying
> to use preferred characters may face uphill battles.

James, by now it's unclear whether your ' is 2019 or 02BC.

Spellcheckers are truly dumb sometimes when "user perceived
words" don't match what the fussy prescriptionistas ordain.

And then you get parts of perfectly valid "words" rejected, and
can't even fix them with overrides, because the override doesn't
accept the whole expression.

A./

  



Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Asmus Freytag via Unicode

  
  
On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:

> Thank you, although the word break does still affect things like
> double-clicking to select.
>
> And people do seem to want to use U+02BC for this reason (and I'm
> trying to articulate why that isn't what U+02BC is meant for).

For normal editing operations, breaking selection for
"d'Artagnan" or "can't" into two is overly fussy.
No wonder people get frustrated.

A./

> James
>
> On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ wrote:
>
>> U+2019 is normally the character used, except where the ’ is
>> considered a letter. When it is between letters it doesn't cause a
>> word break, but because it is also a right single quote, at the end
>> of words there is a break. Thus in a phrase like «tryin’ to go»
>> there is a word break after the n, because one can't tell.
>>
>> So something like "δ’ αρχαια" (picking a phrase at random) would
>> have a word break after the delta.
>>
>> Word break:
>> δ’ αρχαια
>>
>> However, there is no line break between them (which is the more
>> important operation in normal usage). Probably not worth tailoring
>> the word break.
>>
>> Line break:
>> δ’ αρχαια
>>
>> Mark
>>
>> On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode wrote:
>>
>>> There seems some debate amongst digital classicists in whether to
>>> use U+2019 or U+02BC to represent the apostrophe in Ancient Greek
>>> when marking elision. (e.g. δ’ for δέ preceding a word starting
>>> with a vowel).
>>>
>>> It seems to me that U+2019 is the technically correct choice per
>>> the Unicode Standard but it is not without at least one problem:
>>> default word breaking rules.
>>>
>>> I'm trying to provide guidelines for digital classicists in this
>>> regard.
>>>
>>> Is it correct to say the following:
>>>
>>> 1) U+2019 is the correct character to use for the apostrophe in
>>> Ancient Greek when marking elision.
>>> 2) U+02BC is a misuse of a modifier for this purpose
>>> 3) However, use of U+2019 (unlike U+02BC) means the default Word
>>> Boundary Rules in UAX#29 will (incorrectly) exclude the apostrophe
>>> from the word token
>>> 4) And use of U+02BC (unlike U+2019) means Grapheme Cluster
>>> Boundary Rules in UAX#29 will (incorrectly) include the apostrophe
>>> as part of a grapheme cluster with the previous letter
>>> 5) The correct solution is to tailor the Word Boundary Rules in the
>>> case of Ancient Greek to treat U+2019 as not breaking a word (which

Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread James Tauber via Unicode
I guess U+02BC is category Lm not Mn, but doesn't that still mean it
modifies the previous character (i.e. is really part of the same grapheme
cluster) and so isn't appropriate as either a vowel or an indication of an
omitted vowel?
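
The Lm versus Mn distinction raised above can be inspected directly with Python's standard `unicodedata` module. This is only a sketch: `profile` is a made-up helper name, and U+0313 COMBINING COMMA ABOVE is included purely as a contrasting example of a genuine combining mark; it is not part of the thread's question. General category and combining class do not by themselves settle grapheme cluster membership (that depends on the Grapheme_Cluster_Break property in UAX #29, which the standard library does not expose), but they show that U+02BC is a base letter rather than a nonspacing mark.

```python
import unicodedata

def profile(ch):
    """Return (code point, name, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

# U+02BC and U+2019 are the characters under discussion; U+0313 is an
# illustrative contrast only (a true combining mark).
for ch in ("\u02BC", "\u2019", "\u0313"):
    print(profile(ch))
```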



On Fri, Jan 25, 2019 at 4:30 PM Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

> On Fri, 25 Jan 2019 12:39:47 -0500
> James Tauber via Unicode  wrote:
>
> > Thank you, although the word break does still affect things like
> > double-clicking to select.
> >
> > And people do seem to want to use U+02BC for this reason (and I'm
> > trying to articulate why that isn't what U+02BC is meant for).
>
> It's a bit tricky when the reason is that it was too hard to get users
> of English to make a distinction between U+02BC and U+2019.  And for
> Larry Niven's elephant-like aliens in _Footfall_, is _fi'_, the
> singular of _fithp_, better written with U+02BC or U+2019?  And does
> the phonetically faithful spelling of Estuarine English _fi'_ for
> _fit_ depend on whether the glottal stop is dropped?
>
> The science-fiction ethnonym _Vl'harg_ is also tricky.  Does its elegant
> encoding depend on whether the apostrophe is a vowel symbol (so
> U+02BC) or the indication of an omitted vowel (so U+2019)?
>
> Richard.
>


-- 
*James Tauber*
Greek Linguistics: https://jktauber.com/
Music Theory: https://modelling-music.com/
Digital Tolkien: https://digitaltolkien.com/

Twitter: @jtauber


Re: Encoding italic

2019-01-25 Thread Asmus Freytag (c) via Unicode

On 1/25/2019 1:06 AM, wjgo_10...@btinternet.com wrote:

> Asmus Freytag wrote;
>
>> Other schemes, like a VS per code point, also suffer from being
>> different in philosophy from "standard" rich text approaches. Best
>> would be as standard extension to all the messaging systems (e.g. a
>> common markdown language, supported by UI). A./
>
> Yet that claim of what would be best would be stateful and
> statefulness is the very thing that Unicode seeks to avoid.

All rich text is stateful, and rich text is very widely used and
cut and paste tends to work rather well among applications that
support it, as do conversions of entire documents. Trying to duplicate
it with "yet another mechanism" is a doubtful achievement, even if it
could be made "stateless".


A./



Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Richard Wordingham via Unicode
On Fri, 25 Jan 2019 12:39:47 -0500
James Tauber via Unicode  wrote:

> Thank you, although the word break does still affect things like
> double-clicking to select.
> 
> And people do seem to want to use U+02BC for this reason (and I'm
> trying to articulate why that isn't what U+02BC is meant for).

It's a bit tricky when the reason is that it was too hard to get users
of English to make a distinction between U+02BC and U+2019.  And for
Larry Niven's elephant-like aliens in _Footfall_, is _fi'_, the
singular of _fithp_, better written with U+02BC or U+2019?  And does
the phonetically faithful spelling of Estuarine English _fi'_ for
_fit_ depend on whether the glottal stop is dropped?

The science-fiction ethnonym _Vl'harg_ is also tricky.  Does its elegant
encoding depend on whether the apostrophe is a vowel symbol (so
U+02BC) or the indication of an omitted vowel (so U+2019)?

Richard.


Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread James Kass via Unicode



For U+2019, there's a note saying 'this is the preferred character to 
use for apostrophe'.


Mark Davis wrote,

> When it is between letters it doesn't cause a word break, ...

Some applications don't seem to get that.  For instance, the 
spellchecker for Mozilla Thunderbird flags the string "aren" for 
correction in the word "aren’t", which suggests that users trying to use 
preferred characters may face uphill battles.




Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread James Tauber via Unicode
Thank you, although the word break does still affect things like
double-clicking to select.

And people do seem to want to use U+02BC for this reason (and I'm trying to
articulate why that isn't what U+02BC is meant for).

James

On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️  wrote:

> U+2019 is normally the character used, except where the ’ is considered a
> letter. When it is between letters it doesn't cause a word break, but
> because it is also a right single quote, at the end of words there is a
> break. Thus in a phrase like «tryin’ to go» there is a word break after the
> n, because one can't tell.
>
> So something like "δ’ αρχαια" (picking a phrase at random) would have a
> word break after the delta.
>
> Word break:
> δ’ αρχαια
>
> However, there is no *line break* between them (which is the more
> important operation in normal usage). Probably not worth tailoring the word
> break.
>
> Line break:
> δ’ αρχαια
>
> Mark
>
>
> On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <
> unicode@unicode.org> wrote:
>
>> There seems some debate amongst digital classicists in whether to use
>> U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking
>> elision. (e.g. δ’ for δέ preceding a word starting with a vowel).
>>
>> It seems to me that U+2019 is the technically correct choice per the
>> Unicode Standard but it is not without at least one problem: default word
>> breaking rules.
>>
>> I'm trying to provide guidelines for digital classicists in this regard.
>>
>> Is it correct to say the following:
>>
>> 1) U+2019 is the correct character to use for the apostrophe in Ancient
>> Greek when marking elision.
>> 2) U+02BC is a misuse of a modifier for this purpose
>> 3) However, use of U+2019 (unlike U+02BC) means the default Word Boundary
>> Rules in UAX#29 will (incorrectly) exclude the apostrophe from the word
>> token
>> 4) And use of U+02BC (unlike U+2019) means Grapheme Cluster Boundary
>> Rules in UAX#29 will (incorrectly) include the apostrophe as part of a
>> grapheme cluster with the previous letter
>> 5) The correct solution is to tailor the Word Boundary Rules in the case
>> of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't
>> have the same ambiguity problems with the single quotation mark as in
>> English as it should not be used as a quotation mark in Ancient Greek)
>>
>> Many thanks in advance.
>>
>> James
>>
>

-- 
*James Tauber*
Greek Linguistics: https://jktauber.com/
Music Theory: https://modelling-music.com/
Digital Tolkien: https://digitaltolkien.com/

Twitter: @jtauber


Re: Ancient Greek apostrophe marking elision

2019-01-25 Thread Mark Davis ☕️ via Unicode
U+2019 is normally the character used, except where the ’ is considered a
letter. When it is between letters it doesn't cause a word break, but
because it is also a right single quote, at the end of words there is a
break. Thus in a phrase like «tryin’ to go» there is a word break after the
n, because one can't tell.

So something like "δ’ αρχαια" (picking a phrase at random) would have a
word break after the delta.

Word break:
δ’ αρχαια

However, there is no *line break* between them (which is the more important
operation in normal usage). Probably not worth tailoring the word break.

Line break:
δ’ αρχαια

Mark
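
The difference in word-level behaviour between the two characters can be approximated with Python's standard `re` module. This is a rough sketch and not the UAX #29 algorithm: `naive_words` is a made-up helper that simply collects runs of `\w`, which is cruder than the default word boundary rules (for instance, it also splits at U+2019 between letters, which UAX #29 does not). It does reproduce the split after the delta described above, and it resembles the spellchecker behaviour reported elsewhere in this thread.

```python
import re

def naive_words(text):
    """Collect runs of word characters. A crude stand-in, NOT UAX #29."""
    return re.findall(r"\w+", text)

with_2019 = "δ\u2019 αρχαια"   # RIGHT SINGLE QUOTATION MARK (category Pf)
with_02bc = "δ\u02BC αρχαια"   # MODIFIER LETTER APOSTROPHE (category Lm)

print(naive_words(with_2019))      # apostrophe falls outside the first token
print(naive_words(with_02bc))      # apostrophe stays inside the first token
print(naive_words("aren\u2019t"))  # naive splitting at U+2019 between letters
```

U+02BC survives because its general category is Lm, which Python's `\w` treats as a word character; U+2019 is punctuation and does not match.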


On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <
unicode@unicode.org> wrote:

> There seems some debate amongst digital classicists in whether to use
> U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking
> elision. (e.g. δ’ for δέ preceding a word starting with a vowel).
>
> It seems to me that U+2019 is the technically correct choice per the
> Unicode Standard but it is not without at least one problem: default word
> breaking rules.
>
> I'm trying to provide guidelines for digital classicists in this regard.
>
> Is it correct to say the following:
>
> 1) U+2019 is the correct character to use for the apostrophe in Ancient
> Greek when marking elision.
> 2) U+02BC is a misuse of a modifier for this purpose
> 3) However, use of U+2019 (unlike U+02BC) means the default Word Boundary
> Rules in UAX#29 will (incorrectly) exclude the apostrophe from the word
> token
> 4) And use of U+02BC (unlike U+2019) means Grapheme Cluster Boundary
> Rules in UAX#29 will (incorrectly) include the apostrophe as part of a
> grapheme cluster with the previous letter
> 5) The correct solution is to tailor the Word Boundary Rules in the case
> of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't
> have the same ambiguity problems with the single quotation mark as in
> English as it should not be used as a quotation mark in Ancient Greek)
>
> Many thanks in advance.
>
> James
>


Re: Encoding italic

2019-01-25 Thread wjgo_10...@btinternet.com via Unicode

Asmus Freytag wrote;

> Other schemes, like a VS per code point, also suffer from being
> different in philosophy from "standard" rich text approaches. Best
> would be as standard extension to all the messaging systems (e.g. a
> common markdown language, supported by UI). A./


Yet that claim of what would be best would be stateful and statefulness 
is the very thing that Unicode seeks to avoid.


Plain text is the basic system and a Variation Selector mechanism after 
each character that is to become italicized is not stateful and can be 
implemented using existing OpenType technology.


If an organization chooses to develop and use a rich text format then 
that is a matter for that organization and any changing of formatting of 
how italics are done when converting between plain text and rich text is 
the responsibility of the organization that introduces its rich text 
format.


Twitter was just an example that someone introduced along the way, it 
was not the original request.


Also this is not only about messaging. Of primary importance is the 
conservation of texts in plain text format, for example, where a printed 
book has one word italicized in a sentence and the text is being 
transcribed into a computer.


William Overington
Friday 25 January 2019
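
The stateful versus stateless contrast in this message can be illustrated with a small sketch. The marker is an assumption: Unicode defines no italic variation sequence, so U+FE0D (VS14) is used here only as a hypothetical placeholder, and `mark` and `italic_chars` are made-up helper names. The sketch takes a fragment that starts in the middle of an italicized word and compares what survives under per-character marking and under Markdown-style delimiters.

```python
# HYPOTHETICAL marker: no such italic variation sequence is defined.
ITALIC_VS = "\uFE0D"

def mark(word):
    """Per-character marking: every character carries its own marker."""
    return "".join(ch + ITALIC_VS for ch in word)

def italic_chars(text):
    """Return the characters that are directly followed by the marker."""
    return [a for a, b in zip(text, text[1:]) if b == ITALIC_VS]

stateless = "one " + mark("word") + " here"
stateful = "one *word* here"   # Markdown-style paired delimiters

# Cut a fragment that begins inside the italicized word.
frag_stateless = stateless[stateless.index("r"):]
frag_stateful = stateful[stateful.index("r"):]

print(italic_chars(frag_stateless))  # italic status travels with each character
print(repr(frag_stateful))           # opening delimiter is lost; one "*" remains
```

The sketch shows only the property being claimed, not whether it is worth the cost; the replies in this thread argue that existing rich text handles copying well in practice.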



Ancient Greek apostrophe marking elision

2019-01-25 Thread James Tauber via Unicode
There seems some debate amongst digital classicists in whether to use
U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking
elision. (e.g. δ’ for δέ preceding a word starting with a vowel).

It seems to me that U+2019 is the technically correct choice per the
Unicode Standard but it is not without at least one problem: default word
breaking rules.

I'm trying to provide guidelines for digital classicists in this regard.

Is it correct to say the following:

1) U+2019 is the correct character to use for the apostrophe in Ancient
Greek when marking elision.
2) U+02BC is a misuse of a modifier for this purpose
3) However, use of U+2019 (unlike U+02BC) means the default Word Boundary
Rules in UAX#29 will (incorrectly) exclude the apostrophe from the word
token
4) And use of U+02BC (unlike U+2019) means Grapheme Cluster Boundary Rules
in UAX#29 will (incorrectly) include the apostrophe as part of a grapheme
cluster with the previous letter
5) The correct solution is to tailor the Word Boundary Rules in the case of
Ancient Greek to treat U+2019 as not breaking a word (which shouldn't have
the same ambiguity problems with the single quotation mark as in English as
it should not be used as a quotation mark in Ancient Greek)

Many thanks in advance.

James
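
Point 5 above proposes tailoring word segmentation so that U+2019 does not end a word in Ancient Greek. A minimal sketch of that idea follows, using a regular expression instead of a real UAX #29 implementation; `TOKEN` and `tailored_words` are made-up names, and the pattern is an assumption about what such a tailoring might look like, not a rule taken from the standard. It relies on the premise stated in point 5 that U+2019 is not used as a closing quotation mark in the text being processed.

```python
import re

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK used as apostrophe

# A word is a run of word characters, optionally continued across an
# internal apostrophe, optionally ending in one trailing apostrophe.
TOKEN = re.compile(rf"\w+(?:{APOSTROPHE}\w+)*{APOSTROPHE}?")

def tailored_words(text):
    """Tokenize, keeping U+2019 attached to the word it follows."""
    return TOKEN.findall(text)

print(tailored_words("δ\u2019 αρχαια"))
print(tailored_words("d\u2019Artagnan"))
```

With this pattern the elided form is selected as a single unit, which is the double-click behaviour asked for later in the thread. In text where U+2019 also closes quotations, the same pattern would wrongly attach the closing quote to the last word.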


Re: Encoding italic

2019-01-25 Thread David Starner via Unicode
On Thu, Jan 24, 2019 at 11:16 PM Tex via Unicode  wrote:
> Twitter was offered as an example, not the only example just one of the most 
> ubiquitous. Many messaging apps and other apps would benefit from italics. 
> The argument is not based on adding italics to twitter.

And again, color me skeptical. If italics are just added to Unicode
and not to the relevant app or interface, they will not see much use,
in the same way that most non-ASCII characters for proper English--the
quotes, the dashes, the accents--are often ignored because they're too
hard to enter. But if you're going to add italics, having it in
Unicode doesn't make it significantly easier, particularly when they
need to support systems that predate Unicode adding italics.

> The biggest burden would be to the apps that would benefit, to add 
> italicizing and editing capabilities.

If they would benefit or if they'd accept the burden, they'd have
already added italics, via HTML or Markdown or escape sequences or
whatever.

-- 
Kie ekzistas vivo, ekzistas espero.