second attempt (was: A sign/abbreviation for "magister")

2018-10-30 Thread Janusz S. Bień via Unicode


My previous attempt to send this mail was rejected by the list as
spam. If this one does not appear on the list, would you be so kind as
to forward it to the list and the listmaster?

On Mon, Oct 29 2018 at 12:20 -0700, Doug Ewell via Unicode wrote:

[...]

> The abbreviation in the postcard, rendered in
> plain text, is "Mr".

The relevant fragment of the postcard in a loose translation is

Use the following address:   ...
 is the abbreviation of magister.

I don't think your rendering

   Mr is the abbreviation of magister.

has the same meaning.

Please note that I didn't ask *whether* to encode the abbreviation. I
asked *how* to do it.

If you think it is impossible to encode it in Unicode (without using
PUA), just say this explicitly.

BTW, I find it strange that nobody refers to an old thread

https://www.unicode.org/mail-arch/unicode-ml/y2016-m12/0117.html

Best regards

Janusz

-- 
 ,   
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien


Re: A sign/abbreviation for "magister"

2018-10-30 Thread Hans Åberg via Unicode


> On 30 Oct 2018, at 22:50, Ken Whistler via Unicode wrote:
> 
> On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
>> but we can't seem to agree on how to encode its abbreviation. 
> 
> For what it's worth, "mgr" seems to be the usual abbreviation in Polish for 
> it.

That seems to be the contemporary usage, but the postcard is from 1917, cf. the 
OP. Also, the transcription in the followup post suggests that the Polish 
script at the time, or at least of the author, differed from the commonly 
taught D'Nealian cursive [1], cf. the "z". A variation of the latter has ended 
up as the Unicode MATHEMATICAL SCRIPT letters, which is closer to the Swedish 
cursive [2] for some letters.

1. https://en.wikipedia.org/wiki/D'Nealian
2. https://sv.wikipedia.org/wiki/Skrivstil





Re: A sign/abbreviation for "magister"

2018-10-30 Thread Khaled Hosny via Unicode
On Tue, Oct 30, 2018 at 10:02:43PM +0100, Marcel Schneider wrote:
> On 30/10/2018  at 21:34, Khaled Hosny via Unicode wrote:
> > 
> > On Tue, Oct 30, 2018 at 04:52:47PM +0100, Marcel Schneider via Unicode 
> > wrote:
> > > E.g. in Arabic script, superscript is considered worth 
> > > encoding and using without any caveat, whereas when Latin script is on, 
> > > superscripts are thrown into the same cauldron as underscoring.
> > 
> > Curious, what Arabic superscripts are encoded in Unicode?
>  
> First, ARABIC LETTER SUPERSCRIPT ALEF U+0670.
> But it is a vowel sign. Many letters put above are called superscript 
> when explaining in English.

As you say, this is a vowel sign not a superscript letter, so the name
is a misnomer at best. It should have been called COMBINING ARABIC
LETTER ALEF ABOVE, similar to COMBINING LATIN SMALL LETTER A. In Arabic
it is called small or dagger alef.

> There is the range U+FC5E..U+FC63 (presentation forms).

That is a backward compatibility block no one is supposed to use; there
are many such backward compatibility presentation forms, even for Latin
script (U+FB00..U+FB4F).

So I don’t see what makes you think, based on this, that Unicode is
favouring Arabic or other scripts over Latin.

Regards,
Khaled
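
The names and properties of the code points discussed in this exchange
can be checked against the Unicode Character Database. What follows is
a minimal Python sketch, assuming only the standard unicodedata module.
The helper name describe is made up for illustration, and the output
depends on the Unicode version bundled with the interpreter. U+0670 and
U+0671 are both listed so that the name can be matched to its code
point.

```python
import unicodedata


def describe(cp: int) -> str:
    """Summarise one code point: name, general category, combining class, decomposition."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)
    ccc = unicodedata.combining(ch)
    decomposition = unicodedata.decomposition(ch) or "-"
    return f"U+{cp:04X} {name} [{category}, ccc={ccc}] {decomposition}"


# The Arabic mark under discussion, its neighbour, and the Latin mark used for comparison.
for cp in (0x0670, 0x0671, 0x0363):
    print(describe(cp))

# The Arabic presentation forms cited as U+FC5E..U+FC63.
for cp in range(0xFC5E, 0xFC64):
    print(describe(cp))

# One Latin ligature from the U+FB00..U+FB4F block, for comparison.
print(describe(0xFB01))
```

A non-empty decomposition field tagged with a keyword such as
<isolated> or <compat> marks a compatibility character, which is the
property the reply above relies on.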


Re: A sign/abbreviation for "magister"

2018-10-30 Thread Ken Whistler via Unicode



On 10/30/2018 2:32 PM, James Kass via Unicode wrote:
> but we can't seem to agree on how to encode its abbreviation.


For what it's worth, "mgr" seems to be the usual abbreviation in Polish 
for it.


--Ken



[getting OT] Re: A sign/abbreviation for "magister"

2018-10-30 Thread Doug Ewell via Unicode
Marcel Schneider replied to Khaled Hosny:

>>> E.g. in Arabic script, superscript is considered worth encoding and
>>> using without any caveat, [...]
>>
>> Curious, what Arabic superscripts are encoded in Unicode?
>
> [...] There is the range U+FC5E..U+FC63 (presentation forms).

Arabic presentation forms are never an example of anything, and their
use is full of caveats.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org




Re: A sign/abbreviation for "magister"

2018-10-30 Thread Marcel Schneider via Unicode
On 30/10/2018  at 21:34, Khaled Hosny via Unicode wrote:
> 
> On Tue, Oct 30, 2018 at 04:52:47PM +0100, Marcel Schneider via Unicode wrote:
> > E.g. in Arabic script, superscript is considered worth 
> > encoding and using without any caveat, whereas when Latin script is on, 
> > superscripts are thrown into the same cauldron as underscoring.
> 
> Curious, what Arabic superscripts are encoded in Unicode?
 
First, ARABIC LETTER SUPERSCRIPT ALEF U+0670.
But it is a vowel sign. Many letters put above are called superscript 
when explaining in English.
 
There is the range U+FC5E..U+FC63 (presentation forms).
 
Best regards,
 
Marcel

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Doug Ewell via Unicode
Julian Bradfield wrote:
 
>> in the 17ᵗʰ or 18ᵗʰ century to keep it only for ordinals. Should
>> Unicode
>
> What do you mean, for ordinals? If you mean 1st, 2nd etc., then there
> is not now (when superscripting looks very old-fashioned) and never
> has been any requirement to superscript them, as far as I know -
> though since the OED doesn't have an entry for "1st", I can't easily
> check.
 
The English Wikipedia article "Ordinal number (linguistics)" does not
show numbers such as 1st, 2nd, etc. with superscripts, though as a
rich-text Web page, it could easily.
 
The article "English numerals" does include a bullet point: "The
suffixes -th, -st, -nd and -rd are occasionally written superscript
above the number itself." Note the word "occasionally."
 
--
Doug Ewell | Thornton, CO, US | ewellic.org




Re: A sign/abbreviation for "magister"

2018-10-30 Thread Julian Bradfield via Unicode
On 2018-10-30, Marcel Schneider via Unicode  wrote:
> Dr Bradfield just added on 30/10/2018 at 14:21 something that I didn’t 
> know when replying to Dr Ewell on 29/10/2018 at 21:27:

>> The English abbreviation Mr was also frequently superscripted in the
>> 15th-17th centuries, and that didn't mean anything special either - it
>> was just part of a general convention of superscripting the final
>> segment of abbreviations, probably inherited from manuscript practice.
>
> So English dropped the superscript requirement for common abbreviations 

Who said anything about requirement? I didn't.
The practice of using superscripts to end abbreviations is alive and
well in manuscript - I do it myself in writing notes for myself. For
example, "condition" I will often write as "condⁿ", and
"equation" as "eqⁿ".

> in the 17ᵗʰ or 18ᵗʰ century to keep it only for ordinals. Should Unicode 

What do you mean, for ordinals? If you mean 1st, 2nd etc., then there
is not now (when superscripting looks very old-fashioned) and never
has been any requirement to superscript them, as far as I know -
though since the OED doesn't have an entry for "1st", I can't easily
check.


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: A sign/abbreviation for "magister"

2018-10-30 Thread Khaled Hosny via Unicode
On Tue, Oct 30, 2018 at 04:52:47PM +0100, Marcel Schneider via Unicode wrote:
>   E.g. in Arabic script, superscript is considered worth 
> encoding and using without any caveat, whereas when Latin script is on, 
> superscripts are thrown into the same cauldron as underscoring.

Curious, what Arabic superscripts are encoded in Unicode?

Regards,
Khaled


Logical Order (was: A sign/abbreviation for "magister")

2018-10-30 Thread Richard Wordingham via Unicode
On Tue, 30 Oct 2018 02:47:25 +0100
Philippe Verdy via Unicode  wrote:

> We are here at the line between what is pure visual encoding (e.g.
> using superscript letters), and logical encoding (as done everywhere
> else in unicode with combining sequences; the most well known
> exceptions being for Thai script which uses the visual model).

For your information, Thai uses the logical encoding, almost by
definition.  The logical order is the order used in the backing store
(see Section 2.2, Unicode Design Principles).  In the Thai
‘combining sequences’ you have in mind, the vowel symbols you have in
mind are classified as letters, so we do not have combining sequences!
There were ill-defined preposed logically following combining marks (in
the charts, but not the tables) in Unicode 1.0, but the problems with
implementing them in the Thai monosyllable เพลา were so great that I
wonder if anyone succeeded at the time - with invisible PHINTHU, as
opposed to with visible PHINTHU!
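
The classification described above can be inspected directly. The
following is a minimal Python sketch, assuming only the standard
unicodedata module. The helper name classify is hypothetical, and the
escape sequence is one plausible spelling of the word quoted above,
written out so that the stored order is visible.

```python
import unicodedata

# The word as stored: SARA E, PHO PHAN, LO LING, SARA AA.
# The preposed vowel comes first in the backing store, as it does on the page.
WORD = "\u0e40\u0e1e\u0e25\u0e32"
PHINTHU = "\u0e3a"


def classify(text: str) -> list[tuple[str, str, str, int]]:
    """Return (code point, name, general category, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch),
         unicodedata.combining(ch))
        for ch in text
    ]


for row in classify(WORD):
    print(row)

# PHINTHU, by contrast, is a genuine combining mark.
print(classify(PHINTHU)[0])
```

Every character of the word reports general category Lo with combining
class 0, so the string contains no combining sequence, whereas PHINTHU
reports Mn with a non-zero combining class.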

The official disinformation source, http://www.unicode.org/glossary,
misdefines logical order to be ‘the order in which text is typed on a
keyboard’.  So much for suggestions that one should design keyboard
interfaces to convert visual order to storage order!

A striking example is New Tai Lue, whose standard ordering was changed
from phonetic order to visual order because it was found that the
logical order, even using the Unicode *character* encoding, was visual
order rather than phonetic order.

Richard.



Re: A sign/abbreviation for "magister"

2018-10-30 Thread Richard Wordingham via Unicode
On Tue, 30 Oct 2018 11:43:14 +
James Kass via Unicode  wrote:

> Now what if we were future historians given the task of encoding both
> of those strings, from two different sources, and had no idea what
> those two strings were supposed to represent?  Wouldn't it be best to
> preserve both strings intact, as they were originally written?

In general, it is not possible to encode text in Unicode if one has no
knowledge of what the text itself represents.  Some English typewriters
did not distinguish digit ‘0’ from capital letter ‘O’ or digit ‘1’ from
small letter ‘l’.

Richard.



Re: A sign/abbreviation for "magister"

2018-10-30 Thread Marcel Schneider via Unicode
Rather than a dozen individual e-mails, I’m sending this omnibus reply 
for the record, because even if here and in CLDR (SurveyTool forum and 
Trac) everything has already been discussed and fixed, there is still 
a need to acknowledge each point and not fail to follow up, with 
respect to the upcoming surveys, the next of which is to start in 30 days.

First here: On 29/10/2018 at 12:43, Dr Freytag via Unicode wrote:

[…]
> The use of superscript is tricky, because it can be optional in some
> contexts; if I write "3rd" in English, it will definitely be
> understood no different from "3rd". 

[Note that this second instance was actually intended to read "3ʳᵈ", 
but it was formatted using a higher-level protocol.]

[…]
> In TeX the two transition fluidly. If I was going to transcribe such
> texts in TeX, I would construct a macro […]
[…]
> Nevertheless, I think the use of devices like combining underlines
> and superscript letters in plain text are best avoided.

While most other scripts from Arabic to Duployan are generously granted 
all and everything they need for accurate representation, starting with 
preformatted superscripts and ending with superscripting or subscripting 
format controls, Latin script is often quite deliberately pulled down 
in order to make it unusable outside high-end DTP software, from 
TeX to Adobe InDesign, with the notable exception of sparsely and 
parsimoniously encoded preformatted characters for phoneticists and 
medievalists. E.g. in Arabic script, superscript is considered worth 
encoding and using without any caveat, whereas when Latin script is on, 
superscripts are thrown into the same cauldron as underscoring.

Obviously Unicode don’t apply to Latin script the same principle they 
do to all other scripts, i.e. to free preformatted letters as suitable 
if they are part of a standard representation and in some cases are 
needed to ensure unambiguity. Mediterranean locales had preformatted 
ordinal indicators even in the Latin-1-only era, even though "1a" and "2o" 
may be understood no differently from "1ª" and "2º". The degree sign, that 
is on French keyboards, is systematically hijacked to represent the 
"n°" abbreviation, unless a string is limited to ASCII-only. Several 
Latin-script-using locales have standard representations and strong 
user demands for superscripts, which instead of being satisfied on 
Unicode level as would be done for any other of the world’s scripts, 
are obstinately rebuffed when not intended for phonetics, or in 
some cases, for palaeography.

I wasn’t digging down to find out about those UTC members who on a 
regular basis are aggressively contradicting ballot comments about 
encoding palaeographic Latin letters, while proving unable to sustain 
any open and honest discussion on this List or elsewhere. Referring to 
what Dr Everson via Unicode wrote on 28/10/2018 at 21:49:

> I like palaeographic renderings of text very much indeed, and in fact
> remain in conflict with members of the UTC (who still, alas, do NOT
> communicate directly about such matters, but only in duelling ballot
> comments) about some actually salient representations required for
> medievalist use.


That said: On 29/10/2018 at 09:09, James Kass via Unicode wrote:
[…]
> If I were entering plain text data from an old post card, I'd try
> to keep the data as close to the source as possible. Because that
> would be my purpose. Others might have different purposes. 
> As you state, it depends on the intention. But, if there were an
> existing plain text convention I'd be inclined to use it. 
> Conventions allow for the possibility of interchange, direct
> encoding would ensure it.

The goal of discouraging Latin superscripts is obviously to ensure 
that reliable document interchange is limited to the PDF. 

If Unicode were allowed to emit an official recommendation to use 
preformatted superscripts in Latin script, too, then font designers 
would implement comprehensive support of combining diacritics, and 
any plain text including superscripted abbreviations could use the 
preformatted characters, in order to gather the interoperability 
that Unicode was designed for. Referring to what Dr Verdy via Unicode 
wrote on 28/10/2018 at 19:01:

[…]
> However it is still not very elegant if we still need to use only
> the limited set of superscript letters (this still reduces the
> number of abbreviations, such as those commonly used in French
> that needs a superscript "é")

The use of combining diacritics with preformatted superscripts is 
also the reason why Unicode is limiting encoding support to base 
letters, even for preformatted superscript letters. The rule that 
no *new* precomposed letters with acute accent are encoded anymore 
applies to superscripts too. A Unicode-conformant way to represent 
such abbreviations would IMO use U+1D49 followed by U+0301: ,ᵉ́,.
Other representations may require OpenType support, which in Latin 
script is often turned off, supposedly in order to 

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Julian Bradfield via Unicode
On 2018-10-30, James Kass via Unicode  wrote:
> (Still responding to Ken Whistler's post)

> Do you know the difference between H₂SO₄ and H2SO4?  One of them is a 
> chemical formula, the other one is a license plate number. T̲h̲a̲t̲ is 
> not a stylistic difference /in my book/.  (Emphasis added.)

Yes. In chemical notation, sub/superscripting is semantically
significant.
That's not the case for abbreviations: the choice of Mr or any of its
superscripted and decorated variations is not semantically
significant.
The English abbreviation Mr was also frequently superscripted in the
15th-17th centuries, and that didn't mean anything special either - it
was just part of a general convention of superscripting the final
segment of abbreviations, probably inherited from manuscript practice.

> But suppose both those strings were *intended* to represent the chemical 
> formula?  Then one of them would be optimally correct; the other one... meh.
>
> Now what if we were future historians given the task of encoding both of 
> those strings, from two different sources, and had no idea what those 
> two strings were supposed to represent?  Wouldn't it be best to preserve 
> both strings intact, as they were originally written?

Indeed - and that means an image, not any textual representation. The
typeface might be significant too.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: A sign/abbreviation for "magister"

2018-10-30 Thread James Kass via Unicode



(Still responding to Ken Whistler's post)

> The fact that I could also implement superscripting and subscripting on a
> mechanical typewriter via turning the platen up and down half a line, also
> does not make *those* aspects of text styling plain text. either.

Do you know the difference between H₂SO₄ and H2SO4?  One of them is a 
chemical formula, the other one is a license plate number. T̲h̲a̲t̲ is 
not a stylistic difference /in my book/.  (Emphasis added.)
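
How Unicode itself relates the two strings can be shown with a small
Python sketch, assuming only the standard unicodedata module. The
variable names are illustrative.

```python
import unicodedata

FORMULA = "H\u2082SO\u2084"  # with SUBSCRIPT TWO and SUBSCRIPT FOUR
PLATE = "H2SO4"              # ASCII digits only

# As stored, the two strings are different sequences of code points.
print(FORMULA == PLATE)

# Canonical normalization keeps the subscripts.
print(unicodedata.normalize("NFC", FORMULA) == FORMULA)

# Compatibility normalization folds them to ordinary digits.
print(unicodedata.normalize("NFKC", FORMULA) == PLATE)
```

The strings are distinct in plain text and stay distinct under NFC.
Only the compatibility forms (NFKC, NFKD) erase the difference, which
is why those forms are usually kept for searching and matching rather
than for storage.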


But suppose both those strings were *intended* to represent the chemical 
formula?  Then one of them would be optimally correct; the other one... meh.


Now what if we were future historians given the task of encoding both of 
those strings, from two different sources, and had no idea what those 
two strings were supposed to represent?  Wouldn't it be best to preserve 
both strings intact, as they were originally written?




Re: A sign/abbreviation for "magister"

2018-10-30 Thread James Kass via Unicode



Ken Whistler replied,

>> could be typed on old-style mechanical
>> typewriters.  Quintessential plain-text, that.
>
> Nope. Typewriters were regularly used for
> underscoring and for strikethrough, both of which
> are *styling* of text, and not plain text. The
> mere fact that some visual aspect of graphic
> representation on a page of paper can be
> implemented via a mechanical typewriter does not,
> ipso facto, mean that particular feature is plain
> text. The fact that I could also implement
> superscripting and subscripting on a mechanical
> typewriter via turning the platen up and down half
> a line, also does not make *those* aspects of text
> styling plain text. either.

Sorry if we disagree.

I've never used a typewriter for producing anything other than text.  
Just plain old unadorned text.  Plain text.  Colloquially speaking 
rather than speaking technically.  Text existed before the computer age.


A typewriter puts text on paper.  Pressing the "M" key while holding the 
"Shift" key puts "M" on the sheet.  Rolling the platen appropriately and 
striking "r" puts a superscript "r" on the sheet. Hitting the backspace 
key, rolling the platen a bit in the other direction and typing the 
"equals" key finishes this abbreviation in the text on the page.  Then 
the user rolls the platen to its earlier position and resumes typing.  
(It's way easier to do than to describe.)


If the typist didn't intend to put a superscript "r" on that page with a 
double underline, the typist wouldn't have bothered with all that jive.


It's about the importance one places on respecting authorial intent.

Anything reasonable done on a mechanical typewriter can be replicated in 
an electronic data display.  If necessary I'd use a kludge before I'd 
hold my breath waiting for direct encoding when the desired result is 
for the displayed text on the screen to match the handwritten text in 
the source as closely as possible.  (I've used lots of kludges while 
awaiting the real M=ͨCoy.)


Sure, underscoring was used for s̲t̲r̲e̲s̲s̲, but it wasn't used *as* a 
stylistic difference as much as it was used *in lieu* of the ability to 
make a stylistic difference, such as bolding or italicizing.  It's the 
"plain text" convention of that time, predating the asterisks or slashes 
used in the modern convention. Underscoring might be stripped without 
messing with the legibility, but so could tatweels and lots of other 
stuff.  If nothing should mung the asterisks and slashes used in the 
modern convention, then the earlier convention's underscoring is every 
bit as worthy of being preserved.  (If I'm not mistaken, there was also 
some kind of underscoring convention for titles which was used instead 
of placing titles in quotes.)
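
The underscoring convention and its removal can be sketched in a few
lines of Python. This is a minimal illustration that assumes the
underscore is represented by U+0332 COMBINING LOW LINE after each
character; the function names are hypothetical, and text that already
carries other combining marks is not handled specially.

```python
LOW_LINE = "\u0332"  # COMBINING LOW LINE


def underline(text: str) -> str:
    """Append a combining low line to every character of text."""
    return "".join(ch + LOW_LINE for ch in text)


def strip_underline(text: str) -> str:
    """Remove every combining low line, leaving the letters intact."""
    return text.replace(LOW_LINE, "")


marked = underline("stress")
print(marked)
print(strip_underline(marked))
```

Stripping gives back the original letters, which matches the remark
that the underscoring can be removed without harming legibility. What
is lost is only the emphasis.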


Strikethrough isn't stylistic if it's done to type a character which 
isn't present on one of the keys.  For example, letters with strokes 
used for minority languages, like "Ŧ".  I don't see strikethrough as 
"style" if the typist didn't want to waste White Out on a draft, either.


Perhaps I should have referred to typewritten text as seminal plain text 
rather than quintessential plain text, but quintessential scans better.


Speaking of text, computer age or otherwise, the O.E.D. definition of 
text as related to computers appears outdated and/or incomplete:

https://en.oxforddictionaries.com/definition/text
(definition 1.3)



Re: A sign/abbreviation for "magister"

2018-10-30 Thread Richard Wordingham via Unicode
On Mon, 29 Oct 2018 12:20:49 -0700
Doug Ewell via Unicode  wrote:

> Richard Wordingham wrote:

> > I think this is one of the few cases where Multicode may have
> > advantages over Unicode. In a mathematical context, aⁿ would be
> > interpreted as _a_ applied _n_ times. As to "fⁿ", ambiguity may be
> > avoided by the superscript being inappropriate for an exponent. What
> > is redundant in one context may be significant in another.  
>  
> Are you referring to the encoding described in the 1997 paper by
> Mudawwar, which "address[es] Unicode's principal drawbacks" by
> switching between language-specific character sets? Kind of like ISO
> 2022, but less extensible?

More precisely to the principle.  What is an irrelevant, optional
feature in one writing system may be significant in another.  I'm
currently trying to work out the rules for writing Pali in the Sinhala
script - I have to worry about the difference between touching letters
and conjuncts.  A simple ISCII-like encoding for Sinhala Pali would
delegate such matters to the font.

Richard.