On 01/11/2018 01:21, Asmus Freytag via Unicode wrote:
On 10/31/2018 3:37 PM, Marcel Schneider via Unicode wrote:
On 31/10/2018 19:42, Asmus Freytag via Unicode wrote:
[…]
It is a fallacy that all text output on a computer should match the convention
of "fine typography".
Much that is written on computers represents an (unedited) first draft. Giving such texts an appearance that, in the days of hot metal typography, was reserved for texts that were fully edited and in many cases intended for posterity is doing a disservice to the reader.
The disconnect is in many people believing the user should be <del>disabled to
write</del>
[prevented from writing]
Thank you for correcting.
his or her language without disfiguring it through lack of decent keyboarding, and that such input should be considered standard for user input. Making such text usable for publishing needs extra work that many users today cannot afford, while the mass of publishing has increased exponentially over the past decades. The result is garbage, following the rule of “garbage in, garbage out.”
No argument that there are some things that users cannot key in easily and that
the common
fallbacks from the days of typewritten drafts are not really appropriate in
many texts that
otherwise fall short of being "fine typography".
The goal I wanted to reach by discussing and invalidating the biased and misused concept of “fine typography” was to rid this thread of it, but I have clearly failed.
It’s hard for you to understand that relegating abbreviation indicators to the realm of “fine typography” reminds me of what I was told (undisclosed for privacy) when asking that the French standard keyboard layouts (plural) support punctuation spacing with NARROW NO-BREAK SPACE; and that is closely related to the issue about social media that you pointed out below.
Don’t worry about users not being able to “key in easily” what is needed for the digital representation of their language, as long as:
1. Unicode has encoded what is needed;
2. Unicode does not prohibit the use of the needed characters.
The rest is up to keyboard layout designers. Keying in anything else has not been an issue so far.
The real disservice to the reader is failing to enable the inputting user to write his or her language correctly. A draft whose backbone is a string usable as-is for publishing is not a disservice but a service to the reader, paying the reader due respect.
Such a draft is also a service to the user, enabling him or her to streamline
the
workflow. Such streamlining brings monetary and reputational benefit to the
user.
I see a huge disconnect between "writing correctly" and "usable as-is for
publishing". These
two things are not at all the same.
Publishing involves making many choices that simply aren't necessary for more "rough
& ready"
types of texts. Not every twitter or e-mail message needs to be "usable as-is for
publishing", but
should allow "correctly written" text as far as possible.
Not every message, especially not those whose readers expect a quick response.
The reverse is true with new messages (tweets, thread launchers, requests, invitations).
As already discussed, there are several levels of correctness. We’re talking
only about
the accurate digital representation of human languages, which includes correct
punctuation.
E.g. in languages using the letter apostrophe, hashtags made of a word containing an apostrophe are broken when the ASCII or punctuation apostrophe (close quote) is used, as we’ve been told.
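The hashtag breakage can be reproduced with a small sketch (the regex `#\w+` stands in for how social-media software typically delimits hashtags; the French word is just an illustration). U+02BC MODIFIER LETTER APOSTROPHE is a letter (category Lm), so word matching keeps the hashtag whole; U+2019, the punctuation apostrophe, cuts it short:

```python
import re

# Hashtags are typically matched as '#' followed by word characters.
hashtag = re.compile(r"#\w+")

with_letter_apostrophe = "#aujourd\u02bchui"  # U+02BC MODIFIER LETTER APOSTROPHE
with_close_quote = "#aujourd\u2019hui"        # U+2019 RIGHT SINGLE QUOTATION MARK

print(hashtag.findall(with_letter_apostrophe))  # ['#aujourdʼhui'], intact
print(hashtag.findall(with_close_quote))        # ['#aujourd'], truncated
```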
Presumably part of this discussion would be streamlined if one could experience how easy it can be to type one’s language’s accurate digital representation. But it’s better to be told what is going on, and what “strawmen” we’re being confused with, since, again, informed discussion brings advancement.
When "desktop publishing" as it was called then, became available, too many
people started to
obsess with form over content. You would get these beautifully laid out
documents, the contents
of which barely warranted calling them a first draft.
Typing one’s language’s accurate digital representation is not obsessing over form at the expense of content, provided that appropriate keyboarding is available. E.g. the punctuation apostrophe is on level 1, where the ASCII apostrophe sits when digits are locked on level 1, on the French keyboard I have in use; otherwise, digits are on level 3, where superscript e is also located for ready input of most ordinals (except 1ᵉʳ/1ʳᵉ, 2ⁿᵈ for ranges, and plurals with ˢ): 2ᵉ 3ᵉ 4ᵉ 5ᵉ 6ᵉ 7ᵉ 8ᵉ 9ᵉ 10ᵉ 11ᵉ 12ᵉ. Hopefully that demo makes clear what is intended.
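The ordinal forms above can be built from modifier letters already encoded in Unicode. The following is a minimal sketch (the function name and argument names are my own; the repertoire a given keyboard layout offers may differ):

```python
# Modifier letters used as French superscript ordinal indicators.
SUP_E = "\u1d49"  # MODIFIER LETTER SMALL E
SUP_R = "\u02b3"  # MODIFIER LETTER SMALL R
SUP_S = "\u02e2"  # MODIFIER LETTER SMALL S

def french_ordinal(n: int, feminine: bool = False, plural: bool = False) -> str:
    """Return n with its French superscript ordinal indicator attached."""
    if n == 1:
        # 1er (premier) / 1re (premiere)
        suffix = SUP_R + SUP_E if feminine else SUP_E + SUP_R
    else:
        suffix = SUP_E  # 2e, 3e, ...
    if plural:
        suffix += SUP_S
    return str(n) + suffix

print(french_ordinal(1))                 # 1ᵉʳ
print(french_ordinal(1, feminine=True))  # 1ʳᵉ
print(" ".join(french_ordinal(n) for n in range(2, 13)))
```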
Users not needing accurate representation in a given string are free to type otherwise.
The goal of this discussion is that Unicode allow accurate representation, not
impose it.
Actually Unicode is still imposing inaccurate representation on some languages, due to TUS prohibiting the use of precomposed superscript letters in text representing human languages with standard orthography, which is what “ordinary text” seems to boil down to.
That disconnect seems to originate from the time when the computer became a tool empowering the user to write in all of the world’s languages thanks to Unicode.
No, this has nothing to do with Unicode / multi-script support.
Why not? Accurate, interoperable digital representation of French was totally impossible before version 3.0 of Unicode (which brought the *new* NARROW NO-BREAK SPACE); until then, the Standard was prevented from having such a character by the misdefined line-break property of U+2008 PUNCTUATION SPACE, which has the right width and serves no purpose only because, unlike the related U+2007 FIGURE SPACE (but not U+2012 FIGURE DASH, mistakenly added to the list in my previous e-mail), it is not non-breakable. Useful punctuation spacing was dismissed as too “fine” a typography to be universally available and interoperable, while the opposite is true: it is the only way of writing French without the risk of conveying an impression of poor craftsmanship (see below).
The concept of “fine typography” was then used to draw a borderline between what
the user is supposed to input, and what he or she needs to get for publication.
This same dividing line applies in English (or any of the other individual
languages).
Yes, of course. The four lines above were only intended to set the scene. AFAICS, the disconnect of an encoding standard designed for accuracy and interoperability, whose use and usefulness are intentionally throttled down so as to yield non-accurate and non-interoperable digital representations of some languages, is unprecedented, and it originates from the time the Unicode Standard was set up. Spacing has been fixed, ordinal indicators are being fixed, and now other abbreviation indicators still need fixing.
In the same move, that concept was extended so as to include the quality of the string, in addition to what _fine typography_ really is: fine-tuning of the page layout, such as vertical justification, slight variations in the width of non-breakable spaces, and of course discretionary ligatures.
Certain elements of styling are also part of fine typography. In some cases, readying a "string" for publication also means applying spelling conventions or grammatical conventions (for those cases where there are ambiguities in the common language), or applying preferred word choices or ways of formulating things that may be particular to individual publishers or types of publications.
None of these is a reason not to be able to input abbreviation indicators in
plain text.
But for the rest, I cannot see that applying style guides’ orthographies is part of fine typography; it is just part of publishing. These parameters are at the discretion of the management. That does not preclude the input of superscript on a keyboard, and as a side note, publishers mainly take in at least rich text or some other markup convention, most commonly TeX (for scientific publications). But Unicode promises accurate, interoperable representation of all of the world’s languages in plain text. Hence authors are advised that a good way to make TeX more human-readable is to use more Unicode.
Using HYPHEN-MINUS instead of "EN DASH" or "HYPHEN" is perfectly OK for early
stages of
drafting a text. Attempting to follow those and similar conventions during that
phase forces
the author to pay attention to the wrong thing - his or her focus should be on
the ideas and
the content, not the form of the document.
There is a good point in that. But a close look at just these two conventions significantly lessens the advantage of not using accurate punctuation in one’s drafts.
1. HYPHEN-MINUS vs EN DASH or, it should be added, EM DASH: that is not workable in locales using no spacing around EM DASH. True, SPACE, HYPHEN-MINUS, SPACE is easily replaced with SPACE, EN DASH, SPACE or any other dashing convention at a later stage. But there is hardly any benefit in avoiding the correct dash among U+2013, U+2014 and U+2015 when all three sit on level 2 of three digit keys (1, 2, 3 or another range). Additionally, having them there brings the advantage of being able to differentiate while thinking about the content. Nobody else can do that job later with comparable efficiency.
2. HYPHEN-MINUS vs HYPHEN: that is much of a non-starter. As already discussed in detail on this List, HYPHEN is a useless duplicate encoding of HYPHEN-MINUS, which in almost all fonts has the glyph of HYPHEN, and HYPHEN is used for the system hyphen from automated hyphenation when a .docx is exported as a .pdf file. Using fonts designed otherwise requires either a special keyboard layout or weird replacements, because the HYPHEN-MINUS in URLs and e-mail addresses must not be replaced. So using HYPHEN-MINUS everywhere a HYPHEN is intended is OK even in publishing. Only some fonts may need fixing (I don’t know of more than a single one).
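The later-stage replacement mentioned under point 1 can be sketched as a one-line regex substitution (the draft sentence is illustrative; in locales with unspaced EM DASH no such mechanical replacement is possible, which is the point made above):

```python
import re

draft = "Le texte - un premier jet - sera corrige plus tard."

# Replace the typewriter fallback SPACE, HYPHEN-MINUS, SPACE with a
# spaced EN DASH (U+2013) at a later stage of the workflow.
final = re.sub(r" - ", " \u2013 ", draft)
print(final)
```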
Producing a plain text string usable for publishing was then put out of reach of most common mortals, by using the lever of deficient keyboarding, but also supposedly by an “encoding error” (scare quotes) in the line break property of U+2008 PUNCTUATION SPACE, which should be non-breakable like its sibling U+2007 FIGURE SPACE (still, as per UAX #14, recommended for use in numbers) <del>and U+2012 FIGURE DASH</del> [corrected, see above], in order to gain the narrow non-breaking space needed to space the triads in numbers using space as a group separator, and to space big punctuation in a Latin script using locale where JTC1/SC2/WG2 held some meetings for the UCS: French.
Those details should be handled in a post-processing phase for documents that
are intended
for publication.
Not at all, as already stated above. Making a mess of any text file that is not print-ready is an insult to the reader. And any *French* text not spacing punctuation with NNBSP is at risk of ending up as a mess.
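What an automated keyboard driver or post-processor could do for French punctuation spacing can be sketched as follows (a naive sketch under my own assumptions: the function name is illustrative, only ; : ! ? are handled, and URLs, times like 12:30, and guillemets are not special-cased):

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def space_french_punctuation(text: str) -> str:
    """Insert NNBSP before the French 'big' punctuation marks ; : ! ?,
    replacing any plain or no-break space the typist may have used."""
    return re.sub(r"[ \u00a0\u202f]*([;:!?])", NNBSP + r"\1", text)

print(repr(space_french_punctuation("Quelle question !")))
print(repr(space_french_punctuation("Oui: bien sur.")))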
One of the big problems in current architectures is that things like "autocorrect"
which attempt to overcome the limitations of the current keyboards,
That is another disconnect, already pointed out repeatedly. Current keyboards have no intrinsic “limitations”, and referring to outdated keyboard layouts as if they were fate is out of touch with reality, since all OS vendors offer facilities to complete, enhance or change the keyboard layout.
are applied at input time only; and authors need to constantly interact with these helpers to make sure they don't misfire.
Correct; that is also where what was called “the apostrophe catastrophe” originated.
Much text that is laboriously prepared this way, will not survive future
revisions during
the editing process needed to get the *content* to publication quality.
That only applies to files fed into an editing process. Many people publish directly out-of-the-keyboard, and that is where complete and readily available Unicode support matters most. Anything else can be made up for by the rendering engine, as you already noted. The strength of Unicode being interoperability and data exchange, I can see no technical reason not to type Unicode on one’s keyboard, including abbreviation indicators of any kind.
All because users have no convenient tool to "touch-up" these dashes, quotes,
and spaces
in a later phase; at the same time they apply copy-editing, for example.
Because once you are in a WYSIWYG environment, you cannot simply transfer the text to your text editor to apply regexes, and people need to write macros in VBA to get things done, I figure. Autocorrect is consistent with WYSIWYG. People not interested in seeing what they’re typing may prefer LaTeX, where they can see it in another window.
What I cannot see is why these important issues should preclude users from typing preformatted superscripts on their keyboard, for instance via a ‹superscript› dead key. Such a dead key is already standardized, but again, Karl Pentzlin’s proposal to encode the missing characters has been rejected, while in this thread we could see there is interest in what could be called a UnicodeChem notation, a nearly-plain-text encoding of chemical elements, compounds and processes.
For everybody who has at his or her fingertips a keyboard whose layout driver is programmed in a fully usable way, the disconnect collapses. At the encoding and input levels (the only ones that are really on-topic in this thread), the sorcery called fine typography then amounts to nothing more than having the keyboard insert fully diacriticized letters, the right punctuation, accurate space characters, and superscript letters as ordinal indicators and abbreviation endings, depending on the requirements.
In the days of typewritten manuscripts you had to follow certain conventions
that allowed the
typesetter to select the intended symbols and styled letters. I'm not arguing
that we should
return to where such fallbacks are used. And certainly not arguing that we
should be using
ASCII fallbacks for letters with diacritics, such as "oe" for "ö".
But many issues around selecting the precise type of space or dash are not so
much issues
of correct content but precisely issues of typography.
That is right insofar as the French national printing office recommends using NBSP with the colon, while the industry widely uses NNBSP for the colon too, as Philippe Verdy reported on this List. It also states that the same should be done for angle quotation marks, but does not do so itself. Here is indeed matter for fine-tuning, but as stated above and below, NBSP does not work in every environment, not even in most of the common ones where users are typing text. I still call a string publication-ready when the big punctuation marks are spaced with NNBSP uniformly.
Some occupy an intermediate level, where it would be quite appropriate to
apply them to
many automatically generated texts. (I am aware of your efforts in CLDR to that
effect).
Thank you for the occasion to invite everyone to join in and contribute to the upcoming surveys of Unicode’s Common Locale Data Repository. Much needs to be done in French and in many locales already present, even if the stress should naturally be on adding *new* locales still absent from CLDR.
But I still believe that they have no place in content focused writing.
That is only the effect of an error of perception, widely fueled by deficient keyboard designs not supporting automated punctuation spacing for French. See the ticket in Trac.
Now was I talking about “all text output on a computer”? No, I wasn’t.
The computer is able to accept input of publishing-ready strings, since we have Unicode. Precluding the user from using the needed characters by setting up caveats and prohibitions in the Unicode Standard seems to me nothing other than an outdated operating mode. U+202F NARROW NO-BREAK SPACE, encoded in 1999 for Mongolian [1][2], was readily taken up by the French graphic industry. In 2014, TUS started mentioning its use in French [3]; in 2018, it put that use on top [4].
That seems to me a striking example of how things encoded for other purposes are reused (or, following a certain usage, “abused”, “hacked”, “hijacked”) in locales like French. If it weren’t an insult to minority languages, that language too could be called “digitally disfavored” in a certain sense.
On the other hand, I'm a firm believer in applying certain styling attributes
to things like e-mail or discussion papers. Well-placed emphasis can make such
texts more readable (without requiring that they pay attention to all other
facets of "fine typography".)
The parenthesized sidenote (which is probably the intended main content…) makes this paragraph wrong. I’d buy it if either the parenthesis were removed or it came after the following.
Now you are copy-editing my e-mails. :)
:)
I don't read or write French on the level that I can evaluate your contention
that the language
is digitally disadvantaged.
It was heavily disadvantaged until U+202F NARROW NO-BREAK SPACE was encoded and widely implemented. Implementation would have been speedy and straightforward if only the character had been present from the beginning, as U+2008 PUNCTUATION SPACE. Even the character name would have matched the purpose. Perhaps the French people involved were hindered from fixing that bug while being aware of its gravity.
Then it was still disadvantaged by the lack of ordinal indicators, but that is now fixed thanks to the CLDR Technical Committee, last summer. Many thanks.
Ultimately it is among the languages using superscript as the abbreviation indicator, yet not allowed by Unicode to use even the already encoded superscript letters. That was not fixed in CLDR for v34 because the browsers used to display the data, notably in the Survey Tool implemented as a web interface, are still not using decent fonts with Unicode-conformant glyphs for all superscript letters and even digits, as seen in some webmail interfaces. The resulting ransom-note effect made it impossible to responsibly back the use of those letters as abbreviation indicators in natural languages, because unlike phonetics, which uses these letters in isolation, natural languages may have abbreviation endings encompassing more than the final letter.
For the abbreviation of Magister like on the Polish postcard, that is not a
problem.
To some extent, software will always reflect the biases of its creators, and in
some subtle ways
these will end up in conflict with conventions in other languages. In some
cases, conventions
applied by human typesetters cannot easily be duplicated by software that
cannot recognize
the meaning of the text,
Very good point. That is exactly the reason why the author should be enabled to
take full control
over his or her text, and that is best and most universally done by correctly
programming the
layout driver of the keyboard used.
and in some cases we have seen languages abandoning these
conventions in recent reforms in favor of a set of rules that are a bit more
"mechanistic"
if you will.
In German, it used to be necessary to understand the word division to know
whether or not
to apply a ligature. Some of the rules for combining words into compounds were
changed
and that may have made that process more regular as well.
That is a fine step forward for good typography.
But still, forcing all users to become typesetters was one of the wrong turns
taken during the
early development of publishing on computers.
I don’t think so at all. Users were not “forced” to do anything. If the autocorrect facilities compensating for deficient keyboarding were not welcome, they could easily be turned off. And professional typesetters always remained active, moving to the computer in the process.
I have myself experienced being able, thanks to Microsoft’s word processor, to do professional-looking typesetting. (As I was responsible for the content anyway, it didn’t make a difference.) But first I had to add some entries to Word’s autocorrect to tweak the keyboard.
You seem to revel in knowing all the little
details in French usage,
Not at all. That knowledge is a sheer necessity, and fortunately it is so narrow that you don’t need to know very much to digitally typeset French. But you need to know the relevant points. The fact that NARROW NO-BREAK SPACE is narrow doesn’t make it a little detail, but it misleads people into classifying it under “fine typography”, all the more in French, where (as found in TUS, in French in the text) it’s called an “espace fine insécable”.
but I bet not even all educated French people reach your level.
Precisely on this point, perhaps not; but that point is relevant mainly to those programming and documenting keyboard layouts. After that, punctuation spacing is automated on level 2 (just press Shift) and easily turned off by several means. I hope that will be welcome, as almost everyone in France is very careful to always space the big punctuation marks by the means available so far.
And to always superscript the ordinal indicators and other abbreviation
indicators,
at least while handwriting.
The best keyboard drivers won't help.
Why do you see that they won’t help?
So the idea that every string is supposed to be
"publication-ready" remains a fallacy. However, there shouldn't be encoding
obstacles
to creating publication-ready strings. (Whether created by copy-editors,
typesetters, or
advanced tools that post-process draft texts).
What I’d mainly like to see is that Unicode (supposing that you are writing on behalf of the Consortium) not impose a division of the workflow. Everybody should be able to apply the most appropriate process to any task, no matter how many parts it consists of. If a subset of end-users wish to input strings that won’t need to be modified in detail for publishing (except headings), Unicode is here to empower them to do so. Can that be taken for granted?
If a Twitter message uses spaces around punctuation that are not the right width, who cares;
As pointed out in the paragraph of my previous e-mail just below, the main issue around punctuation spacing in French in non-justifying layout is not the width of the space characters but their line-breaking behavior. Believe it or not, U+00A0 NO-BREAK SPACE is breakable in those environments, which therefore make a mess of spaced punctuation unless the space used is U+202F NARROW NO-BREAK SPACE. Or U+2007 FIGURE SPACE, but if we have to use an extra space character, we may as well pick the right one, given that FIGURE SPACE is not fit for publishing, while NNBSP is.
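The four space characters under discussion can be listed with their formal names from the Unicode Character Database (note that Python’s `unicodedata` module exposes names and categories but not the UAX #14 line-break class, which is the property actually at issue here):

```python
import unicodedata

# The space characters discussed: NBSP, FIGURE SPACE, PUNCTUATION SPACE, NNBSP.
for ch in ("\u00a0", "\u2007", "\u2008", "\u202f"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```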
but if your copy-editor can't prepare a manuscript for publication because of
software
limitations, that's a different can of worms.
My copy-editor is me. I wrote in my previous (perhaps too long, but I couldn’t help it) e-mail: “Making such text usable for publishing needs extra work, that today many users cannot afford”, and: “Such a draft is also a service to the user, enabling him or her to streamline the workflow. Such streamlining brings monetary and reputational benefit to the user.” The working scheme used with TeX or regexes is not interoperable, and the drafts are not all-purpose. A publishing-ready draft is in my opinion a plain text string that can be copy-pasted as-is — or typed directly — into a blog post composer form while being sure that all punctuation and punctuation spacing is fully operational. I don’t currently do this, but many people do, and they are doing word processing where the same applies, given that autocorrect doesn’t use the up-to-date space and can hardly guess in every case what the user intends to type, as you pointed out.
A./
With due respect, I need to add that the disconnect in that is visible only to French readers. Without NNBSP, punctuation à la française in e-mails is messed up, because even NBSP is ignored (I don’t know what exactly happens at the backend; anyway, at the frontend it behaves like a normal space in at least one e-mail client and in several if not all browsers, and if pasted in plain text from MS Word, it is truly replaced with SP). All that makes e-mails harder to read. Correct spacing with punctuation in French is often considered “fine-tuning”, but only when that punctuation spacing is not supported by the keyboard driver, and that’s still almost always the case, except on the updated version 1.1 of the bépo layout (and some personal prototypes not yet released).
Best regards,
Marcel