On 01/11/2018 01:21, Asmus Freytag via Unicode wrote:
On 10/31/2018 3:37 PM, Marcel Schneider via Unicode wrote:
On 31/10/2018 19:42, Asmus Freytag via Unicode wrote:
[…]
It is a fallacy that all text output on a computer should match the convention
of "fine typography".
Much that is written on computers represents an (unedited) first draft. Giving such texts an appearance that, in the days of hot metal typography, was reserved for texts that were fully edited and in many cases intended for posterity is doing a disservice to the reader.
The disconnect is in many people believing the user should be <del>disabled to
write</del>
[prevented from writing]
Thank you for correcting.
his or her language without disfiguring it through lack of decent keyboarding, and that such input should be considered standard for user input. Making such text usable for publishing needs extra work that many users today cannot afford, while the mass of publishing has increased exponentially over the past decades. The result is garbage, following the rule of “garbage in, garbage out.”
No argument that there are some things that users cannot key in easily and that
the common
fallbacks from the days of typewritten drafts are not really appropriate in
many texts that
otherwise fall short of being "fine typography".
The goal I wanted to reach by discussing and invalidating the biased and misused concept of “fine typography” was to rid this thread of it, but I have clearly failed.
It’s hard for you to understand that relegating abbreviation indicators to the realm of “fine typography” reminds me of what I was told (undisclosed for privacy) when asking that the French standard keyboard layouts (plural) support punctuation spacing with NARROW NO-BREAK SPACE; and that is closely related to the issue about social media that you pointed out below.
Don’t worry about users not being able to “key in easily” what is needed for the digital representation of their language, as long as:
1. Unicode has encoded what is needed;
2. Unicode does not prohibit the use of the needed characters.
The rest is up to keyboard layout designers. Keying in anything else has not been an issue so far.
The real disservice to the reader is failing to enable the inputting user to write his or her language correctly. A draft whose backbone is a string usable as-is for publishing is not a disservice but a service to the reader, paying the reader due respect.
Such a draft is also a service to the user, enabling him or her to streamline
the
workflow. Such streamlining brings monetary and reputational benefit to the
user.
I see a huge disconnect between "writing correctly" and "usable as-is for
publishing". These
two things are not at all the same.
Publishing involves making many choices that simply aren't necessary for more "rough
& ready"
types of texts. Not every twitter or e-mail message needs to be "usable as-is for
publishing", but
should allow "correctly written" text as far as possible.
Not every message, especially not those whose readers expect a quick response.
The reverse is true with new messages (tweets, thread launchers, requests, invitations).
As already discussed, there are several levels of correctness. We’re talking
only about
the accurate digital representation of human languages, which includes correct
punctuation.
E.g. in languages using the letter apostrophe, hashtags made of a word containing an apostrophe are broken when the ASCII or punctuation apostrophe (close quote) is used, as we’ve been told.
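The hashtag breakage can be reproduced with a small sketch (the regex `#\w+` stands in for how social-media software typically delimits hashtags; the French word is just an illustration). U+02BC MODIFIER LETTER APOSTROPHE is a letter (category Lm), so word matching keeps the hashtag whole; U+2019, the punctuation apostrophe, cuts it short:

```python
import re

# Hashtags are typically matched as '#' followed by word characters.
hashtag = re.compile(r"#\w+")

with_letter_apostrophe = "#aujourd\u02bchui"  # U+02BC MODIFIER LETTER APOSTROPHE
with_close_quote = "#aujourd\u2019hui"        # U+2019 RIGHT SINGLE QUOTATION MARK

print(hashtag.findall(with_letter_apostrophe))  # ['#aujourdʼhui'], intact
print(hashtag.findall(with_close_quote))        # ['#aujourd'], truncated
```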
Presumably part of this discussion would be streamlined if one could experience how easy it can be to type one’s language’s accurate digital representation. But it’s better to be told what is going on, and what “strawmen” we’re being confused with, since, again, informed discussion brings advancement.
When "desktop publishing" as it was called then, became available, too many
people started to
obsess with form over content. You would get these beautifully laid out
documents, the contents
of which barely warranted calling them a first draft.
Typing one’s language’s accurate digital representation is not obsessing over form at the expense of content, provided that appropriate keyboarding is available. E.g. the punctuation apostrophe is on level 1, where the ASCII apostrophe sits when digits are locked on level 1, on the French keyboard I have in use; otherwise, digits are on level 3, where superscript e is also located for ready input of most ordinals (except 1ᵉʳ/1ʳᵉ, 2ⁿᵈ for ranges, and plurals with ˢ): 2ᵉ 3ᵉ 4ᵉ 5ᵉ 6ᵉ 7ᵉ 8ᵉ 9ᵉ 10ᵉ 11ᵉ 12ᵉ. Hopefully that demo makes clear what is intended.
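The ordinal forms above can be built from modifier letters already encoded in Unicode. The following is a minimal sketch (the function name and argument names are my own; the repertoire a given keyboard layout offers may differ):

```python
# Modifier letters used as French superscript ordinal indicators.
SUP_E = "\u1d49"  # MODIFIER LETTER SMALL E
SUP_R = "\u02b3"  # MODIFIER LETTER SMALL R
SUP_S = "\u02e2"  # MODIFIER LETTER SMALL S

def french_ordinal(n: int, feminine: bool = False, plural: bool = False) -> str:
    """Return n with its French superscript ordinal indicator attached."""
    if n == 1:
        # 1er (premier) / 1re (premiere)
        suffix = SUP_R + SUP_E if feminine else SUP_E + SUP_R
    else:
        suffix = SUP_E  # 2e, 3e, ...
    if plural:
        suffix += SUP_S
    return str(n) + suffix

print(french_ordinal(1))                 # 1ᵉʳ
print(french_ordinal(1, feminine=True))  # 1ʳᵉ
print(" ".join(french_ordinal(n) for n in range(2, 13)))
```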
Users not needing accurate representation in a given string are free to type otherwise.
The goal of this discussion is that Unicode allow accurate representation, not
impose it.
Actually Unicode is still imposing inaccurate representation on some languages, due to TUS prohibiting the use of precomposed superscript letters in text representing human languages with standard orthography, which is what “ordinary text” seems to boil down to.
That disconnect seems to originate from the time when the computer became a tool empowering the user to write in all of the world’s languages thanks to Unicode.
No, this has nothing to do with Unicode / multi-script support.
Why not? Accurate, interoperable digital representation of French was totally impossible before version 3.0 of Unicode (which brought the *new* NARROW NO-BREAK SPACE); until then, the Standard was prevented from having such a character by the misdefined line-break property of U+2008 PUNCTUATION SPACE, which has the right width and serves no purpose only because, unlike the related U+2007 FIGURE SPACE (but not U+2012 FIGURE DASH, mistakenly added to the list in my previous e-mail), it is not non-breakable. Useful punctuation spacing was dismissed as too “fine” a typography to be universally available and interoperable, while the opposite is true: it is the only way of writing French without the risk of conveying an impression of poor craftsmanship (see below).
The concept of “fine typography” was then used to draw a borderline between what
the user is supposed to input, and what he or she needs to get for publication.
This same dividing line applies in English (or any of the other individual
languages).
Yes, of course. The four lines above were only intended to set the scene. AFAICS, the disconnect of an encoding standard designed for accuracy and interoperability, whose use and usefulness are intentionally throttled down so as to yield non-accurate and non-interoperable digital representations of some languages, is unprecedented, and it originates from the time the Unicode Standard was set up. Spacing has been fixed, ordinal indicators are being fixed, and now other abbreviation indicators still need fixing.
In the same move, that concept was extended so as to include the quality of the string, in addition to what _fine typography_ really is: fine-tuning of the page layout, such as vertical justification, slight variations in the width of non-breakable spaces, and of course discretionary ligatures.
Certain elements of styling are also part of fine typography. In some cases, readying a "string" for publication also means applying spelling conventions or grammatical conventions (for those cases where there are ambiguities in the common language), or applying preferred word choices or ways of formulating things that may be particular to individual publishers or types of publications.
None of these is a reason not to be able to input abbreviation indicators in
plain text.
But for the rest, I cannot see that applying style guides’ orthographies is part of fine typography; it is just part of publishing. These parameters are at the discretion of the management. That does not preclude the input of superscript on a keyboard, and as a side note, publishers mainly take in at least rich text or some other markup convention, most commonly TeX (for scientific publications). But Unicode promises accurate, interoperable representation of all of the world’s languages in plain text. Hence authors are advised that a good way to make TeX more human-readable is to use more Unicode.
Using HYPHEN-MINUS instead of "EN DASH" or "HYPHEN" is perfectly OK for early
stages of
drafting a text. Attempting to follow those and similar conventions during that
phase forces
the author to pay attention to the wrong thing - his or her focus should be on
the ideas and
the content, not the form of the document.
There is a good point in that. But a close look at just these two conventions significantly lessens the advantage of not using accurate punctuation in one’s drafts.
1. HYPHEN-MINUS vs EN DASH or, it should be added, EM DASH: that is not workable in locales using no spacing around EM DASH. True, SPACE, HYPHEN-MINUS, SPACE is easily replaced with SPACE, EN DASH, SPACE or any other dashing convention at a later stage. But there is hardly any benefit in avoiding the correct dash among U+2013, U+2014 and U+2015 when all three sit on level 2 of three digit keys (1, 2, 3 or another range). Additionally, having them there brings the advantage of being able to differentiate while thinking about the content. Nobody else can do that job later with comparable efficiency.
2. HYPHEN-MINUS vs HYPHEN: that is much of a non-starter. As already discussed in detail on this List, HYPHEN is a useless duplicate encoding of HYPHEN-MINUS, which in almost all fonts has the glyph of HYPHEN, and HYPHEN is used for the system hyphen from automated hyphenation when a .docx is exported as a .pdf file. Using fonts designed otherwise requires either a special keyboard layout or weird replacements, because the HYPHEN-MINUS in URLs and e-mail addresses must not be replaced. So using HYPHEN-MINUS everywhere a HYPHEN is intended is OK even in publishing. Only some fonts may need fixing (I don’t know of more than a single one).
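The later-stage replacement mentioned under point 1 can be sketched as a one-line regex substitution (the draft sentence is illustrative; in locales with unspaced EM DASH no such mechanical replacement is possible, which is the point made above):

```python
import re

draft = "Le texte - un premier jet - sera corrige plus tard."

# Replace the typewriter fallback SPACE, HYPHEN-MINUS, SPACE with a
# spaced EN DASH (U+2013) at a later stage of the workflow.
final = re.sub(r" - ", " \u2013 ", draft)
print(final)
```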
Producing a plain text string usable for publishing was then put out of reach of most common mortals, by using the lever of deficient keyboarding, but also supposedly by an “encoding error” (scare quotes) in the line break property of U+2008 PUNCTUATION SPACE, which should be non-breakable like its sibling U+2007 FIGURE SPACE (still, as per UAX #14, recommended for use in numbers) <del>and U+2012 FIGURE DASH</del> [corrected, see above], in order to gain the narrow non-breaking space needed to space the triads in numbers using space as a group separator, and to space big punctuation in a Latin script using locale where JTC1/SC2/WG2 held some meetings for the UCS: French.
Those details should be handled in a post-processing phase for documents that
are intended
for publication.
Not at all, as already stated above. Making a mess of any text file that is not print-ready is an insult to the reader. And any *French* text not spacing punctuation with NNBSP is at risk of ending up as a mess.
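What an automated keyboard driver or post-processor could do for French punctuation spacing can be sketched as follows (a naive sketch under my own assumptions: the function name is illustrative, only ; : ! ? are handled, and URLs, times like 12:30, and guillemets are not special-cased):

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def space_french_punctuation(text: str) -> str:
    """Insert NNBSP before the French 'big' punctuation marks ; : ! ?,
    replacing any plain or no-break space the typist may have used."""
    return re.sub(r"[ \u00a0\u202f]*([;:!?])", NNBSP + r"\1", text)

print(repr(space_french_punctuation("Quelle question !")))
print(repr(space_french_punctuation("Oui: bien sur.")))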
One of the big problems in current architectures is that things like "autocorrect"
which attempt to overcome the limitations of the current keyboards,
That is another disconnect, already pointed out repeatedly. Current keyboards have no intrinsic “limitations”, and referring to outdated keyboard layouts as if they were fate is out of touch with reality, since all OS vendors offer facilities to complete, enhance or change the keyboard layout.
are applied at input time only; and authors need to constantly interact with these helpers to make sure they don't misfire.
Correct; that is also where what was called “the apostrophe catastrophe” originated.
Much text that is laboriously prepared this way, will not survive future
revisions during
the editing process needed to get the *content* to publication quality.
That only applies to files fed into an editing process. Many people publish directly out-of-the-keyboard, and that is where complete and readily available Unicode support matters most. Anything else can be made up for by the rendering engine, as you already noted. The strength of Unicode being interoperability and data exchange, I can see no technical reason not to type Unicode on one’s keyboard, including abbreviation indicators of any kind.
All because users have no convenient tool to "touch-up" these dashes, quotes,
and spaces
in a later phase; at the same time they apply copy-editing, for example.
Because once you are in a WYSIWYG environment, you cannot simply transfer the text to your text editor to apply regexes, and people need to write macros in VBA to get things done, I figure. Autocorrect is consistent with WYSIWYG. People not interested in seeing what they’re typing may prefer LaTeX, where they can see it in another window.
What I cannot see is why these important issues should preclude users from typing preformatted superscripts on their keyboard, for instance via a ‹superscript› dead key. Such a dead key is already standardized, but again, Karl Pentzlin’s proposal to encode the missing characters has been rejected, while in this thread we could see there is interest in what could be called a UnicodeChem notation, a nearly-plain-text encoding of chemical elements, compounds and processes.
For everybody who has at his or her fingertips a keyboard whose layout driver is programmed in a fully usable way, the disconnect collapses. At the encoding and input levels (the only ones that are really on-topic in this thread), the sorcery called fine typography then amounts to nothing more than having the keyboard insert fully diacriticized letters, the right punctuation, accurate space characters, and superscript letters as ordinal indicators and abbreviation endings, depending on the requirements.
In the days of typewritten manuscripts you had to follow certain conventions
that allowed the
typesetter to select the intended symbols and styled letters. I'm not arguing
that we should
return to where such fallbacks are used. And certainly not arguing that we
should be using
ASCII fallbacks for letters with diacritics, such as "oe" for "ö".
But many issues around selecting the precise type of space or dash are not so
much issues
of correct content but precisely issues of typography.
That is right insofar as the French national printing office recommends using NBSP with the colon, while the industry widely uses NNBSP for the colon too, as Philippe Verdy reported on this List. It also states that the same should be done for angle quotation marks, but does not do so itself. Here is indeed matter for fine-tuning, but as stated above and below, NBSP does not work in every environment, not even in most of the common ones where users are typing text. I still call a string publication-ready when the big punctuation marks are spaced with NNBSP uniformly.
Some occupy an intermediate level, where it would be quite appropriate to
apply them to
many automatically generated texts. (I am aware of your efforts in CLDR to that
effect).
Thank you for the occasion to invite everyone to join in and contribute to the upcoming surveys of Unicode’s Common Locale Data Repository. Much needs to be done in French and in many locales already present, even if the stress should naturally be on adding *new* locales still absent from CLDR.
But I still believe that they have no place in content focused writing.
That is only the effect of an error of perception, widely fueled by deficient keyboard designs not supporting automated punctuation spacing for French. See the ticket in Trac.
Now was I talking about “all text output on a computer”? No, I wasn’t.
The computer is able to accept input of publishing-ready strings, since we have Unicode. Precluding the user from using the needed characters by setting up caveats and prohibitions in the Unicode Standard seems to me nothing other than an outdated operating mode. U+202F NARROW NO-BREAK SPACE, encoded in 1999 for Mongolian [1][2], was readily taken up by the French graphic industry. In 2014, TUS started mentioning its use in French [3]; in 2018, it put that use on top [4].
That seems to me a striking example of how things encoded for other purposes are reused (or, following a certain usage, “abused”, “hacked”, “hijacked”) in locales like French. If it weren’t an insult to minority languages, that language too could be called “digitally disfavored” in a certain sense.
On the other hand, I'm a firm believer in applying certain styling attributes
to things like e-mail or discussion papers. Well-placed emphasis can make such
texts more readable (without requiring that they pay attention to all other
facets of "fine typography".)
The parenthesized sidenote (which is probably the intended main content…) makes this paragraph wrong. I’d buy it if either the parenthesis were removed or it came after the following.
Now you are copy-editing my e-mails. :)
:)
I don't read or write French on the level that I can evaluate your contention
that the language
is digitally disadvantaged.
It was heavily disadvantaged until U+202F NARROW NO-BREAK SPACE was encoded and widely implemented. Implementation would have been speedy and straightforward if only the character had been present from the beginning, as U+2008 PUNCTUATION SPACE. Even the character name would have matched the purpose. Perhaps the French people involved were hindered from fixing that bug while being aware of its gravity.
Then it was still disadvantaged by the lack of ordinal indicators, but that is now fixed thanks to the CLDR Technical Committee, last summer. Many thanks.
Ultimately it is among the languages using superscript as the abbreviation indicator, yet not allowed by Unicode to use even the already encoded superscript letters. That was not fixed in CLDR for v34 because the browsers used to display the data, notably in the Survey Tool implemented as a web interface, are still not using decent fonts with Unicode-conformant glyphs for all superscript letters and even digits, as seen in some webmail interfaces. The resulting ransom-note effect made it impossible to responsibly back the use of those letters as abbreviation indicators in natural languages, because unlike phonetics, which uses these letters in isolation, natural languages may have abbreviation endings encompassing more than the final letter.
For the abbreviation of Magister like on the Polish postcard, that is not a
problem.
To some extent, software will always reflect the biases of its creators, and in
some subtle ways
these will end up in conflict with conventions in other languages. In some
cases, conventions
applied by human typesetters cannot easily be duplicated by software that
cannot recognize
the meaning of the text,
Very good point. That is exactly the reason why the author should be enabled to
take full control
over his or her text, and that is best and most universally done by correctly
programming the
layout driver of the keyboard used.
and in some cases we have seen languages abandoning these
conventions in recent reforms in favor of a set of rules that are a bit more
"mechanistic"
if you will.
In German, it used to be necessary to understand the word division to know
whether or not
to apply a ligature. Some of the rules for combining words into compounds were
changed
and that may have made that process more regular as well.
That is a fine step forward for good typography.
But still, forcing all users to become typesetters was one of the wrong turns
taken during the
early development of publishing on computers.
I don’t think so at all. Users were not “forced” to do anything. If the autocorrect facilities compensating for deficient keyboarding were not welcome, they could easily be turned off. And professional typesetters always remained active, moving to the computer in the process.
I have myself experienced being able, thanks to Microsoft’s word processor, to do professional-looking typesetting. (As I was responsible for the content anyway, it didn’t make a difference.) But first I had to add some entries to Word’s autocorrect to tweak the keyboard.
You seem to revel in knowing all the little
details in French usage,
Not at all. That knowledge is a sheer necessity, and fortunately it is so narrow that you don’t need to know very much to digitally typeset French. But you need to know the relevant points. The fact that NARROW NO-BREAK SPACE is narrow doesn’t make it a little detail, but it misleads people into classifying it under “fine typography”, all the more in French, where (as found in TUS, in French in the text) it’s called an “espace fine insécable”.
but I bet not even all educated French people reach your level.
Precisely on this point, perhaps not; but that point is relevant mainly to those programming and documenting keyboard layouts. After that, punctuation spacing is automated on level 2 (just press Shift) and easily turned off by several means. I hope that will be welcome, as almost everyone in France is very careful to always space the big punctuation marks by the means available so far.
And to always superscript the ordinal indicators and other abbreviation
indicators,
at least while handwriting.
The best keyboard drivers won't help.
Why do you see that they won’t help?
So the idea that every string is supposed to be
"publication-ready" remains a fallacy. However, there shouldn't be encoding
obstacles
to creating publication-ready strings. (Whether created by copy-editors,
typesetters, or
advanced tools that post-process draft texts).
What I’d mainly like to see is that Unicode (supposing that you are writing on behalf of the Consortium) not impose a division of the workflow. Everybody should be able to apply the most appropriate process to any task, no matter how many parts it consists of. If a subset of end-users wish to input strings that won’t need to be modified in detail for publishing (except headings), Unicode is here to empower them to do so. Can that be taken for granted?
If a Twitter message uses spaces around punctuation that are not the right width, who cares;
As pointed out in the paragraph of my previous e-mail just below, the main issue around punctuation spacing in French in non-justifying layout is not the width of the space characters but their line-breaking behavior. Believe it or not, U+00A0 NO-BREAK SPACE is breakable in those environments, which therefore make a mess of spaced punctuation unless the space used is U+202F NARROW NO-BREAK SPACE. Or U+2007 FIGURE SPACE, but if we have to use an extra space character, we may as well pick the right one, given that FIGURE SPACE is not fit for publishing, while NNBSP is.
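The four space characters under discussion can be listed with their formal names from the Unicode Character Database (note that Python’s `unicodedata` module exposes names and categories but not the UAX #14 line-break class, which is the property actually at issue here):

```python
import unicodedata

# The space characters discussed: NBSP, FIGURE SPACE, PUNCTUATION SPACE, NNBSP.
for ch in ("\u00a0", "\u2007", "\u2008", "\u202f"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```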
but if your copy-editor can't prepare a manuscript for publication because of
software
limitations, that's a different can of worms.
My copy-editor is me. I wrote in my previous (perhaps too long, but I couldn’t help it) e-mail: “Making such text usable for publishing needs extra work, that today many users cannot afford”, and: “Such a draft is also a service to the user, enabling him or her to streamline the workflow. Such streamlining brings monetary and reputational benefit to the user.” The working scheme used with TeX or regexes is not interoperable, and the drafts are not all-purpose. A publishing-ready draft is in my opinion a plain text string that can be copy-pasted as-is — or typed directly — into a blog post composer form while being sure that all punctuation and punctuation spacing is fully operational. I don’t currently do this, but many people do, and they are doing word processing where the same applies, given that autocorrect doesn’t use the up-to-date space and can hardly guess in every case what the user intends to type, as you pointed out.
A./
With due respect, I need to add that the disconnect in that is visible only to French readers. Without NNBSP, punctuation à la française in e-mails is messed up, because even NBSP is ignored (I don’t know what exactly happens at the backend; anyway, at the frontend it behaves like a normal space in at least one e-mail client and in several if not all browsers, and if pasted in plain text from MS Word, it is truly replaced with SP). All that makes e-mails harder to read. Correct spacing with punctuation in French is often considered “fine-tuning”, but only when that punctuation spacing is not supported by the keyboard driver, and that’s still almost always the case, except on the updated version 1.1 of the bépo layout (and some personal prototypes not yet released).
Best regards,
Marcel