Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Julian Bradfield via Unicode
On 2020-03-21, Eli Zaretskii via Unicode  wrote:
>> Date: Sat, 21 Mar 2020 11:13:40 -0600
>> From: Doug Ewell via Unicode 
>> 
>> Adam Borowski wrote:
>> 
>> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
>> > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has
>> > its uses but is not well-formed Unicode.
>> 
>> I'd be interested in your elaboration on what these uses are.
>
> Emacs uses some of that for supporting charsets that cannot be mapped
> into Unicode.  GB18030 is one example of such charsets.  The internal
> representation of characters in Emacs is UTF-8, so it uses 5-byte
> UTF-8 like sequences to represent such characters.

My own (now >10 year old) Unicode adaptation of XEmacs does the same,
even for charsets that can be mapped into Unicode. To ensure complete
backward compatibility, it distinguishes "legacy" charsets from Unicode,
and only does conversion when requested.
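
To illustrate what "UTF-8 like sequences" beyond Unicode can look like, here is a minimal Python sketch of the original UTF-8 bit layout, which allowed up to six bytes and 31 bits. The function name encode_extended_utf8 is hypothetical, and this is not the code Emacs or XEmacs actually uses; it only shows that the same bit-packing scheme extends past U+10FFFF and to surrogate code points.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode an integer with the original UTF-8 bit layout (up to 6 bytes).

    Hypothetical helper for illustration only: unlike a conforming UTF-8
    encoder, it accepts surrogates and values above U+10FFFF.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper limit, lead-byte marker) for the 2- to 6-byte forms
    forms = [(0x800, 0xC0), (0x10000, 0xE0), (0x200000, 0xF0),
             (0x4000000, 0xF8), (0x80000000, 0xFC)]
    for n_tail, (limit, lead) in enumerate(forms, start=1):
        if cp < limit:
            break
    tail = []
    for _ in range(n_tail):
        tail.append(0x80 | (cp & 0x3F))  # low six bits per continuation byte
        cp >>= 6
    # Remaining high bits go into the lead byte; tail was built low-to-high.
    return bytes([lead | cp] + tail[::-1])
```

For values inside Unicode the result matches Python's own encoder: encode_extended_utf8(0x20AC) equals "€".encode("utf-8"). The first value past Unicode, 0x110000, comes out as four bytes, and 0x200000 needs five.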



Re: On the lack of a SQUARE TB glyph

2019-09-27 Thread Julian Bradfield via Unicode
On 2019-09-27, David Starner via Unicode  wrote:
> On Thu, Sep 26, 2019 at 8:57 PM Fred Brennan via Unicode
> wrote:
[snip]
>> There is no sequence of glyphs that could be logically mapped, unless you're
>> telling me to request that the sequence T  B be recommended for general
>> interchange as SQUARE TB? That's silly.
>
> Why is that silly? You've got an unbounded set of these; even the base
> prefixes EPTGMkhdmμnp (and da) crossed with bBmglWsAKNJCΩT (plus a
> bunch more), which is over 200 combinations without all the units, and
> there's some exponents encoded, so some of those will need to be
> encoded with exponents. And that's far from a complete list of what
> people might want as squares.

Wouldn't T  B 
be a better sequence? 
In fact, it would have been nice (especially for mathematicians) if
all combining marks could have been applied to character sequences, by
means of some "high precedence ZWJ" that binds more tightly than
combination.
(Playing devil's advocate here, since I don't think maths is plain
text:)

Or one could allow IDS to have leaf components that are any
characters, not just ideographic characters, and then one could have
all sorts of fun.


acute-macron hybrid?

2019-04-30 Thread Julian Bradfield via Unicode
The celebrated Bosworth-Toller dictionary of Anglo-Saxon uses a
curious diacritic to mark long vowels. It may be described as a long
shallow acute with a small down-tick at the right.
It contrasts with an acute (quite steep in this typeface) used to mark
accented short vowels.
Both can be seen in the fifth line of the scan at
http://lexicon.ff.cuni.cz/png/oe_bosworthtoller/b0002.png

What is its appropriate Unicode representation?
As a lumper, I would use a macron, but I wonder what a splitter would
say.


mildly OT from bidi - curious email

2019-02-06 Thread Julian Bradfield via Unicode
The current bidi discussion prompts me to post a curiosity I received
today.

I ordered something from a (UK) company, and the payment receipt came
via Stripe. So far, so common. The curious thing is that the (entirely
ASCII) company name was enclosed in a left-to-right direction, thus:

Subject: Your Aaa Ltd receipt [#-]

where  and  are the bidi control characters.

I don't think I've seen this before - I wonder why it happened?

Also today I got an otherwise ASCII message where every paragraph
started with BOM (or ZWNBSP as my font prefers to call it). I see from
the web that people used to do this - anybody know what the most
common software packages that do it are?
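
For anyone wanting to check their own mail for such invisible characters, a minimal Python sketch follows. It lists every character whose general category is Cf (format), which covers U+FEFF as well as the bidi controls. The function name find_format_chars and the sample string are made up for illustration.

```python
import unicodedata

def find_format_chars(text: str):
    """Return (index, code point, name) for every format (Cf) character."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Made-up paragraph that starts with U+FEFF, as in the message described.
sample = "\ufeffDear customer,"
print(find_format_chars(sample))
```

Run over a saved message body, this at least shows which control characters are present, though not which software inserted them.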



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Ancient Greek apostrophe marking elision

2019-01-27 Thread Julian Bradfield via Unicode
On 2019-01-27, Michael Everson via Unicode  wrote:
> On 27 Jan 2019, at 05:21, Richard Wordingham 
>  wrote:
>> The closing single inverted comma has a different origin to the apostrophe.
> No, it doesn’t, but you are welcome to try to prove your assertion. 

As far as I can tell from the easily accessible literature, the
apostrophe derives from an in-line manuscript mark that is a point
with a tail, while the quotation marks derive from a marginal mark
shaped like an arrowhead (like modern guillemets). What is your story
about them?

>> Is someone going to tell me there is an advantage in treating "men's” as one 
>> word but "dogs'" as two?  As I've said, the argument for encoding English 
>> apostrophes as U+2019 is that even with adequate keyboards, users cannot be 
>> relied upon to distinguish U+02BC and U+2019 - especially with no feedback. 
>> A writing system should choose one and stick with it.  User unreliability 
>> forces a compromise.
>
> Polynesian users need 02BC to be visually distinguished from 2019. 
> European users don’t need the apostrophe to be visually distinguished from 
> 2019. The edge case of “dogs’” doesn’t convince me. In all my years of 
> typesetting I have never once noticed this, much less considered it a problem 
> that needed fixing.

You have a very low opinion of Polynesian users. People (as opposed to
computers) use context to remove ambiguity. Before we had to interact
with pedantic computers, we were rarely confused by the typewriter-induced
confusion of 1 and l and 0 and O (or, indeed, the use of symmetrical
quotation marks).
Now a sensible orthographic choice for a language using comma-like
letters would be to use guillemets for quotation, and while I don't
know (there being precious few modern Polynesian materials online), I
would guess that the languages of French Polynesia do that.
If, like Hawaiian, you're stuck with English-style quotation marks for
historical reasons, an obvious typographic solution is to thin-space
them, French-style. (See previous thread!). That seems visually
preferable to relying on a small difference in size of what is already
a small letter compared to everything else on the page.





Re: Encoding italic

2019-01-21 Thread Julian Bradfield via Unicode
On 2019-01-21, James Kass via Unicode  wrote:
> Consider superscript/subscript digits as a similar styling issue. The 
> Wikipedia page for Romanization of Chinese includes information about 
> the Wade-Giles system’s tone marks, which are superscripted digits.
>
> https://en.wikipedia.org/wiki/Romanization_of_Chinese
>
> Copy/pasting an example from the page into plain-text results in “ma1, 
> ma2, ma3, ma4”, although the web page displays the letters as italic and 
> the digits as (italic) superscripts.  IMO, that’s simply wrong with 
> respect to the superscript digits and suboptimal with respect to the 
> italic letters.

Wade-Giles (which should be written with an en-dash, not a hyphen, if
we're going to be fussy - as indeed Wikipedia is) is obsolete, but one
could say the same about pinyin. However, printed pinyin with tones
almost invariably uses the combining diacritics; in email where most people
can't be bothered to write diacritics, tone numbers are written just
as you have written above, with a following ASCII digit. (With the
proviso that Chinese speakers don't usually write tones at all when
they write in pinyin.) They're often written like that even in web
pages, where superscripts would be easy - see Victor Mair's frequent
Language Log posts about Chinese writing and printing.
This seems significantly less wrong to me than writing H2SO4 for
H₂SO₄ which is also common in plain text...





Re: A last missing link for interoperable representation

2019-01-15 Thread Julian Bradfield via Unicode
On 2019-01-15, Philippe Verdy via Unicode  wrote:
> This is not for Mongolian and French wanted this space since long and it
> has a use even in English since centuries for fine typography.
> So no, NNBSP is definitely NOT "exotic whitespace". It's just that it was
> forgotten in the early stages of computing with legacy 8-bit encodings but
> it should have been in Unicode since the beginning as its existence is
> proven long before the computing age (before ASCII, or even before Baudot
> and telegraphic systems). It has always been used by typographers, it has
> centuries of tradition in publishing. And it has always been recommended
> and still today for French for all books/papers publishers.

Do you expect people to encode all the variable justification spaces
between words by combining all the (numerous) spaces already available
in Unicode?
And how about the kerning between letters? If spacing of punctuation
is to be encoded instead of left to display algorithms, shouldn't you
also encode the kerns instead of leaving them to the font display
technology?

Oh, and what about dropped initials? They have been used in both
manuscripts and typography for many centuries - surely we must encode
them?




Re: A last missing link for interoperable representation

2019-01-14 Thread Julian Bradfield via Unicode
On 2019-01-14, James Kass via Unicode  wrote:
> Julian Bradfield wrote,
> > I have never seen a Unicode math alphabet character in email
> > outside this list.
>
> It's being done though.  Check this message from 2013 which includes the 
> following, copy/pasted from the web page into Notepad:
>
> 혗혈혙혛 혖혍 헔햳햮헭.향햱햠햬햤햶햮햱햪  © ퟮퟬퟭퟯ 햠햫햤햷 햦햱햠햸  
> 헀헂헍헁헎햻.햼허헆/헺헿헮헹헲혅헴헿헮혆
>
> https://apple.stackexchange.com/questions/104159/what-are-these-characters-and-how-can-i-use-them

Which makes the point very nicely. They're not being *used* to do maths,
they're being played with for purely decorative purposes, and moreover
in a way which breaks the actual intended use as a URL.
If you introduce random stuff into Unicode, people will play with it
(or use it for phishing).
The whole thread is, as it says, "what is this weird stuff"?
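
A minimal Python sketch of why the decorated string does not work as a URL: the mathematical letters are different code points from ASCII, so a plain comparison fails, and only a compatibility normalization (NFKC) folds them back. The helper name fold_compat is hypothetical, and nothing here says whether any given browser or resolver applies such folding.

```python
import unicodedata

# MATHEMATICAL ITALIC SMALL A, B, C (U+1D44E..U+1D450)
styled = "\U0001D44E\U0001D44F\U0001D450"
plain = "abc"

def fold_compat(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters by their base forms."""
    return unicodedata.normalize("NFKC", text)

print(styled == plain)               # compared code point by code point
print(fold_compat(styled) == plain)  # compared after compatibility folding
```

The first comparison is False and the second True, so software that does not normalize sees the styled text as an unrelated string.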




Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-13, James Kass via Unicode  wrote:
> यदि आप किसी रोटरी फोन से कॉल कर रहे हैं, तो कृपया स्टार (*) दबाएं।

> What happens with Devanagari text?  Should the user community refrain 
> from interchanging data because 1980s era software isn't Unicode aware?

Devanagari is an established writing system (which also doesn't need
separate letters for different typefaces). Those who wish to exchange
information in Devanagari will use either an ISCII or Unicode system
with suitable font support.
Just as those who wish to exchange English text with typographic
detail will use a suitable typographic mark-up system with font
support, which will typically not interfere with plain text searching.
Even in a PDF document, "art nouveau" will appear as "art nouveau"
whatever font it's in.

Incidentally, a large chunk of my Facebook feed is Indian politics,
and of that portion of it that is in Hindi or other Indian
languages, most is still written in ASCII transcription, even though
every web browser and social media application in common use surely
has full Unicode support these days. Sometimes using your own writing
system is just too much effort!




Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-14, James Kass via Unicode  wrote:
> 퐴푟푡 푛표푢푣푒푎푢 seems a bit 푝푎푠푠é nowadays, as well.
>
> (Had to use mark-up for that “span” of a single letter in order to 
> indicate the proper letter form.  But the plain-text display looks crazy 
> with that HTML jive in it.)

Indeed. But
 _Art nouveau_ seems a bit _passé_ nowadays
looks fine and is understood even by those who have never annotated a
manuscript with proof corrections.





Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-13, Marcel Schneider via Unicode  wrote:
> As far as the information goes that was running until now on this List,
> Mathematicians are both using TeX and liking the Unicode math alphabets.

As Khaled has said, if they use them, it's because some software
designer has decided to use them to implement markup.
I have never seen a Unicode math alphabet character in email outside
this list.

> These statements make me fear that the font you are using might unsupport
> the NARROW NO-BREAK SPACE U+202F > <. If you see a question mark between

It displays as a space. As one would expect - I use fixed width fonts
for plain text.

> these pointy brackets, please let us know. Because then, You’re unable to
> read interoperably usable French text, too, as you’ll see double punctuation
> (eg "?!") where a single mark is intended, like here !

I see "like here !".
French text does not need narrow spacing any more than science does.
When doing typography, fifty centimetres is $50\thinspace\mathrm{cm}$;
in plain text, 50cm does just fine.
Likewise, normal French people writing email write "Quel idiot!", or
sometimes "Quel idiot !".

If you google that phrase on a few French websites, you'll see that
some (such as Larousse, whom one might expect to care about such
things) use no space before punctuation, while others (such as some
random T-shirt company) use an ASCII space.

The Académie Française, which by definition knows more about French
orthography than you do, uses full ASCII spaces before ? and ! on its
front page. Also after opening guillemets, which looks even more
stupid from an Anglophone perspective.

> Aiming at extending the subset of environments supporting correct typesetting

There are many fine programs, including TeX, for doing good
typesetting. Unicode is not about typesetting, it's about information
exchange and preservation.





Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, James Kass via Unicode  wrote:
> This is a math formula:
> a + b = b + a
> ... where the estimable "mathematician" used Latin letters from ASCII as 
> though they were math alphanumerics variables.

Yup, and it's immediately understandable by anyone reading on any
computer that understands ASCII.  That's why mathematicians write like
that in plain text.

> This is an italicized word:
> 푘푎푘푖푠푡표푐푟푎푐푦
> ... where the "geek" hacker used Latin italics letters from the math 
> alphanumeric range as though they were Latin italics letters.

It's a sequence of question marks unless you have an up to date
Unicode font set up (which, as it happens, I don't for the terminal in
which I read this mailing list). Since actual mathematicians don't use
the Unicode math alphabets, there's no strong incentive to get updated
fonts.

> Where's the harm?

You lose your audience for no reasons other than technogeekery. 





Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, Richard Wordingham via Unicode  wrote:
> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)
> Julian Bradfield via Unicode  wrote:
>
>> It's also fundamentally misguided. When I _italicize_ a word, I am
>> writing a word composed of (plain old) letters, and then styling the
>> word; I am not composing a new and different word ("_italicize_") that
>> is distinct from the old word ("italicize") by virtue of being made up
>> of different letters.
>
> And what happens when you capitalise a word for emphasis or to begin a
> sentence?  Is it no longer the same word?

Indeed. As has been observed up-thread, the casing idea is a dumb one!
We are, however, stuck with it because of legacy encoding transported
into Unicode. We aren't stuck with encoding fonts into Unicode.




Re: A last missing link for interoperable representation

2019-01-13 Thread Julian Bradfield via Unicode
On 2019-01-12, James Kass via Unicode  wrote:

> Sounds like you didn't try it.  VS characters are default ignorable.

By software that has a full understanding of Unicode. There is a very
large world out there of software that was written before Unicode was
dreamed of, let alone popular.

> apricot
> a︁p︁r︁i︁c︁o︁t︁
> Notepad finds them both if you type the word "apricot" into the search box.

What has Notepad to do with me?

> "But for plain text, it's crazy."
>
> Are you a member of the plain-text user community?

Certainly:)




Re: A last missing link for interoperable representation

2019-01-12 Thread Julian Bradfield via Unicode
On 2019-01-11, James Kass via Unicode  wrote:
> Exactly.  William Overington has already posted a proof-of-concept here:
> https://forum.high-logic.com/viewtopic.php?f=10=7831
> ... using a P.U.A. character /in lieu/ of a combining formatting or VS 
> character.  The concept is straightforward and works properly with 
> existing technology.

It does not work with much existing technology. Interspersing extra
codepoints into what is otherwise plain text breaks all the existing
software that has not been, and never will be, updated to deal with
the arbitrarily complex algorithms required to do Unicode searching.
Somebody who needs to search exotic East Asian text will know that they
need software that understands VS, but a plain ordinary language user
is unlikely to have any idea that VS exist, or that their searches
will mysteriously fail if they use this snazzy new pseudo-plain-text
italicization technique.
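
The search failure is easy to reproduce. The Python sketch below builds "apricot" with a variation selector after every letter, shows that a naive substring search misses it, and then shows the extra step VS-aware software has to take. The helper name strip_variation_selectors is hypothetical, and a real Unicode search would need far more than this one step.

```python
import re

# Variation selectors: VS1..VS16 and the supplementary VS17..VS256.
VARIATION_SELECTORS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")

def strip_variation_selectors(text: str) -> str:
    """Remove variation selectors so that plain substring search works."""
    return VARIATION_SELECTORS.sub("", text)

# "apricot" with a variation selector after every letter (illustrative input).
styled = "".join(ch + "\uFE00" for ch in "apricot")

print("apricot" in styled)                             # naive search
print("apricot" in strip_variation_selectors(styled))  # VS-aware search
```

The naive search prints False and the VS-aware one True; any tool that predates Unicode, or simply compares bytes, behaves like the first line.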

It's also fundamentally misguided. When I _italicize_ a word, I am
writing a word composed of (plain old) letters, and then styling the
word; I am not composing a new and different word ("_italicize_") that
is distinct from the old word ("italicize") by virtue of being made up
of different letters.

I think the VS or combining format character approach *would* have
been a better way to deal with the mess of mathematical alphabets,
because for mathematicians, *b* is a distinct symbol from b, and while
there may be correlated use of alphabets, there need be no connection
whatever between something notated b and something notated *b*.

But for plain text, it's crazy.




Re: A sign/abbreviation for "magister"

2018-11-02 Thread Julian Bradfield via Unicode
On 2018-11-02, James Kass via Unicode  wrote:
> Alphabetic script users write things the way they are spelled and spell 
> things the way they are written.  The abbreviation in question as 
> written consists of three recognizable symbols.  An "M", a superscript 
> "r", and an equal sign (= two lines).  It can be printed, handwritten, 

That's not true. The squiggle under the r is a squiggle - it is a
matter of interpretation (on which there was some discussion a hundred
messages up-thread or so :) whether it was intended to be = .
Just as it is a matter of interpretation whether the superscript and
squiggle were deeply meaningful to the writer, or whether they were
just a stylistic flourish for Mr.




Re: A sign/abbreviation for "magister"

2018-10-31 Thread Julian Bradfield via Unicode
On 2018-10-31, Marcel Schneider via Unicode  wrote:

> Preformatted Unicode superscript small letters are meeting the French 
> superscript 
> requirement, that is found in:
> http://www.academie-francaise.fr/abreviations-des-adjectifs-numeraux
> (in French). This brief article focuses on the spelling of the indicators, 
> without questioning the fact that they are superscript.

When one does question the Académie about the fact, this is their
reply:

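 Le fait de placer en exposant ces mentions est de convention
 typographique ; il convient donc de le faire. Les seules exceptions
 sont pour Mme et Mlle.

 [In English: Placing these indications in superscript is a
 typographic convention; it is therefore appropriate to do so. The
 only exceptions are for Mme and Mlle.]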

which, if my understanding of "convient" is correct, carefully does
not quite say that it is *wrong* not to superscript, but that one should
superscript when one can because that is the convention in typography.

My original question was:

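 Dans les imprimés ou dans le manuscrit on écrit "1<sup>er</sup>,
 45<sup>e</sup>" etc. (J'utilise l'indication HTML pour les lettres
 supérieures.)

 La question est: est-ce que les lettres supérieures sont
 *obligatoires*, ou sont-ils simplement une question de style? C'est à
 dire, si on écrit "1er, 45e" etc., est-ce une erreur, ou un style
 simple mais correct?

 [In English: In print or in manuscript one writes "1<sup>er</sup>,
 45<sup>e</sup>" etc. (I am using HTML markup for the superscript
 letters.) The question is: are the superscript letters *obligatory*,
 or are they simply a matter of style? That is, if one writes "1er,
 45e" etc., is that an error, or a plain but correct style?]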

I did not think that their Dictionary desk would understand the
concept of plain text, so I didn't ask explicitly for their opinions
on encoding :)

Which takes us back to when typography is plain text...




Re: second attempt (was: A sign/abbreviation for "magister")

2018-10-31 Thread Julian Bradfield via Unicode
On 2018-10-31, Janusz S. Bień via Unicode wrote:
> On Mon, Oct 29 2018 at 12:20 -0700, Doug Ewell via Unicode wrote:

[ as did I in private mail ]

>> The abbreviation in the postcard, rendered in
>> plain text, is "Mr".
>
> The relevant fragment of the postcard in a loose translation is
>
> Use the following address:   ...
>  is the abbreviation of magister.
>
> I don't think your rendering
>
>Mr is the abbreviation of magister.
>
> has the same meaning.

I do, for the reasons stated by many.

If the topic were a study of the ways in which people indicate
abbreviations by typographic or manuscript styling, then it would be
important to know the exact form of the marks; but that is not plain
text. One cannot expect to discuss detailed technical questions using only
plain text, other than by using language to describe the details.

> Please note that I didn't ask *whether* to encode the abbreviation. I
> asked *how* to do it.

Doug and I have argued that the encoding is "Mr". Further detail can be
given in natural language as a note. You could use the various hacks
you've discussed, with modifier letters; but that is not "encoding",
that is "abusing Unicode to do markup". At least, that's the view I
take!

Perhaps a more challenging case is that at one time in English, it was
common to write and print "the" as "y<sup>e</sup>" (from older
"þe"). Here, there is actually a potential contrast between
the forms "y<sup>e</sup>" ("the") and "ye" (2nd plural pronoun), and
the contrast could be realized: "the/ye idle braggarts are a curse
upon England". Is the encoding of "y<sup>e</sup>" to be "ye" or "the"?
A hard-line plain-texter such as myself would probably argue for
"the".











Re: A sign/abbreviation for "magister"

2018-10-30 Thread Julian Bradfield via Unicode
On 2018-10-30, Marcel Schneider via Unicode  wrote:
> Dr Bradfield just added on 30/10/2018 at 14:21 something that I didn’t 
> know when replying to Dr Ewell on 29/10/2018 at 21:27:

>> The English abbreviation Mr was also frequently superscripted in the
>> 15th-17th centuries, and that didn't mean anything special either - it
>> was just part of a general convention of superscripting the final
>> segment of abbreviations, probably inherited from manuscript practice.
>
> So English dropped the superscript requirement for common abbreviations 

Who said anything about requirement? I didn't.
The practice of using superscripts to end abbreviations is alive and
well in manuscript - I do it myself in writing notes for myself. For
example, "condition" I will often write as "cond<sup>n</sup>", and
"equation" as "eq<sup>n</sup>".

> in the 17ᵗʰ or 18ᵗʰ century to keep it only for ordinals. Should Unicode 

What do you mean, for ordinals? If you mean 1st, 2nd etc., then there
is not now (when superscripting looks very old-fashioned) and never
has been any requirement to superscript them, as far as I know -
though since the OED doesn't have an entry for "1st", I can't easily
check.





Re: A sign/abbreviation for "magister"

2018-10-30 Thread Julian Bradfield via Unicode
On 2018-10-30, James Kass via Unicode  wrote:
> (Still responding to Ken Whistler's post)

> Do you know the difference between H₂SO₄ and H2SO4?  One of them is a 
> chemical formula, the other one is a license plate number. T̲h̲a̲t̲ is 
> not a stylistic difference /in my book/.  (Emphasis added.)

Yes. In chemical notation, sub/superscripting is semantically
significant.
That's not the case for abbreviations: the choice of Mr or any of its
superscripted and decorated variations is not semantically
significant.
The English abbreviation Mr was also frequently superscripted in the
15th-17th centuries, and that didn't mean anything special either - it
was just part of a general convention of superscripting the final
segment of abbreviations, probably inherited from manuscript practice.

> But suppose both those strings were *intended* to represent the chemical 
> formula?  Then one of them would be optimally correct; the other one... meh.
>
> Now what if we were future historians given the task of encoding both of 
> those strings, from two different sources, and had no idea what those 
> two strings were supposed to represent?  Wouldn't it be best to preserve 
> both strings intact, as they were originally written?

Indeed - and that means an image, not any textual representation. The
typeface might be significant too.




Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)

2018-08-21 Thread Julian Bradfield via Unicode
On 2018-08-20, Mark E. Shoulson via Unicode  wrote:
> Moreover, they [William's pronoun symbols] are once again an attempt to 
> shoehorn Overington's pet 
> project, "language-independent sentences/words," which are still 
> generally deemed out of scope for Unicode.

I find it increasingly hard to understand why William's project is out
of scope (apart from the "demonstrate use first, then encode"
principle, which is in any case not applied to emoji), when emoji are
language-independent words - or even sentences: the GROWING HEART
emoji is (I presume) supposed to be a language-independent way of
saying "I love you more every day". Which seems rather more
fatuous as a thing to put in a writing-systems standard than the
things I think William would want.

Not that I want to hear any more about William's unmentionables; I
just wish emoji were equally unmentionable.




Re: Thoughts on Emoji Selection Process

2018-08-11 Thread Julian Bradfield via Unicode
On 2018-08-11, Charlotte Buff via Unicode  wrote:
> There is no semantic difference between a softball and a baseball. They are
> literally the same object, just in slightly different sizes. There isn’t a
> semantic difference between a squirrel and a chipmunk either (mainly
> because they don’t represent anything beyond their own identities just like
> the majority of modern emoji inventions), but at the very least they are
> *different things*.

I think you don't understand the meaning of "semantic", "literally",
or "the same". Which is a pity, because I'm all in sympathy with your
general attitude to emoji and Unicode.

I'm not just being pedantic - I can't even work out what you're
attempting to say in this paragraph.




Re: 0027, 02BC, 2019, or a new character?

2018-01-27 Thread Julian Bradfield via Unicode
On 2018-01-26, Richard Wordingham via Unicode  wrote:
> Some systems (or admins) have been totally defeated by even the ASCII
> version of ʹO’Sullivanʹ.  That bodes ill for Kazakhs.

The head (about to be ex-head) of my university is Sir Timothy O'Shea.
On the student record system, it is impossible to search for students
called O'Shea (I have one). I suppose it doesn't sanitize correctly -
I haven't tried looking for little Bobby Tables yet. It hadn't
occurred to me to check, but of course searching for O’Shea doesn't
work either, as they usually enter their own names into the initial
record, and use 0027.
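
Neither failure is hard to avoid in principle. The Python sketch below, using the standard sqlite3 module, passes the surname as a bound parameter (so the apostrophe never needs escaping) and folds U+2019 and U+02BC to U+0027 before comparing. The table layout, the sample row and the function names are invented for illustration and have nothing to do with the actual student record system.

```python
import sqlite3

# Fold the typographic and modifier-letter apostrophes to ASCII U+0027.
APOSTROPHES = str.maketrans({"\u2019": "'", "\u02BC": "'"})

def fold_apostrophes(name: str) -> str:
    return name.translate(APOSTROPHES)

def find_students(conn, surname: str):
    """Look up students by surname using a parameterized query."""
    cur = conn.execute(
        "SELECT name FROM students WHERE surname = ?",
        (fold_apostrophes(surname),),
    )
    return [row[0] for row in cur]

# Hypothetical in-memory database with one made-up record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, surname TEXT)")
conn.execute("INSERT INTO students VALUES (?, ?)", ("Example Student", "O'Shea"))

print(find_students(conn, "O'Shea"))       # ASCII apostrophe
print(find_students(conn, "O\u2019Shea"))  # U+2019 typed by the searcher
```

Both lookups return the same record. The sketch assumes names are stored with U+0027, as described above; folding at storage time as well would cover records entered with other apostrophes.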





Re: Counting Devanagari Aksharas

2017-04-22 Thread Julian Bradfield via Unicode
On 2017-04-22, Eli Zaretskii via Unicode  wrote:
>> From: Richard Wordingham via Unicode 
[...]
>> I've encountered the problem that, while at least I can search for
>> text smaller than a cluster, there's no indication in the window of
>> where in the window the text is.
>
> I could imagine Emacs decomposing characters temporarily when only
> part of a cluster matches the search string.  Assuming this would make
> sense to users of some complex scripts, that is.  You are welcome to
> suggest such a feature by using report-emacs-bug.

That's what I do in my emacs with combining characters, and if I had
complex script support, I'd expect the same to happen there.
emacs is a programmer's editor, after all :)
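
The idea of decomposing before matching can be sketched in a few lines of Python with canonical decomposition (NFD). This is not how Emacs implements it, and the function name find_in_decomposed is hypothetical; it only shows that a search for part of a composed character succeeds once both strings are decomposed.

```python
import unicodedata

def find_in_decomposed(needle: str, haystack: str) -> int:
    """Search after NFD-normalizing both strings.

    Returns an index into the decomposed haystack, or -1 if not found.
    """
    nfd = unicodedata.normalize
    return nfd("NFD", haystack).find(nfd("NFD", needle))

word = "pass\u00e9"  # ends in precomposed U+00E9

print("e" in word)                          # plain search for the base letter
print(find_in_decomposed("e", word))        # base letter after decomposition
print(find_in_decomposed("\u0301", word))   # the combining acute on its own
```

The plain search fails, while the decomposed search finds the base letter at index 4 and the combining acute at index 5. Mapping those indices back to a position on screen is the hard part, which is the problem Richard describes.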




Re: Proposal to add standardized variation sequences for chess notation

2017-04-12 Thread Julian Bradfield via Unicode
On 2017-04-12, Philippe Verdy via Unicode  wrote:
> 2017-04-12 8:35 GMT+02:00 Martin J. Dürst :
>> On Go boards, the grid cells are definitely rectangular, not square. The
>> reason for this is that boards are usually looked at at an angle, and
>> having the cells be higher than wide makes them appear (close to) square.
>> However, because diagrams are usually viewed at close to a right angle, Go
>> diagrams use squares, not rectangles.
>
> That's not a valid reason.  "Go" uses **square** cells not **rectangles***
> because of the form of the pieces (round) and the fact they must nearly
> touch each other to surround other pieces.

I don't think Go players and board makers have any interest in your
views of valid reasons.
According to the information provided by various national Go
societies, the typical Japanese Go cell is 22mm by 23.6mm, for the
reason Martin stated.
