Re: On the lack of a SQUARE TB glyph

2019-09-30 Thread Andre Schappo via Unicode


> On Sep 27, 1 Reiwa, at 08:17, Julian Bradfield via Unicode wrote:
> 
> Or one could allow IDS to have leaf components that are any
> characters, not just ideographic characters, and then one could have
> all sorts of fun.

I do like this idea.

Note: This is a modified repost as I previously forgot to credit Julian as the 
originator

André Schappo






Re: On the lack of a SQUARE TB glyph

2019-09-29 Thread Andre Schappo via Unicode


> Or one could allow IDS to have leaf components that are any
> characters, not just ideographic characters, and then one could have
> all sorts of fun.

I do like that idea

André Schappo






unicode tweet

2019-05-30 Thread Andre Schappo via Unicode

This tweet made me laugh 
twitter.com/padolsey/status/1133835770773626881
🤣

André Schappo




Re: Correct way to express in English that a string is encoded ... using UTF-8 ... with UTF-8 ... in UTF-8?

2019-05-15 Thread Andre Schappo via Unicode



> On May 15, 31 Heisei, at 12:22 pm, Costello, Roger L. via Unicode wrote:
> 
> Hello Unicode experts!
> 
> Which is correct:
> 
> (a) The input file contains a string. The string is encoded using UTF-8.
> 
> (b) The input file contains a string. The string is encoded with UTF-8.
> 
> (c) The input file contains a string. The string is encoded in UTF-8.
> 
> (d) Something else (what?)
> 
> /Roger
> 

(d) The input file contains a string which is UTF-8 encoded.

André Schappo




Re: Encoding italic (was: A last missing link)

2019-01-16 Thread Andre Schappo via Unicode



> On 16 Jan 2019, at 09:30, James Kass via Unicode  wrote:
> 
> 
> I wrote,
> 
> > The VS possibility would double the character count of any strings
> > including them.
> 
> A kind list member has pointed out privately that the above is mistaken.  
> Twitter character counts aren't actually character counts.  Each math-alpha 
> counts as two characters as do the VS characters.  So a string with VS 
> characters interspersed would actually be triple rather than double.

Odd! I have just briefly experimented with twitter and it appears that any 
character ≥ U+1100 has a count of 2 and any character < U+1100 has a count of 1.

I remember many years ago twitter was incorrectly counting in UTF-16 encoding 
units thus giving a count of 1 for BMP characters and a count of 2 for astral 
characters. That problem was fixed long ago.
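The two counting schemes mentioned above can be compared with a short Python sketch. It only models the observation in this message (weight 2 at or above U+1100, weight 1 below) and is not Twitter's actual algorithm; the function names are made up for the illustration.

```python
def observed_tweet_weight(text: str) -> int:
    """Weighted length as observed in the brief experiment above:
    code points >= U+1100 count 2, everything below counts 1.
    (An assumption drawn from that experiment, not Twitter's published rules.)"""
    return sum(2 if ord(ch) >= 0x1100 else 1 for ch in text)


def utf16_code_units(text: str) -> int:
    """Length in UTF-16 code units: 1 per BMP character, 2 per astral character."""
    return len(text.encode("utf-16-le")) // 2


# a, é, 中, and U+1D44E MATHEMATICAL ITALIC SMALL A
sample = "a\u00e9\u4e2d\U0001d44e"
print("code points:      ", len(sample))
print("observed weight:  ", observed_tweet_weight(sample))
print("UTF-16 code units:", utf16_code_units(sample))
```

The two schemes disagree on BMP characters such as 中, which is one code unit in UTF-16 but has weight 2 under the observed rule.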

André Schappo

> (I've also been advised that a lot of the math-alpha on Twitter involves 
> fraktur, script, and double struck characters.  As was pointed out to me, 
> that practice would probably continue even if Twitter enabled italic and bold 
> styling as a feature.  Again, I do not personally know how widespread the 
> practice is.)
> 





Re: Translating the standard

2018-03-14 Thread Andre Schappo via Unicode


On 13 Mar 2018, at 02:49, Yifán Wáng via Unicode wrote:

Somewhat digressing from the topic, but I'd like to make some comment
on this part as I smell a persistent myth among some, hopefully small number
of, software engineers in Anglosphere.

First, the fact that computer languages are written using English
words doesn't mean that programmers are supposed to have proportional
English knowledge. Take the word of Matz, the creator of the Ruby
language: "The English skill is a super-powerful rare card (in the
career path of a Japanese engineer)!" He then goes on to say that you
should keep up with the most up-to-date overseas info and trends in order
to be a high-tier engineer, and so on. That is far from a "requirement".
http://eikaiwa.dmm.com/blog/3826/

I've also read somewhere a memoir of a middle-aged programmer who was
already into BASIC in childhood. One day he thought he'd written off a
"great" program and printed it on paper, but to his surprise, an
auntie who took a look at it immediately decoded the program and
roughly understood what it was meant to do; she knew English, and he
didn't.

Programming as such, is just like a Chinese room replaced with
English, where you sit inside a cramped room night after night,
communicating with a computer by typing in English words the bulky
reference guide teaches you. Most East Asian countries are blessed
enough with a tremendous number of translated technical publications
(e.g. O'Reilly) each year, not to mention firsthand writings in their
own languages. So the documentation is easily available even if you don't
speak English.

Second, that English is lingua franca doesn't necessarily mean the
English spoken in the wild is. The aviation industry is another field
which employs English as the common language, but they exert utmost
effort to maintain the system working. Namely, they have a controlled
word set with semantics as disambiguated as possible, called
ASD-STE100, for technical documentation, such as maintenance manuals,
to minimize errors caused by limited English knowledge. Unicode, on
the other hand, is merely written in a free style used when English
speakers who (almost) graduated from college write to English speakers
who (almost) graduated from college. Having such a level of proficiency
as a non-native speaker isn't trivial, unless one is
constantly in contact with an English-speaking community. (And the
programming community isn't contained inside the English-speaking
community at all.)

That said, I agree with almost everything Alastair said afterwards. If I
have to add one more thing, monolingual writing is usually tightly
coupled with its language, more than engineers may believe, even if
the writer carefully chose their words to be context-neutral. Thus
it's a hard job to say no more and no less than the original text in
another language, especially when exactitude matters. It's one of the
problems that prevent fully automated translation from being a thing, I
guess.

Best regards,

Yifan


When it comes to program identifiers, a language such as Chinese has a huge 
advantage as it is much more compact than English. So, one can write 
meaningful identifier names with a small number of Chinese characters. In an 
all Chinese development team, producing software for the Chinese market, why 
not have the program identifiers written in Chinese? Or maybe this does happen?

Over the years I have talked with many Chinese students about this, and usually 
they tell me something like: "Our lecturers in China tell us to always use 
English for program identifiers".

I make use of several languages for my program identifiers ➜ 
jsfiddle.net/user/coas/fiddles
My use of non-English languages for program identifiers is somewhat random.

André Schappo




Unicode 11.0 and 12.0 Cover Design Art

2018-03-12 Thread Andre Schappo via Unicode

One of my project students has an art gallery as a client ➜ 
surfacegallery.org This gallery is also a focal 
point for a collective of local artists.

This morning I had a project meeting with this student. I suggested that 
surface gallery artists might like to submit entries.

I showed the Unicode character set to the student and she was well impressed. I 
also suggested possible cover design art.

The basic principle of my suggestions was that the artwork should be 
constructed from Unicode characters and only Unicode characters. My suggestions 
included: plants, animals, portraits, cityscape, zoo, farm ...etc... If the 
artists collective use my suggestions then the unicode cover artwork they 
submit will most definitely feature Unicode.

Recent Unicode cover artwork has not featured Unicode (well not in any way that 
I can determine) and I think it should and it should feature it prominently and 
obviously.

I do not know who or how the artwork is judged but I think it would be good if 
members of this list could vote on the submitted cover artwork.

André Schappo






Re: Internationalised Computer Science Exercises

2018-01-29 Thread Andre Schappo via Unicode


On 28 Jan 2018, at 04:12, Richard Wordingham via Unicode 
<unicode@unicode.org> wrote:

On Sat, 27 Jan 2018 14:13:40 -0800
Shervin Afshar <shervinafs...@gmail.com> wrote:

On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode <
unicode@unicode.org> wrote:

On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode <
unicode@unicode.org> wrote:

By way of example, one programming challenge I set to students a
couple of weeks ago involves diacritics. Please see
jsfiddle.net/coas/wda45gLp

Did any of them come up with the idea of using traces instead of
strings?

Cor Blimey I am really pleased if the students have even heard of Unicode let 
alone heard of trace monoid

...and I confess, I knew nothing of trace monoid until I read the below 
wikipedia article but then again my ignorance is profound

BTW. these internationalised computer science exercises I have written and am 
writing are not part of any course or module and so are optional. In providing 
such exercises I am hoping to spark an interest in Unicode and 
internationalisation. I wrote a couple more yesterday 
jsfiddle.net/coas/3c7y88ot & 
jsfiddle.net/coas/aau8cqaw

André Schappo


Care to elaborate? Are you referring to sequence alignment methods?

No, I'm thinking of the trace monoid (see e.g.
https://en.wikipedia.org/wiki/Trace_monoid).  One way of thinking of
strings is as concatenations of the NFD decompositions of their
constituent characters. Then the canonical equivalence classes of these
strings form the trace monoid of indecomposable characters.  The theory
of regular expressions (though you may not think that mathematical
regular expressions matter) extends to trace monoids, with the
disturbing exception that the Kleene star of a regular language is not
necessarily regular.  (The prototypical example is sequences (xy)^n
where x and y are distinct and commute, i.e. xy and yx are canonically
equivalent in Unicode terms.  A Unicode example is the set of strings
composed only of U+0F73 TIBETAN VOWEL SIGN II - there is no FSM that
will recognise canonically equivalent strings).

One consequence of this view is that one has to think of U+1EAD LATIN
SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) as being both composed of
the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
(â) and tone mark U+0323 COMBINING DOT BELOW, and also composed, in
the spirit of Thai ISO 11940 transliteration, of the transliterated Thai
vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW (ạ), corresponding to
U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone mark U+0302 COMBINING
CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI CHARACTER MAI THO.  (In
ISO 11940 as specified, the tone mark is actually written on the
immediately preceding consonant, not on the vowel.)
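The two readings of U+1EAD, and the commuting marks inside U+0F73, can be checked with Python's standard `unicodedata` module. This is a small sketch of canonical equivalence via NFD comparison, not an implementation of trace monoids; the helper name `canonically_equivalent` is invented for the example.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+1EAD read as Vietnamese â + dot below, and as ạ + circumflex.
vietnamese_view = "\u00e2\u0323"
thai_style_view = "\u1ea1\u0302"
print(canonically_equivalent(vietnamese_view, thai_style_view))
print(canonically_equivalent(vietnamese_view, "\u1ead"))

# The fully decomposed form lists the indecomposable characters in canonical order.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u1ead")])

# U+0F73 decomposes to U+0F71 U+0F72; the two marks have different
# combining classes, so either order is canonically equivalent.
print(canonically_equivalent("\u0f71\u0f72", "\u0f72\u0f71"))
print(canonically_equivalent("\u0f73", "\u0f72\u0f71"))
```

Comparing NFD forms is the usual way to test equivalence; it says nothing about the regular-language question raised above, which concerns recognising whole sets of such strings.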

Richard.


  
André Schappo
schappo.blogspot.co.uk
twitter.com/andreschappo
weibo.com/andreschappo
groups.google.com/forum/#!forum/computer-science-curriculum-internationalization







Re: 0027, 02BC, 2019, or a new character?

2018-01-26 Thread Andre Schappo via Unicode
Ah! Yes That is a battle I gave up a long time ago. The database here can only 
handle ASCII. I have stopped trying to get the systems people here to convert 
the database to UTF-8.

A few days ago I asked the systems people if they were going to upgrade their MS 
mail server to handle non ASCII email addresses such as my Chinese email 
address. I will not go into details but basically they have no plans to support 
non ASCII email addresses.

Further to my challenge:

Before I set the below challenges to the students I described a possible 
scenario.


Imagine you are responsible for a website with a backend database. This website 
provides financial management for a number of extremely wealthy clients. These 
clients are from many different parts of the world. If you cannot be bothered 
to get their names correct you could easily offend and hence lose clients. Just 
losing one client will be a huge loss in revenue for your company.

My advice is: Learn the correct forms of their names in both the Latin script 
and the native script. Store both forms in your backend database.


André Schappo

On 26 Jan 2018, at 08:49, Shriramana Sharma 
<samj...@gmail.com> wrote:

But your outgoing "From" address doesn't seem to have an accent!?

On 26-Jan-2018 13:58, "Andre Schappo via Unicode" 
<unicode@unicode.org> wrote:

Talking of typing names correctly. Few people bother to type the acute accent 
in André.

This academic year, for the first time ever, I gave the following challenges to 
my web programming class of 143 students. I gave these challenges in the first 
lecture.

①  learn how to write my name correctly on your desktop computers and mobile 
phones
② whenever you email me, ensure you write my name correctly

I am pleased to report that the majority of this class now do type my name 
correctly when emailing me 

André Schappo

On 25 Jan 2018, at 18:48, Mark Davis ☕️ via Unicode 
<unicode@unicode.org> wrote:

My apologies for the typo. There's no excuse for misspelling someone's name 
(especially since I live in Switzerland, and type German every day).

Thanks for calling my attention to it: the doc has been updated.

Mark


On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode 
<unicode@unicode.org> wrote:
On 23 January 2018 at 00:55, James Kass via Unicode 
<unicode@unicode.org> wrote:
>
> Regular American users simply don't type umlauts, period.

Not even the president of the Unicode Consortium when referring to
Christoph Päper:

http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf

Andrew



  







Re: 0027, 02BC, 2019, or a new character?

2018-01-26 Thread Andre Schappo via Unicode

Talking of typing names correctly. Few people bother to type the acute accent 
in André.

This academic year, for the first time ever, I gave the following challenges to 
my web programming class of 143 students. I gave these challenges in the first 
lecture.

①  learn how to write my name correctly on your desktop computers and mobile 
phones
② whenever you email me, ensure you write my name correctly

I am pleased to report that the majority of this class now do type my name 
correctly when emailing me 

André Schappo

On 25 Jan 2018, at 18:48, Mark Davis ☕️ via Unicode wrote:

My apologies for the typo. There's no excuse for misspelling someone's name 
(especially since I live in Switzerland, and type German every day).

Thanks for calling my attention to it: the doc has been updated.

Mark


On Thu, Jan 25, 2018 at 4:15 AM, Andrew West via Unicode wrote:
On 23 January 2018 at 00:55, James Kass via Unicode wrote:
>
> Regular American users simply don't type umlauts, period.

Not even the president of the Unicode Consortium when referring to
Christoph Päper:

http://www.unicode.org/L2/L2018/18051-emoji-ad-hoc-resp.pdf

Andrew



  







Internationalization & Unicode Conference 2018

2018-01-24 Thread Andre Schappo via Unicode

I am thinking that people at Internationalization & Unicode Conference 2018 may 
well be interested in my story and, at times difficult, journey. It has been a 
long journey. Title of my presentation would be "How I Internationalized my 
Computer Science Teaching".

Would any organisation on this list be willing to fund my attendance: travel 
from England, accommodation ...etc...

Alternatively, can you please point me to a funding body to which I can apply.

Thank you

André Schappo




Internationalised Computer Science Exercises

2018-01-22 Thread Andre Schappo via Unicode

I continue my endeavours to get Unicode and Internationalisation into/onto (I 
am not sure which is correct) University and School Curricula. Here is another 
of my endeavours

Yesterday, I drafted a final year student project specification for the 
2018/2019 academic year. These projects will start in October but students will 
be choosing their project some time around June. The project involves producing 
a set of internationalised Computer Science exercises for both educators and 
students. Details at 
schappo.blogspot.co.uk/2018/01/computer-science-internationalization_21.html

I am confident that more than one student will choose this project.

By way of example, one programming challenge I set to students a couple of 
weeks ago involves diacritics. Please see 
jsfiddle.net/coas/wda45gLp

There is huge potential for some really interesting and challenging Unicode 
exercises. If you have any suggestions for such exercises they would be most 
welcome. Email me direct or share on this list.

TIA

André Schappo



Re: 0027, 02BC, 2019, or a new character?

2018-01-18 Thread Andre Schappo via Unicode


On 18 Jan 2018, at 08:21, Andre Schappo via Unicode 
<unicode@unicode.org> wrote:



On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode 
<unicode@unicode.org> wrote:

On Mon, 15 Jan 2018 20:16:21 -0800
James Kass via Unicode <unicode@unicode.org> wrote:

It will probably be the ASCII apostrophe.  The stated intent favors
the apostrophe over diacritics or special characters to ensure that
the language can be input to computers with standard keyboards.

Typing U+0027 into a word processor takes planning.  Of the three, it
should obviously be the modifier letter U+02BC, but I think what gets
stored will be U+0027 or the single quotation mark U+2019.

However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA
ABOVE RIGHT.

Richard.

I have just tested twitter hashtags and as one would expect, U+02BC does not 
break hashtags. See 
twitter.com/andreschappo/status/953903964722024448


I have done a bit more investigation and as a result have written a short blog 
article ➜ 
schappo.blogspot.co.uk/2018/01/computer-science-internationalization_18.html

André Schappo



Re: 0027, 02BC, 2019, or a new character?

2018-01-18 Thread Andre Schappo via Unicode


On 18 Jan 2018, at 08:21, Andre Schappo via Unicode 
<unicode@unicode.org> wrote:



On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode 
<unicode@unicode.org> wrote:

On Mon, 15 Jan 2018 20:16:21 -0800
James Kass via Unicode <unicode@unicode.org> wrote:

It will probably be the ASCII apostrophe.  The stated intent favors
the apostrophe over diacritics or special characters to ensure that
the language can be input to computers with standard keyboards.

Typing U+0027 into a word processor takes planning.  Of the three, it
should obviously be the modifier letter U+02BC, but I think what gets
stored will be U+0027 or the single quotation mark U+2019.

However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA
ABOVE RIGHT.

Richard.

I have just tested twitter hashtags and as one would expect, U+02BC does not 
break hashtags. See 
twitter.com/andreschappo/status/953903964722024448


...and, just in case 
twitter.com/andreschappo/status/953944089896083456

André Schappo



Re: 0027, 02BC, 2019, or a new character?

2018-01-18 Thread Andre Schappo via Unicode


On 16 Jan 2018, at 08:00, Richard Wordingham via Unicode wrote:

On Mon, 15 Jan 2018 20:16:21 -0800
James Kass via Unicode wrote:

It will probably be the ASCII apostrophe.  The stated intent favors
the apostrophe over diacritics or special characters to ensure that
the language can be input to computers with standard keyboards.

Typing U+0027 into a word processor takes planning.  Of the three, it
should obviously be the modifier letter U+02BC, but I think what gets
stored will be U+0027 or the single quotation mark U+2019.

However, we shouldn't overlook the diacritic mark U+0315 COMBINING COMMA
ABOVE RIGHT.

Richard.

I have just tested twitter hashtags and as one would expect, U+02BC does not 
break hashtags. See 
twitter.com/andreschappo/status/953903964722024448
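Why U+02BC behaves differently from the other two candidates can be seen from their General_Category values. The Python sketch below uses the standard `unicodedata` and `re` modules; the `\w+` match is only a stand-in for hashtag parsing (Twitter's real tokenizer is not shown here), and the sample word "a?b" is hypothetical.

```python
import re
import unicodedata

CANDIDATES = {
    "U+0027 APOSTROPHE": "\u0027",
    "U+02BC MODIFIER LETTER APOSTROPHE": "\u02bc",
    "U+2019 RIGHT SINGLE QUOTATION MARK": "\u2019",
}


def first_word(text: str) -> str:
    """Return the leading run of word characters (a rough model of a hashtag body)."""
    match = re.match(r"\w+", text)
    return match.group(0) if match else ""


for label, ch in CANDIDATES.items():
    word = "a" + ch + "b"  # hypothetical sample word with the mark in the middle
    category = unicodedata.category(ch)
    survives = first_word(word) == word
    print(f"{label}: category={category}, word stays whole={survives}")
```

Only U+02BC is a letter (category Lm), so only it keeps the sample word in one piece; the other two are punctuation and split it.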

André Schappo







Re: Possible bug in formal grammar for extended grapheme cluster

2017-12-18 Thread Andre Schappo via Unicode
Ah! That explains why

pcre2grep -u '^\X{1}$'

matches with






...etc...

André Schappo

On 17 Dec 2017, at 17:17, Mark Davis ☕️ via Unicode wrote:

Thanks for the feedback. You're correct about this; that is a holdover from an 
earlier version of the document when there was a more basic treatment of RI 
sequences.

There is already an action to modify these. There is a placeholder review note 
about that just above

http://www.unicode.org/reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters

(scroll up just a bit).

Mark


On Sun, Dec 17, 2017 at 4:16 PM, David P. Kendal via Unicode wrote:
Hi,

It’s possible I’m missing something, but the formal grammar/regular
expression given for extended grapheme clusters appears to have a bug
in it.


The bug is here:

RI-Sequence := Regional_Indicator+

If the formal grammar is intended to exactly match the rules given in the
“Grapheme Cluster Boundary Rules” section below it as-is, then
this should be

RI-Sequence := Regional_Indicator Regional_Indicator

since as given it would cause any number of RI characters to coalesce
into a single grapheme cluster, instead of pairs of characters. That
is, the text U+1F1EC U+1F1E7 U+1F1EA U+1F1FA would represent one
grapheme cluster instead of the correct two.
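The pairing behaviour described above can be sketched in Python. This covers only the Regional_Indicator rule, treating every other character as its own cluster, so it is a simplification rather than a full UAX #29 segmenter; the function names are invented for the example.

```python
from typing import List


def is_regional_indicator(ch: str) -> bool:
    """True for U+1F1E6..U+1F1FF (REGIONAL INDICATOR SYMBOL LETTER A..Z)."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_ri_pairs(text: str) -> List[str]:
    """Group regional indicators two at a time, as in
    RI-Sequence := Regional_Indicator Regional_Indicator.
    Non-RI characters are emitted one per cluster (a deliberate simplification)."""
    clusters = []
    i = 0
    while i < len(text):
        pair_follows = (
            is_regional_indicator(text[i])
            and i + 1 < len(text)
            and is_regional_indicator(text[i + 1])
        )
        step = 2 if pair_follows else 1
        clusters.append(text[i:i + step])
        i += step
    return clusters


# U+1F1EC U+1F1E7 U+1F1EA U+1F1FA, the example from the message above
flags = "\U0001F1EC\U0001F1E7\U0001F1EA\U0001F1FA"
print(len(split_ri_pairs(flags)))
```

With the `Regional_Indicator+` form of the rule, the same four code points would instead come out as a single cluster.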

--
dpk (David P. Kendal) · Nassauische Str. 36, 10717 DE · http://dpk.io/
   we do these things not because they are easy,  +49 159 03847809
  but because we thought they were going to be easy
  — ‘The Programmers’ Credo’, Maciej Cegłowski




  







Re: ASCII v Unicode

2017-11-04 Thread Andre Schappo via Unicode
We now have a literal number for ASCII which is 31 pages 
https://twitter.com/srl295/status/926530928171671552

André Schappo

On 3 Nov 2017, at 15:45, Asmus Freytag (c) wrote:

On 11/3/2017 8:29 AM, Rick McGowan wrote:
The 10.0 chart PDF is 2570 pages.

On 11/3/2017 2:36 AM, Asmus Freytag via Unicode wrote:
single file code charts in a while, but I believe you get at least that number 
again.


PS:  @Andre: update to my last message: 1,500 Core, 2570+ Charts, and, say 430, 
for the UAXs would make 4,500 pages. Off by a factor 9 from your initial value, 
but not quite "zillions". :)

  







Re: ASCII v Unicode

2017-11-03 Thread Andre Schappo via Unicode

I think the only way we can resolve this "X page Unicode book" issue 
is to recruit an infinite number of monkeys

André Schappo

On 3 Nov 2017, at 12:50, Phake Nick 
<c933...@gmail.com> wrote:

The entire Unicode can also be printed onto a single page if you use a very 
huge paper coupled with smaller font size! I think a football field sized 
paper could possibly do the job?

2017-11-03 19:29 GMT+08:00 Andre Schappo via Unicode 
<unicode@unicode.org>:

On 3 Nov 2017, at 09:36, Asmus Freytag via Unicode 
<unicode@unicode.org> wrote:

On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:

You may find https://twitter.com/andreschappo/status/926163719331176450 amusing 


André Schappo


You're wildly off in your page count.

The "book" part of Unicode (Core Specification) alone is 1,500 pages. I haven't 
looked at the single file code charts in a while, but I believe you get at 
least that number again. Then add the dozen or so "Annexes" for a few hundred 
additional pages and be happy that nobody prints the Unicode Character Database 
(or the Unihan Database for that matter).

A./

Yes, I agree, my page count is much lower than it should be for Unicode, if I 
was being literal. I was being figurative rather than literal. I was just 
making a point to the ASCII developers/programmers and ASCII Academics 

Prior to tweeting I did consider other numbers. My considerations included 
1000, 5000 and 1. But in my mind "Unicode is a 500 page book" seemed to 
flow better. I don't know why.

Actually, it is probably for the best that I wrote "500 page" because otherwise 
ASCII developers/programmers and ASCII Academics would not even start reading 
the Unicode book if they thought it was (say) 5000 pages long.

Let's now look at it literally and here is a template "Unicode is a X page 
book".

My guess would be "Unicode is a 1+ page book"

Anyone care to estimate X?

André Schappo




  







Re: ASCII v Unicode

2017-11-03 Thread Andre Schappo via Unicode

On 3 Nov 2017, at 09:36, Asmus Freytag via Unicode 
<unicode@unicode.org> wrote:

On 11/3/2017 2:13 AM, Andre Schappo via Unicode wrote:

You may find https://twitter.com/andreschappo/status/926163719331176450 amusing 


André Schappo


You're wildly off in your page count.

The "book" part of Unicode (Core Specification) alone is 1,500 pages. I haven't 
looked at the single file code charts in a while, but I believe you get at 
least that number again. Then add the dozen or so "Annexes" for a few hundred 
additional pages and be happy that nobody prints the Unicode Character Database 
(or the Unihan Database for that matter).

A./

Yes, I agree, my page count is much lower than it should be for Unicode, if I 
was being literal. I was being figurative rather than literal. I was just 
making a point to the ASCII developers/programmers and ASCII Academics 

Prior to tweeting I did consider other numbers. My considerations included 
1000, 5000 and 1. But in my mind "Unicode is a 500 page book" seemed to 
flow better. I don't know why.

Actually, it is probably for the best that I wrote "500 page" because otherwise 
ASCII developers/programmers and ASCII Academics would not even start reading 
the Unicode book if they thought it was (say) 5000 pages long.

Let's now look at it literally and here is a template "Unicode is a X page 
book".

My guess would be "Unicode is a 1+ page book"

Anyone care to estimate X?

André Schappo




ASCII v Unicode

2017-11-03 Thread Andre Schappo via Unicode

You may find https://twitter.com/andreschappo/status/926163719331176450 amusing 


André Schappo



Re: Emoji anomaly

2017-10-29 Thread Andre Schappo via Unicode
Peter

Thank you very much for your informative response. I see that U+1F321 ➜ U+1F32C 
do not have Emoji_Presentation property set. Time for me to do some reading to 
determine why.

André

On 29 Oct 2017, at 00:20, Peter Edberg 
<pedb...@unicode.org> wrote:

This is about characters U+1F327,U+1F326

The variation selector FE0F is *not* unnecessary with these. Looking at
https://www.unicode.org/Public/emoji/5.0/emoji-data.txt
those characters do *not* have the Emoji-Presentation property set, and they do 
have variation sequences defined.

From https://www.unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes, such 
singleton emoji characters
“should have emoji presentation selectors on base characters with 
Emoji_Presentation=No whenever an emoji presentation is desired”
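The rule quoted above can be illustrated with a small Python sketch. The set below is a hypothetical two-entry subset containing only the characters named in this message; real code would load the Emoji_Presentation property from emoji-data.txt instead. The function name is also invented for the example.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16, the emoji presentation selector

# Hypothetical subset: only the two characters discussed in this thread,
# both of which have Emoji_Presentation=No according to the message above.
EMOJI_PRESENTATION_NO = {"\U0001F326", "\U0001F327"}


def request_emoji_presentation(text: str) -> str:
    """Append U+FE0F after each listed base character that lacks one."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        already_selected = text[i + 1:i + 2] == VS16
        if ch in EMOJI_PRESENTATION_NO and not already_selected:
            out.append(VS16)
    return "".join(out)


cloud_with_rain = "\U0001F327"
fixed = request_emoji_presentation(cloud_with_rain)
print([f"U+{ord(c):04X}" for c in fixed])
```

Applying the function twice leaves the string unchanged, so an input method that already appends the selector, as described in the original question, is left alone.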

- Peter E

On Oct 28, 2017, at 4:11 AM, Andre Schappo via Unicode 
<unicode@unicode.org> wrote:


I am working on a Blog Article ( 
https://schappo.blogspot.co.uk/2017/10/computer-science-internationalization.html
 ) and do not currently have access to OSX High Sierra, I am using OSX Sierra. 
I would appreciate some help from someone using OSX High Sierra.

Using Sierra's Chinese Simplified Input Method the Emoji 🌧️ and 🌦️ have an 
unnecessary U+FE0F variation selector appended. The other Emoji I have tested 
with Sierra's Chinese Simplified Input Method do not have the variation 
selector appended. Could someone please check if the same happens with High 
Sierra

Thank you

André
  








Emoji anomaly

2017-10-28 Thread Andre Schappo via Unicode

I am working on a Blog Article ( 
https://schappo.blogspot.co.uk/2017/10/computer-science-internationalization.html
 ) and do not currently have access to OSX High Sierra, I am using OSX Sierra. 
I would appreciate some help from someone using OSX High Sierra.

Using Sierra's Chinese Simplified Input Method the Emoji 🌧️ and 🌦️ have an 
unnecessary U+FE0F variation selector appended. The other Emoji I have tested 
with Sierra's Chinese Simplified Input Method do not have the variation 
selector appended. Could someone please check if the same happens with High 
Sierra

Thank you

André
  







Re: Unicode education in Schools

2017-08-24 Thread Andre Schappo via Unicode

Because there are many systems that can now handle BMP characters but 
cannot handle SMP characters.

One example being systems that use mysql utf8 (3 byte encoding) and have not 
yet updated to utf8mb4 (4 byte encoding)

So, I consider it important to familiarise students with SMP characters as well 
as BMP characters. Then when they develop software they will, at the start, be 
thinking beyond ASCII and Unicode BMP characters.
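The 3-byte versus 4-byte distinction is easy to show to students in Python. The sketch below only measures UTF-8 lengths; it does not talk to MySQL, and the helper names are invented for the example.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes, i.e. lies in the BMP.
    Text that fails this check needs a 4-byte-capable charset such as utf8mb4."""
    return all(utf8_len(ch) <= 3 for ch in text)


for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    plane = "BMP" if ord(ch) <= 0xFFFF else "supplementary"
    print(f"U+{ord(ch):04X} ({plane}): {utf8_len(ch)} byte(s)")

print(fits_three_byte_utf8("Andr\u00e9 \u4e2d"))
print(fits_three_byte_utf8("Andr\u00e9 \U0001F600"))
```

A check like `fits_three_byte_utf8` is the kind of test that passes with ASCII and BMP data and only fails once a supplementary-plane character arrives, which is why such characters are worth including in exercises from the start.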

André Schappo

> On 24 Aug 2017, at 17:45, Shriramana Sharma  wrote:
> 
> So how do you think it matters if the characters are in the BMP or SMP?




Unicode education in Schools

2017-08-24 Thread Andre Schappo via Unicode
I came across this School Unicode exercise 
https://community.computingatschool.org.uk/resources/4546 The exercise concerns 
Emoji but to me the important point is that the schoolchildren are having to 
think about SMP characters. I do not know if schools give an explanation of 
the BMP and SMP planes.

Slowly, Unicode is making its way into School Curricula

André Schappo



Unicode 10 Cover Art

2017-08-21 Thread Andre Schappo via Unicode
Unicode 10.0 Cover Design Art

Were there entries and, if yes, which won?

André Schappo





Re: Should U+3248 ... U+324F be wide characters?

2017-08-18 Thread Andre Schappo via Unicode

On 18 Aug 2017, at 00:50, Philippe Verdy via Unicode wrote:

2017-08-17 18:46 GMT+02:00 Asmus Freytag (c) via Unicode:
On 8/17/2017 7:47 AM, Philippe Verdy wrote:
2017-08-17 16:24 GMT+02:00 Mike FABIAN via Unicode:
Asmus Freytag via Unicode wrote:
Most emoji now have "W", for example:

1F600..1F64F;W   # So[80] GRINNING FACE..PERSON WITH FOLDED HANDS

That seems correct because emoji behave more like Ideographs.

Isn’t this the same for “CIRCLED NUMBER TEN ON BLACK SQUARE”?
This seems to me also more like an Ideograph.

Not really. They have existed for a very long time without being bound to 
ideographs or sinographic requirements on metrics. Notably their baseline and 
vertical extension do not follow the sinographic em-square layout convention 
(except when they are rendered with CJK fonts, or were encoded in documents 
with legacy CJK encodings, also rendered with suitable CJK fonts being then 
preferred to Latin fonts which won't use the large sinographic metrics).

If they were like emojis, they would actually be larger: I think it is a case 
for defining an emoji variant for them (where they could also be colored or 
have some 3D-like look)

There's an emoji variant for the standard digits.

Do you speak about circled numbers ? I don't think so.

I (and Mike as well to which I was replying) was speaking about a good case for 
defining emoji variant of these circled (or squared) numbers (Mike spoke about 
circled number 10, which is not encoded as an emoji and not even as an 
ideograph, and that he proposed to give a wide width property like ideographs).



Are not CJK ideographs both (W)ide and (S)quare? Does (W)ide imply or define 
that the ideograph should also be (S)quare?

It seems to me that there are many characters that are both (W)ide and (S)quare, 
e.g. emoji
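
The East_Asian_Width values under discussion can be inspected with Python's standard unicodedata module. This is only a sketch: the helper name is my own, the values reported depend on the Unicode version bundled with the Python in use, and the property says nothing about whether a glyph is square.

```python
import unicodedata


def width_class(ch: str) -> str:
    """Return the East_Asian_Width value of a character, e.g. 'W', 'A', 'Na'."""
    return unicodedata.east_asian_width(ch)


# A CJK ideograph, an emoji, the first circled number in U+3248..U+324F,
# and an ASCII letter for comparison.
for ch in ["\u4E2D", "\U0001F600", "\u3248", "A"]:
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X} {width_class(ch):>2}  {name}")
```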

André Schappo



Re: Unicode education in UK Schools

2017-07-08 Thread Andre Schappo via Unicode
Interesting. Thanks Asmus.

So what of other countries? Anyone on this list from China, Japan, Korea, 
Russia, Thailand ...etc... What is the situation with respect to Unicode 
education in your country's Schools, Colleges and Universities?

TIA

André Schappo

> On 7 Jul 2017, at 19:45, Asmus Freytag via Unicode <unicode@unicode.org> 
> wrote:
> 
> I performed a quick search "Informatik und Unicode" to see whether I could 
> find documents from German academic institutions discussing Unicode in the 
> context of computer science (Informatik).
> 
> Among the first page of search results I found a number of summaries and 
> presentations that may have been (or possibly are) usable as introductory 
> lectures.
> 
> One item looked like it could have been intended as source material for 
> secondary schools rather than for use in the University.
> 
> I also checked whether there are accessible homework assignments that mention 
> Unicode ("Hausaufgabe Unicode"). I didn't go very deep, but it seems that 
> it's not untypical to relegate Unicode to a sidebar, explaining the "\u" 
> notation and mentioning that you get ASCII if you set the upper byte to 0 (in 
> a UTF-16 string, as supported by Java etc.).
> 
> I've not (yet) located any assignments that try to address any of the 
> "tricky" issues in the use of Unicode.
> 
> A./
> 
> 
> On 7/7/2017 2:02 AM, Andre Schappo via Unicode wrote:
>> 
>> There is some evidence that Unicode is now being introduced to Computer 
>> Science pupils in UK Schools. Hove Park School give a summary of their 
>> Computer Science curriculum for Years 8 and 9 
>> http://www.hovepark.brighton-hove.sch.uk/department/computer-science
>> 
>> From Year 9 curriculum summary:  "• Students code text into binary using 
>> ASCII and understand the limitations of this and the need for Unicode"
>> 
>> I think it unlikely they give much coverage of Unicode at Hove Park School 
>> but it is a promising start. Personally I am much encouraged, as Computer 
>> Science education in the UK, at all levels, continues to be dominated by 
>> ASCII.
>> 
>> …and…
>> 
>> as part of my continuing endeavours to get Computer Science/IT/ICT 
>> Internationalization on the School/College/University curricula I recently 
>> setup a google discussion forum 
>> https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization
>>  If you know of any academics who might be interested please do let them 
>> know of this new forum. Unicode is, of course, a fundamental building block 
>> for internationalization and so should feature prominently in Computer 
>> Science teaching, at all levels.
>> 
>> André Schappo
>> 
> 




Unicode education in UK Schools

2017-07-07 Thread Andre Schappo via Unicode

There is some evidence that Unicode is now being introduced to Computer Science 
pupils in UK Schools. Hove Park School give a summary of their Computer Science 
curriculum for Years 8 and 9 
http://www.hovepark.brighton-hove.sch.uk/department/computer-science

From Year 9 curriculum summary:  "• Students code text into binary using ASCII 
and understand the limitations of this and the need for Unicode"

I think it unlikely they give much coverage of Unicode at Hove Park School but 
it is a promising start. Personally I am much encouraged, as Computer Science 
education in the UK, at all levels, continues to be dominated by ASCII.

…and…

as part of my continuing endeavours to get Computer Science/IT/ICT 
Internationalization on the School/College/University curricula I recently 
set up a Google discussion forum 
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization
 If you know of any academics who might be interested please do let them know 
of this new forum. Unicode is, of course, a fundamental building block for 
internationalization and so should feature prominently in Computer Science 
teaching, at all levels.

André Schappo