
Re: Bit arithmetic on Unicode characters?

2016-10-09 Thread David Starner

On Sun, Oct 9, 2016 at 4:03 AM Mark Davis ☕️  wrote:

> Essentially all of the game pieces that are in Unicode were added for
> compatibility with existing character sets. I'm guessing that there are
> hundreds to thousands of possible other symbols associated with games in
> one way or another, or that could be dug out of instruction manuals (eg,
> http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf).
> (Many of those would be encumbered by copyright issues, but there are no
> doubt others that would not.)
>

I see two symbols used in text in that Catan manual; there's a white star
(U+2606) and a twelve-pointed red star (U+2739 or U+1F7D2?). I don't see
why books about games would be any different than any other book in this
manner; symbols used in running text should be encoded.

Re: Bit arithmetic on Unicode characters? / Re: Why incomplete subscript/superscript alphabet ?

2016-10-09 Thread Marcel Schneider

On Sun, 9 Oct 2016 13:00:30 +0200, Mark Davis ☕️ wrote:

[…]
> 
> I would recommend that any proposal for additional game symbols provide
> clear evidence for why those particular game symbols are required to be
> exchanged in plain text, in a way that many, many other possible game
> symbols are not.

I missed this point: “are required to be EXCHANGED in plain text.”

Would it be possible to add this as a requirement into the relevant section 
of TUS, please? Indeed I canʼt see any need to feed those French abbreviations
into a plain text data exchange. Weʼd rather write them out, or use the common 
acronyms:
‘BN’ for ‘Bibliothèque Nationale’ [National Library];
‘BM’ for ‘Bibliothèque Municipale’ [City Library].

However, when it comes to abbreviating ‘bibliothèque’ or other 
words ending in ‘-que’ in plain text, one step I think we could take towards 
disambiguation is to issue a *new* recommendation for the abbreviation dot, 
which *is* already used in ‘M.’ for ‘Monsieur’ [Mister], and also in ‘cf.’ 
and other Latin abbreviations. So in plain text one could write either 
‘Biblio.que’ or ‘Bib.que’ for ‘Bibliothèque’ [Library].

While the official rejection rationale of *MODIFIER LETTER SMALL Q is still 
missing, I can now believe that it reiterated the recommendation to use 
markup, all the more as MS Word does not mess up line spacing when superscript 
formatting is applied, and as this is better-looking in Tahoma than modifier 
letters when used to express semantics of abbreviation indicator or ordinal 
indicator. Iʼve run a test on ‘M^gr’, for ‘Monseigneur’ [Monsignor], and on 
‘3^e’. To avoid process garbage, Iʼve made the results available on-line.[1]

What really got me started was the bizarre “Comment” on the proposal to 
encode *MODIFIER LETTER SMALL Q. What I can do now is suggest applying 
some kind of quality management on both sides, so that corporate officials 
refrain from publishing sloppy ad-hoc papers for consideration by the UTC, 
and Unicode is not reduced to accepting anything and everything for archiving 
in the Document Register.

I believe that this could be a practicable way to keep other people from 
getting bugged.

Regards,
Marcel

[1] Interested subscribers are welcome to view the screenshot from:
http://dispoclavier.com/French-abbrev-super-vs-modif.png
and to open the Word document from:
http://dispoclavier.com/French-abbrev-super-vs-modif.docx

Re: Bit arithmetic on Unicode characters?

2016-10-09 Thread Hans Åberg


> On 9 Oct 2016, at 13:00, Mark Davis ☕️  wrote:
> 
> Essentially all of the game pieces that are in Unicode were added for 
> compatibility with existing character sets. I'm guessing that there are 
> hundreds to thousands of possible other symbols associated with games in one 
> way or another, 

There is http://www.chessvariants.com/.

Re: Bit arithmetic on Unicode characters?

2016-10-09 Thread Mark Davis ☕️

Essentially all of the game pieces that are in Unicode were added for
compatibility with existing character sets. I'm guessing that there are
hundreds to thousands of possible other symbols associated with games in
one way or another, or that could be dug out of instruction manuals (eg,
http://www.catan.com/files/downloads/catan_5th_ed_rules_eng_150303.pdf).
(Many of those would be encumbered by copyright issues, but there are no
doubt others that would not.)

I would recommend that any proposal for additional game symbols provide
clear evidence for why those particular game symbols are required to be
exchanged in plain text, in a way that many, many other possible game
symbols are not.

Mark

On Sun, Oct 9, 2016 at 3:02 AM, Garth Wallace  wrote:

> On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy  wrote:
>
>> Markup for rotation is highly underdeveloped, and in this case for chess
>> it has its own semantics, it's not just a presentation feature, possibly
>> meant for playing on larger boards with more players than 2, and
>> distinguished just like there's a distinction between white and black, or
>> meant to signal some dangerous positions or candidate target positions for
>> the next moves.
>>
>
> Not exactly. Rotation of chess piece symbols is not a presentation feature
> (at least as I understand the term), and isn't meant for use with
> multiplayer games. The rotated pieces are used in chess problems,
> specifically heterodox or "fairy chess" problems, where they stand in for
> non-standard pieces. A rotated rook, for instance, means "a piece that is
> not a rook but is similar in some respects"; which piece it represents
> specifically depends on context. Conventionally, the upside-down queen
> represents a "grasshopper" and the upside-down knight a "nightrider", but
> otherwise they are assigned on a problem-by-problem basis. This practice
> dates back to the early 20th century and was originally so that problem
> composers wouldn't have to cut new type for every new piece they invent but
> is now traditional.
>
>> I also see some additions like florettes, and elephants needed for
>> traditional Asian variants of the game, plus combined forms (e.g.
>> tower+horse) which are quite intriguing.
>> There are also variants rotated 45 degrees.
>>
>
> The florettes are also used in problems, as are the equihoppers (the
> symbol that looks a bit like a bow tie or spindle). The compound symbols
> are found in problems and in several common variants such as Capablanca
> Chess and Grand Chess. The jester's cap is similar. The elephant and fers
> are used in shatranj or medieval chess.
>
>
>> All those are not just meant for display on the grid of a board but in
>> discussions about strategies. There are also combining notations added on
>> top of chess pieces (e.g. numbering pawns that are otherwise identical, but
>> in plain text you can still use notations with superscript digits or
>> letters, distinguished clearly from the numbering of grid positions, or by
>> adding some other punctuation marks).
>>
>
> I haven't encountered that. It's rarely necessary to differentiate
> individual pawns in notation: their moves are so limited that it's usually
> obvious which pawn is moving, and there is a standard method of
> disambiguating moves by starting square if needed.
>
>
>> I still don't see in these images the elephants (or other pieces like
>> unmovable rocks or rivers, or special pieces added to create handicaps for
>> one of the players). I've also seen some chess players using special queens
>> by putting a pawn on top of another flat pawn, with more limited movements
>> than a standard queen. There are also bishops/sorcerers/magicians, eagles,
>> dragoons, tigers/lions, rats, dogs/foxes, snakes,
>> spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures...
>> Chess games have a lot of variants with their supporters. Modern movies are
>> also promoting some variants.
>>
>
> There are elephants in the proposal, using a shape found in medieval
> manuscripts. Rocks and rivers are board features and not found in notation.
>
>
>>
>> 2016-10-08 17:24 GMT+02:00 Ken Shirriff :
>>
>>>
>>> Looking at the image, the idea of the proposal is to include chess piece
>>> symbols in all four 90° rotations? Wouldn't it be better to do this in
>>> markup than in Unicode? I fear a combinatorial explosion if Unicode starts
>>> including all the possible orientations of characters. (Maybe there's a
>>> good reason to do this for chess; I'm just going off the image.)
>>>
>>
> The proposal covers this. These have a well-established use in chess
> notation, which doesn't apply to non-chess symbols. Markup would be the
> wrong way to do this. It's not like, say, electronic schematics where a
> diode symbol may be found in any orientation but still always

Re: Bit arithmetic on Unicode characters?

2016-10-08 Thread Garth Wallace

On Sat, Oct 8, 2016 at 9:31 AM, Philippe Verdy  wrote:

> Markup for rotation is highly underdeveloped, and in this case for chess
> it has its own semantics, it's not just a presentation feature, possibly
> meant for playing on larger boards with more players than 2, and
> distinguished just like there's a distinction between white and black, or
> meant to signal some dangerous positions or candidate target positions for
> the next moves.
>

Not exactly. Rotation of chess piece symbols is not a presentation feature
(at least as I understand the term), and isn't meant for use with
multiplayer games. The rotated pieces are used in chess problems,
specifically heterodox or "fairy chess" problems, where they stand in for
non-standard pieces. A rotated rook, for instance, means "a piece that is
not a rook but is similar in some respects"; which piece it represents
specifically depends on context. Conventionally, the upside-down queen
represents a "grasshopper" and the upside-down knight a "nightrider", but
otherwise they are assigned on a problem-by-problem basis. This practice
dates back to the early 20th century and was originally so that problem
composers wouldn't have to cut new type for every new piece they invent but
is now traditional.

> I also see some additions like florettes, and elephants needed for
> traditional Asian variants of the game, plus combined forms (e.g.
> tower+horse) which are quite intriguing.
> There are also variants rotated 45 degrees.
>

The florettes are also used in problems, as are the equihoppers (the symbol
that looks a bit like a bow tie or spindle). The compound symbols are found
in problems and in several common variants such as Capablanca Chess and
Grand Chess. The jester's cap is similar. The elephant and fers are used in
shatranj or medieval chess.

> All those are not just meant for display on the grid of a board but in
> discussions about strategies. There are also combining notations added on
> top of chess pieces (e.g. numbering pawns that are otherwise identical, but
> in plain text you can still use notations with superscript digits or
> letters, distinguished clearly from the numbering of grid positions, or by
> adding some other punctuation marks).
>

I haven't encountered that. It's rarely necessary to differentiate
individual pawns in notation: their moves are so limited that it's usually
obvious which pawn is moving, and there is a standard method of
disambiguating moves by starting square if needed.

> I still don't see in these images the elephants (or other pieces like
> unmovable rocks or rivers, or special pieces added to create handicaps for
> one of the players). I've also seen some chess players using special queens
> by putting a pawn on top of another flat pawn, with more limited movements
> than a standard queen. There are also bishops/sorcerers/magicians, eagles,
> dragoons, tigers/lions, rats, dogs/foxes, snakes,
> spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures...
> Chess games have a lot of variants with their supporters. Modern movies are
> also promoting some variants.
>

There are elephants in the proposal, using a shape found in medieval
manuscripts. Rocks and rivers are board features and not found in notation.

>
> 2016-10-08 17:24 GMT+02:00 Ken Shirriff :
>
>>
>> Looking at the image, the idea of the proposal is to include chess piece
>> symbols in all four 90° rotations? Wouldn't it be better to do this in
>> markup than in Unicode? I fear a combinatorial explosion if Unicode starts
>> including all the possible orientations of characters. (Maybe there's a
>> good reason to do this for chess; I'm just going off the image.)
>>
>
The proposal covers this. These have a well-established use in chess
notation, which doesn't apply to non-chess symbols. Markup would be the
wrong way to do this. It's not like, say, electronic schematics where a
diode symbol may be found in any orientation but still always represents a
diode: a rotated queen symbol is specifically *not a queen* but another
piece entirely.

Currently, fairy chess problemists rely on font hacks and PDFs (even for
relatively short texts).

Re: Bit arithmetic on Unicode characters?

2016-10-08 Thread Philippe Verdy

Markup for rotation is highly underdeveloped, and in this case for chess it
has its own semantics, it's not just a presentation feature, possibly meant
for playing on larger boards with more players than 2, and distinguished
just like there's a distinction between white and black, or meant to signal
some dangerous positions or candidate target positions for the next moves.

I also see some additions like florettes, and elephants needed for
traditional Asian variants of the game, plus combined forms (e.g.
tower+horse) which are quite intriguing.
There are also variants rotated 45 degrees.

All those are not just meant for display on the grid of a board but in
discussions about strategies. There are also combining notations added on
top of chess pieces (e.g. numbering pawns that are otherwise identical, but
in plain text you can still use notations with superscript digits or
letters, distinguished clearly from the numbering of grid positions, or by
adding some other punctuation marks).

I still don't see in these images the elephants (or other pieces like
unmovable rocks or rivers, or special pieces added to create handicaps for
one of the players). I've also seen some chess players using special queens
by putting a pawn on top of another flat pawn, with more limited movements
than a standard queen. There are also bishops/sorcerers/magicians, eagles,
dragoons, tigers/lions, rats, dogs/foxes, snakes,
spiders, soldiers/archers, cannons, walls/fortresses, gold/treasures...
Chess games have a lot of variants with their supporters. Modern movies are
also promoting some variants.

2016-10-08 17:24 GMT+02:00 Ken Shirriff :

> Looking at the image, the idea of the proposal is to include chess piece
> symbols in all four 90° rotations? Wouldn't it be better to do this in
> markup than in Unicode? I fear a combinatorial explosion if Unicode starts
> including all the possible orientations of characters. (Maybe there's a
> good reason to do this for chess; I'm just going off the image.)
>
> Ken
>
> On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace  wrote:
>
>> Sorry about the blank reply. Itchy trigger finger.
>>
>> On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler  wrote:
>>
>>>
>>> On 10/6/2016 12:44 PM, Garth Wallace wrote:
>>>
>>> Some representatives of the WFCC have proposed alternate arrangements
>>> that assume there will be a need for bitwise operations to convert between
>>> the existing chess symbols in the Miscellaneous Symbols block and related
>>> symbols in the new block. I don't see the need but maybe I'm missing
>>> something.
>>>
>>>
>>> I don't think you are missing anything. Bitwise operations would
>>> certainly *not* be needed in a case like this. Small lookup and mapping
>>> tables would suffice.
>>>
>>> --Ken
>>>
>>>
>> Thank you.
>>
>> Just to be clear, this is the proposed allocation as it stands:
>> http://i556.photobucket.com/albums/ss7/Garth_Wallace/proposed%20characters_zps81m0frvl.png
>>
>> That arrangement is the result of some discussion with a representative
>> of the WFCC.
>>
>> And here are the alternatives that another WFCC representative recently
>> proposed and that prompted my question:
>> http://i556.photobucket.com/albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png
>>
>
>

Re: Bit arithmetic on Unicode characters?

2016-10-08 Thread Ken Shirriff

Looking at the image, the idea of the proposal is to include chess piece
symbols in all four 90° rotations? Wouldn't it be better to do this in
markup than in Unicode? I fear a combinatorial explosion if Unicode starts
including all the possible orientations of characters. (Maybe there's a
good reason to do this for chess; I'm just going off the image.)

Ken

On Fri, Oct 7, 2016 at 9:36 PM, Garth Wallace  wrote:

> Sorry about the blank reply. Itchy trigger finger.
>
> On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler  wrote:
>
>>
>> On 10/6/2016 12:44 PM, Garth Wallace wrote:
>>
>> Some representatives of the WFCC have proposed alternate arrangements
>> that assume there will be a need for bitwise operations to convert between
>> the existing chess symbols in the Miscellaneous Symbols block and related
>> symbols in the new block. I don't see the need but maybe I'm missing
>> something.
>>
>>
>> I don't think you are missing anything. Bitwise operations would
>> certainly *not* be needed in a case like this. Small lookup and mapping
>> tables would suffice.
>>
>> --Ken
>>
>>
> Thank you.
>
> Just to be clear, this is the proposed allocation as it stands:
> http://i556.photobucket.com/albums/ss7/Garth_Wallace/proposed%20characters_zps81m0frvl.png
>
> That arrangement is the result of some discussion with a representative of
> the WFCC.
>
> And here are the alternatives that another WFCC representative recently
> proposed and that prompted my question:
> http://i556.photobucket.com/albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png
>

Re: Bit arithmetic on Unicode characters?

2016-10-08 Thread Garth Wallace

Sorry about the blank reply. Itchy trigger finger.

On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler  wrote:

>
> On 10/6/2016 12:44 PM, Garth Wallace wrote:
>
> Some representatives of the WFCC have proposed alternate arrangements that
> assume there will be a need for bitwise operations to convert between the
> existing chess symbols in the Miscellaneous Symbols block and related
> symbols in the new block. I don't see the need but maybe I'm missing
> something.
>
>
> I don't think you are missing anything. Bitwise operations would certainly
> *not* be needed in a case like this. Small lookup and mapping tables
> would suffice.
>
> --Ken
>
>
Thank you.

Just to be clear, this is the proposed allocation as it stands:
http://i556.photobucket.com/albums/ss7/Garth_Wallace/proposed%20characters_zps81m0frvl.png

That arrangement is the result of some discussion with a representative of
the WFCC.

And here are the alternatives that another WFCC representative recently
proposed and that prompted my question:
http://i556.photobucket.com/albums/ss7/Garth_Wallace/wfcc%20alternatives_zpstdvfgcf2.png
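
To illustrate Ken's remark that small lookup and mapping tables would
suffice, here is a minimal sketch. It assumes only the twelve existing chess
symbols at U+2654..U+265F in Miscellaneous Symbols; the names SWAP_COLOUR and
swap_colour are invented for this example, and nothing here reflects the
proposed new block.

```python
# Hypothetical helper: swap white and black for the existing chess symbols.
WHITE = "".join(chr(cp) for cp in range(0x2654, 0x265A))  # U+2654..U+2659
BLACK = "".join(chr(cp) for cp in range(0x265A, 0x2660))  # U+265A..U+265F

# One small translation table covers both directions.
SWAP_COLOUR = str.maketrans(WHITE + BLACK, BLACK + WHITE)


def swap_colour(text: str) -> str:
    """Replace white chess symbols with black ones and vice versa."""
    return text.translate(SWAP_COLOUR)


# White and black differ by 6 code points, so the XOR of each pair is not
# constant: there is no single bit to flip for the existing symbols.
xor_masks = {ord(w) ^ ord(b) for w, b in zip(WHITE, BLACK)}

print(swap_colour(WHITE))
print(sorted(hex(m) for m in xor_masks))
```

The table costs a dozen entries and does not care how the code points happen
to be laid out, which is why the layout of a new block need not be designed
around bit operations.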

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Garth Wallace

On Thu, Oct 6, 2016 at 2:28 PM, Ken Whistler  wrote:

>
> On 10/6/2016 12:44 PM, Garth Wallace wrote:
>
> Some representatives of the WFCC have proposed alternate arrangements that
> assume there will be a need for bitwise operations to convert between the
> existing chess symbols in the Miscellaneous Symbols block and related
> symbols in the new block. I don't see the need but maybe I'm missing
> something.
>
>
> I don't think you are missing anything. Bitwise operations would certainly
> *not* be needed in a case like this. Small lookup and mapping tables
> would suffice.
>
> --Ken
>
>

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Oren Watson

Except that it states at the very start of that file "this file should not be
parsed for machine-readable information."

On Fri, Oct 7, 2016 at 6:41 PM, Andrew West  wrote:

> On 7 October 2016 at 23:31, Doug Ewell  wrote:
> >
> > Well, "treacherous" is right. I'd hesitate to trust an algorithm to
> > recognize PLANCK CONSTANT as the character name that logically fits
> > between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I.
>
> Well, it could be picked up from that most treacherous of Unicode data
> files http://www.unicode.org/Public/UNIDATA/NamesList.txt
>
> Andrew
>

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Andrew West

On 7 October 2016 at 23:31, Doug Ewell  wrote:
>
> Well, "treacherous" is right. I'd hesitate to trust an algorithm to
> recognize PLANCK CONSTANT as the character name that logically fits
> between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I.

Well, it could be picked up from that most treacherous of Unicode data
files http://www.unicode.org/Public/UNIDATA/NamesList.txt

Andrew

RE: Bit arithmetic on Unicode characters?

2016-10-07 Thread Doug Ewell

Andrew West wrote:

> Well, it could be picked up from that most treacherous of Unicode data
> files http://www.unicode.org/Public/UNIDATA/NamesList.txt

Even then, you have:

...
1D454   MATHEMATICAL ITALIC SMALL G
        # <font> 0067 latin small letter g
1D455   <reserved>
        x (planck constant - 210E)
1D456   MATHEMATICAL ITALIC SMALL I
        # <font> 0069 latin small letter i
...

The only way you can tell from this that U+210E is a mathematical italic
small H is from the context of the previous character. That wouldn't
bode well if the letter A were one of the exceptionally located code
points. Thankfully, it never is, so this cleverness might work after
all.

--
Doug Ewell | Thornton, CO, US | ewellic.org
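
The naming problem discussed above can be reproduced with Python's standard
unicodedata module. This is only a sketch: the helper
looks_like_math_italic_small is a made-up example of the kind of name test an
algorithm might apply, and the results depend on the Unicode version bundled
with the interpreter.

```python
import unicodedata

for cp in (0x1D454, 0x1D455, 0x1D456, 0x210E):
    # unicodedata.name() returns the supplied default for code points
    # that have no name, such as the reserved U+1D455.
    name = unicodedata.name(chr(cp), "<no name>")
    print(f"U+{cp:04X}  {name}")


def looks_like_math_italic_small(cp: int) -> bool:
    """Naive, hypothetical name-prefix test."""
    name = unicodedata.name(chr(cp), "")
    return name.startswith("MATHEMATICAL ITALIC SMALL ")
```

The prefix test accepts U+1D454 and U+1D456 but rejects U+210E, whose name is
PLANCK CONSTANT, so an exception table is needed after all.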

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Doug Ewell

Richard Wordingham wrote:

>> I can't find anything in the UCD that distinguishes one "font
>> variant" from another (UnicodeData.txt shown as an example):
>
> It's in that most treacherous of properties, the character's name.

Well, "treacherous" is right. I'd hesitate to trust an algorithm to
recognize PLANCK CONSTANT as the character name that logically fits
between MATHEMATICAL ITALIC SMALL G and MATHEMATICAL ITALIC SMALL I.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Richard Wordingham

On Fri, 07 Oct 2016 09:06:31 -0700
"Doug Ewell"  wrote:

> Richard Wordingham wrote:

> > Perhaps there is just enough information in the UCD to allow
> > exhaustive, automated tests.

> I can't find anything in the UCD that distinguishes one "font variant"
> from another (UnicodeData.txt shown as an example):

> 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;


It's in that most treacherous of properties, the character's name.

Richard.

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Hans Åberg


> On 7 Oct 2016, at 18:06, Doug Ewell  wrote:

> I can't find anything in the UCD that distinguishes one "font variant"
> from another (UnicodeData.txt shown as an example):
> 
> 1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
> 
> And that's probably as it should be, because UTC never intended MAS to
> be readily transformed to and from "plain" characters. They're supposed
> to be used for mathematical expressions in which styled letters have
> special meaning.

I use them for input text files, and it is not particularly difficult. An 
efficient method is to use text substitutions, as available on MacOS. The 
resulting file is UTF-8 with the correct characters, and typesetting systems 
like LuaTeX with ConTeXt or LaTeX/unicode-math translate it into a PDF. It is 
usually easy to spot immediately if a math style is wrong. Using them in the 
input makes one more aware of new styles that were not available in the past.

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Doug Ewell

Richard Wordingham wrote:

> Yes, it's a trade-off. The application I had in mind is converting
> between mathematical letter variants and their 'plain' forms.

Long-time list members might remember a Windows utility I wrote to
convert between normal Unicode text and Mathematical Alphanumeric
Symbols. Andrew West (of BabelPad fame) has a similar, web-based app
that also supports things like small caps and superscript.

Both of these use lookup tables to do the conversions, and use
algorithms only for very broad-based operations, like distinguishing the
Latin-letter range in the MAS block from the Greek letters and the
digits. There's no practical value in implementing conversions like this
algorithmically. Maybe if there were one or two exceptions in the MAS
range instead of two dozen, it might be different.

> Perhaps there is just enough information in the UCD to allow
> exhaustive, automated tests.

I can't find anything in the UCD that distinguishes one "font variant"
from another (UnicodeData.txt shown as an example):

1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;

And that's probably as it should be, because UTC never intended MAS to
be readily transformed to and from "plain" characters. They're supposed
to be used for mathematical expressions in which styled letters have
special meaning. (My utility, and I'm sure Andrew's, were written
entirely tongue-in-cheek.)
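
The same observation can be checked from Python's unicodedata module. The
sketch below is an illustration only, and the helper name to_plain is made
up. It shows that the decomposition field is identical for every style, and
that compatibility normalization therefore gives a one-way fold to plain
letters; the style itself survives only in the character name.

```python
import unicodedata

BOLD_A, ITALIC_A, FRAKTUR_A = "\U0001D400", "\U0001D434", "\U0001D504"

for ch in (BOLD_A, ITALIC_A, FRAKTUR_A):
    # Each of these reports the same decomposition mapping.
    print(f"U+{ord(ch):04X}", unicodedata.decomposition(ch), "|",
          unicodedata.name(ch))


def to_plain(text: str) -> str:
    """Fold styled letters to plain ones via NFKC.

    NFKC also folds many other compatibility characters, so this is
    broader than a conversion limited to the MAS block.
    """
    return unicodedata.normalize("NFKC", text)


print(to_plain(BOLD_A + ITALIC_A + FRAKTUR_A + "\u210E"))
```

Going the other way (plain to a chosen style) has no such property to lean
on, which is where the lookup tables come in.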

> My email client found a font to render U+1D547 as the unwary
> would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK
> CAPITAL P. I was surprised when I first saw those gaps; I would have
> expected characters with appropriate singleton decompositions to protect
> the unwary. (The idea might have come up at the time of encoding, and
> been dismissed with reasons.)

Unifying identical characters with identical meanings, rather than
creating pointless duplicates, was a major design tenet of Unicode.

> I don't know whether the font's misrendering is an accident or is
> deliberate partial protection of the victims of bad character code
> selection.

Either way, it's a bug. Users who try to render an unassigned code point
should not be "protected" by showing them a glyph that the font designer
thought should be there. They should be shown a .notdef glyph so they
know something is wrong.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Hans Åberg


> On 7 Oct 2016, at 09:27, Garth Wallace  wrote:
> 
> Unicode doesn't really address chess piece properties like white/black beyond 
> naming conventions.

From the formal point of view, Unicode only assigns character numbers (code 
points), which get a binary representation only when encoded, as with 
UTF-8, which agrees with ASCII for small numbers. The mathematical alphabet 
letters are out of order for legacy reasons, but that is not a problem, as one 
will use an interface that sorts it out. These numbers are only for display to 
humans, and computers are nowadays fast enough to sort it out. A chess program 
has its own, optimized representation anyway.

So possibly you might add more properties.
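
As an illustration of where bit arithmetic on code points is routine, here is
a sketch of UTF-8 encoding written out with shifts and masks. The function
name utf8_encode is invented for this example; in practice one would simply
call str.encode("utf-8").

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using shifts and masks."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # 1 byte: identical to ASCII
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0xE9, 0x265A, 0x1D400):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "))
```

The arithmetic here works on the number alone and never depends on what the
character means, which is the difference from the chess and casing cases.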

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Garth Wallace

On Thu, Oct 6, 2016 at 5:42 PM, Shawn Steele 
wrote:

> Presumably a table-based approach would merely require rerunning the
> table-building script from the UCD when new versions were released.
>

For casing, sure, but that's not really relevant in this context, since
Unicode doesn't really address chess piece properties like white/black
beyond naming conventions.
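
A table-building script for the existing chess symbols would have to lean on
exactly those naming conventions. The following is a minimal sketch using
Python's unicodedata module; build_colour_table is a hypothetical name, and
the approach assumes every name in U+2654..U+265F starts with "WHITE " or
"BLACK ".

```python
import unicodedata


def build_colour_table() -> dict:
    """Derive a white<->black mapping purely from character names."""
    table = {}
    for cp in range(0x2654, 0x2660):          # existing chess symbols
        name = unicodedata.name(chr(cp))      # e.g. "WHITE CHESS KING"
        if name.startswith("WHITE "):
            other = "BLACK " + name[len("WHITE "):]
        else:
            other = "WHITE " + name[len("BLACK "):]
        table[cp] = ord(unicodedata.lookup(other))
    return table


table = build_colour_table()
for src, dst in sorted(table.items()):
    print(f"U+{src:04X} -> U+{dst:04X}")
```

The resulting dictionary can be passed straight to str.translate(). It works
only because the names happen to be regular; no character property records
the colour.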

Re: Bit arithmetic on Unicode characters?

2016-10-07 Thread Richard Wordingham

On Thu, 6 Oct 2016 21:18:15 -0400
Oren Watson  wrote:

> On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham <
> richard.wording...@ntlworld.com> wrote:

> > Yes, it's a trade-off.  The application I had in mind is converting
> > between mathematical letter variants and their 'plain' forms.
> > Perhaps there is just enough information in the UCD to allow
> > exhaustive, automated tests.

> That application is hindered by the fact that
> 
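> U+1D506, U+1D50B, U+1D50C, U+1D515, U+1D51D, U+1D53A, U+1D53F, U+1D545,
> U+1D547, U+1D548, U+1D549, U+1D551, U+1D49D, U+1D4A0, U+1D4A1, U+1D4A3,
> U+1D4A4, U+1D4A7, U+1D4A8, U+1D4AD, U+1D4BA, U+1D4BC, U+1D4C4 are unallocated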
> characters, forming gaps in the otherwise contiguous mathematical
> alphabets.

(Aside: That written statement is illegal! -:)

Yep.  It's a known nuisance, which is why I suggested exhaustive tests.
My email client found a font to render U+1D547 as the unwary
would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK
CAPITAL P. I was surprised when I first saw those gaps; I would have
expected characters with appropriate singleton decompositions to protect
the unwary.  (The idea might have come up at the time of encoding, and
been dismissed with reasons.)  I don't know whether the font's
misrendering is an accident or is deliberate partial protection of the
victims of bad character code selection.

An old application of arithmetic was transliteration between the
major Indian Indic scripts.  That falls foul of Tamil and of characters
that were not represented in ISCII.

Richard.
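
The Indic case can be sketched as plain block-offset arithmetic. This is an
illustration only: the BLOCK_BASE table and shift_script function are made
up for the example, and real transliteration needs far more than an offset.

```python
import unicodedata

# Start of each 128-code-point block, laid out on the ISCII pattern.
BLOCK_BASE = {"Devanagari": 0x0900, "Bengali": 0x0980, "Tamil": 0x0B80}


def shift_script(text: str, src: str, dst: str) -> str:
    """Move characters of the src block to the same offset in dst."""
    lo = BLOCK_BASE[src]
    delta = BLOCK_BASE[dst] - lo
    return "".join(chr(ord(c) + delta) if lo <= ord(c) < lo + 0x80 else c
                   for c in text)


ka, kha = "\u0915", "\u0916"   # DEVANAGARI LETTER KA, KHA
for dst in ("Bengali", "Tamil"):
    for ch in (ka, kha):
        out = shift_script(ch, "Devanagari", dst)
        print(dst, f"U+{ord(out):04X}",
              unicodedata.name(out, "<unassigned>"))
```

Both letters land on valid Bengali letters, but KHA shifted into the Tamil
block lands on an unassigned code point, which is the kind of failure the
arithmetic approach runs into.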

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Oren Watson

That application is hindered by the fact that

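U+1D506, U+1D50B, U+1D50C, U+1D515, U+1D51D, U+1D53A, U+1D53F, U+1D545,
U+1D547, U+1D548, U+1D549, U+1D551, U+1D49D, U+1D4A0, U+1D4A1, U+1D4A3,
U+1D4A4, U+1D4A7, U+1D4A8, U+1D4AD, U+1D4BA, U+1D4BC, U+1D4C4 are unallocated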
characters, forming gaps in the otherwise contiguous mathematical
alphabets.
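
The gaps can be listed mechanically. The sketch below assumes the Latin
letter part of the block runs from U+1D400 to U+1D6A3 (13 styles of 52
letters) and relies on the Unicode data bundled with the Python interpreter;
naive_styled is a hypothetical helper showing what pure offset arithmetic
does.

```python
import unicodedata

FIRST, LAST = 0x1D400, 0x1D6A3   # 13 styles x 52 Latin letters
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# General category Cn marks unassigned code points.
gaps = [cp for cp in range(FIRST, LAST + 1)
        if unicodedata.category(chr(cp)) == "Cn"]

print(len(gaps), "unassigned code points in the Latin letter range")
print(" ".join(f"U+{cp:04X}" for cp in gaps))


def naive_styled(base: int, letter: str) -> str:
    """Offset arithmetic only: wrong whenever the result is a gap."""
    return chr(base + LETTERS.index(letter))


# Double-struck capitals start at U+1D538; P falls into a gap because
# the character actually encoded is U+2119 DOUBLE-STRUCK CAPITAL P.
p = naive_styled(0x1D538, "P")
print(f"U+{ord(p):04X}", unicodedata.category(p))
```

Any converter built on offsets therefore needs an exception table keyed on
these code points, and a scan like this is one way to test that the table is
complete.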


On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham <
richard.wording...@ntlworld.com> wrote:

> On Thu, 6 Oct 2016 16:54:21 -0700
> Ken Whistler  wrote:
>
> > On 10/6/2016 4:32 PM, Richard Wordingham wrote:
> > > The
> > > problem is that manually constructed lookup tables are prone to
> > > human error.
> >
> > ... as are manually constructed algorithms that attempt to take
> > advantage of sub-ranges of case pair adjacency in the Unicode code
> > charts to do casing with bit arithmetic.
>
> Yes, it's a trade-off.  The application I had in mind is converting
> between mathematical letter variants and their 'plain' forms. Perhaps
> there is just enough information in the UCD to allow exhaustive,
> automated tests.
>
> For _simple_ case folding, algorithmic case folding can be expanded to
> a list of range tests, generalising what is often done for ASCII.
> Obviously the testing should be repeated with each new version of
> Unicode, which is straightforward if the case folding is compliant with
> Unicode.  (Turkish would be a reason for not being compliant.)
>
> Richard.
>

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Richard Wordingham

On Thu, 6 Oct 2016 16:54:21 -0700
Ken Whistler  wrote:

> On 10/6/2016 4:32 PM, Richard Wordingham wrote:
> > The
> > problem is that manually constructed lookup tables are prone to
> > human error.
> 
> ... as are manually constructed algorithms that attempt to take 
> advantage of sub-ranges of case pair adjacency in the Unicode code 
> charts to do casing with bit arithmetic.

Yes, it's a trade-off.  The application I had in mind is converting
between mathematical letter variants and their 'plain' forms. Perhaps
there is just enough information in the UCD to allow exhaustive,
automated tests.

For _simple_ case folding, algorithmic case folding can be expanded to
a list of range tests, generalising what is often done for ASCII.
Obviously the testing should be repeated with each new version of
Unicode, which is straightforward if the case folding is compliant with
Unicode.  (Turkish would be a reason for not being compliant.)
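
As a hedged illustration of the range-test approach and of re-testing it against the library's data: `simple_fold`, `mismatches` and `REPERTOIRE` are hypothetical names, the repertoire is deliberately small, and Python's `str.casefold()` (which does full case folding) is used as the reference, so code points whose folding is longer than one character are skipped.

```python
def simple_fold(cp):
    """Fold one code point using range tests only (limited repertoire)."""
    if 0x0041 <= cp <= 0x005A:                    # ASCII A..Z
        return cp + 0x20
    if 0x00C0 <= cp <= 0x00DE and cp != 0x00D7:   # Latin-1, skipping U+00D7
        return cp + 0x20
    if 0x0391 <= cp <= 0x03A9 and cp != 0x03A2:   # Greek, U+03A2 unassigned
        return cp + 0x20
    if 0x0400 <= cp <= 0x040F:                    # Cyrillic U+0400..U+040F
        return cp + 0x50
    if 0x0410 <= cp <= 0x042F:                    # basic Cyrillic capitals
        return cp + 0x20
    return cp

def mismatches(code_points):
    """List code points where simple_fold disagrees with str.casefold()."""
    bad = []
    for cp in code_points:
        reference = chr(cp).casefold()
        # Skip full foldings such as U+00DF -> "ss"; they are not simple.
        if len(reference) == 1 and ord(reference) != simple_fold(cp):
            bad.append(cp)
    return bad

# The repertoire this hypothetical program claims to support.
REPERTOIRE = [*range(0x0000, 0x0080), *range(0x00C0, 0x0100),
              *range(0x0391, 0x03AA), *range(0x0400, 0x0450)]
print(mismatches(REPERTOIRE))
```

Rerunning `mismatches` after each Unicode update is the automated test in question. Widening the repertoire quickly exposes the limits: U+00B5 MICRO SIGN, for instance, folds to U+03BC and is not caught by any of the ranges above.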

Richard.

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Ken Whistler




On 10/6/2016 4:32 PM, Richard Wordingham wrote:
> The
> problem is that manually constructed lookup tables are prone to human
> error.


... as are manually constructed algorithms that attempt to take 
advantage of sub-ranges of case pair adjacency in the Unicode code 
charts to do casing with bit arithmetic.


--Ken

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Richard Wordingham

On Thu, 6 Oct 2016 12:44:05 -0700
Garth Wallace  wrote:

> Other than converting between UTFs, is bit arithmetic commonly
> performed on Unicode characters? I was under the impression that it's
> a rarity if it is done at all.

It's possible to use it for the bulk of case folding, especially if the
program only supports a specific repertoire.

For specialist tasks, exploiting arithmetic relationships makes sense.
I would expect that most ASCII clones are handled that way.  The
problem is that manually constructed lookup tables are prone to human
error.

Richard.

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Ken Whistler



On 10/6/2016 12:44 PM, Garth Wallace wrote:
> Some representatives of the WFCC have proposed alternate arrangements
> that assume there will be a need for bitwise operations to convert
> between the existing chess symbols in the Miscellaneous Symbols block
> and related symbols in the new block. I don't see the need but maybe
> I'm missing something.


I don't think you are missing anything. Bitwise operations would 
certainly *not* be needed in a case like this. Small lookup and mapping 
tables would suffice.
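
For the existing chess symbols in Miscellaneous Symbols, such a table might look like the sketch below. The names `WHITE_TO_BLACK`, `BLACK_TO_WHITE` and `swap_colour` are illustrative only, and nothing here assumes any particular layout for the proposed new block.

```python
# White pieces U+2654..U+2659 and their black counterparts U+265A..U+265F.
WHITE_TO_BLACK = {
    "\u2654": "\u265A",  # king
    "\u2655": "\u265B",  # queen
    "\u2656": "\u265C",  # rook
    "\u2657": "\u265D",  # bishop
    "\u2658": "\u265E",  # knight
    "\u2659": "\u265F",  # pawn
}
BLACK_TO_WHITE = {black: white for white, black in WHITE_TO_BLACK.items()}

# One translation table covering both directions.
_SWAP = str.maketrans({**WHITE_TO_BLACK, **BLACK_TO_WHITE})

def swap_colour(text):
    """Exchange white and black chess pieces; leave everything else alone."""
    return text.translate(_SWAP)
```

A twelve-entry table like this stays correct wherever related symbols end up being encoded, which is why the arrangement within a new block need not be constrained by it.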


--Ken

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Philippe Verdy

As far as we know, arithmetic is performed only in the following cases:
- Decimal digits, in ASCII and in a dozen or so other scripts: converting
automatically between digit sets uses a single additive constant for the
10 digits.
- Basic Latin/ASCII, for mapping letter case and for mapping non-decimal
digits (adding 7 for values from 10 upwards, so that the letters A..Z
follow 0..9).
- The precomposed Hangul syllables (needed also for checking canonical
equivalences and for the standard NFC/NFD normalizations, and partly for
implementing NFKC/NFKD normalizations and collation).
- In all other cases, arithmetic is not reliable at all: characters may
still be allocated in unused slots without any relation to case mappings
(e.g. the slot in the basic Greek alphabet opposite the final sigma, which
is encoded only in lowercase), and some mappings depend on the language
(e.g. the Turkic distinction between dotted and dotless i). You'll need
proper mapping tables.
- For symbols that could benefit from it (such as box-drawing characters),
it is generally not used. The exceptions are Braille patterns, mapping
between black and white versions of chess pieces, mapping between
comparable series of mahjong tiles in their basic set (but not necessarily
with the same constant in extended sets, as that would have required
allocating them in more columns than strictly needed), and mapping ASCII
letters to mathematical variants of Latin letters, regional indicator
symbols, or fullwidth variants for CJK.
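
The Hangul case is the one where arithmetic is part of the standard itself. A minimal sketch of the canonical decomposition of a precomposed syllable, using the constants from the Unicode Standard's Hangul syllable algorithm (the function name `decompose_hangul` is just for this example):

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables

def decompose_hangul(ch):
    """Return the conjoining jamo sequence for one precomposed syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch                                  # not a Hangul syllable
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant".
    return chr(lead) + chr(vowel) + (chr(trail) if trail != T_BASE else "")
```

For example, U+D55C decomposes to U+1112 U+1161 U+11AB, and the result can be compared against `unicodedata.normalize("NFD", ...)` over the whole syllable range as a check.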


2016-10-06 21:44 GMT+02:00 Garth Wallace :

> Other than converting between UTFs, is bit arithmetic commonly performed
> on Unicode characters? I was under the impression that it's a rarity if it
> is done at all.
>
> I've been working on a proposal for additional chess symbols used in chess
> problems and variant games, and I've been in communication with the World
> Federation for Chess Composition, which is the international organization
> in charge of chess problems. We have agreement on the repertoire and the
> text of the proposal, but the arrangement of the proposed characters within
> the new block is a sticking point. Some representatives of the WFCC have
> proposed alternate arrangements that assume there will be a need for
> bitwise operations to convert between the existing chess symbols in the
> Miscellaneous Symbols block and related symbols in the new block. I don't
> see the need but maybe I'm missing something.
>

Re: Bit arithmetic on Unicode characters?

2016-10-06 Thread Asmus Freytag (c)


  
  
On 10/6/2016 12:44 PM, Garth Wallace wrote:
> Other than converting between UTFs, is bit arithmetic commonly performed
> on Unicode characters? I was under the impression that it's a rarity if it
> is done at all.
>
> I've been working on a proposal for additional chess symbols used in chess
> problems and variant games, and I've been in communication with the World
> Federation for Chess Composition, which is the international organization
> in charge of chess problems. We have agreement on the repertoire and the
> text of the proposal, but the arrangement of the proposed characters within
> the new block is a sticking point. Some representatives of the WFCC have
> proposed alternate arrangements that assume there will be a need for
> bitwise operations to convert between the existing chess symbols in the
> Miscellaneous Symbols block and related symbols in the new block. I don't
> see the need but maybe I'm missing something.
  

Bit arithmetic was used for ASCII-only systems to do case mappings.

I'm sure it had its advantages in the times of severely constrained memory
and very slow processors, but it's really not extensible, is it?

Offset calculations are commonly performed on digits, even today; so it
would not be too much of a stretch to expect that for a subset of symbols
for which there is a need of a common processing step, there might be
attention paid to making easy offset calculations possible.
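
Both techniques fit in a few lines. The sketch below is illustrative only: `ascii_toggle_case` and `to_ascii_digits` are invented names, and the caller is assumed to know the zero digit of the script being converted.

```python
def ascii_toggle_case(ch):
    """Flip bit 0x20, which is all that separates 'A'..'Z' from 'a'..'z'."""
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ 0x20)
    return ch  # the trick is meaningless outside the ASCII letters

def to_ascii_digits(text, zero):
    """Map a run of ten contiguous digits starting at `zero` onto '0'..'9'."""
    offset = ord("0") - ord(zero)
    out = []
    for ch in text:
        if 0 <= ord(ch) - ord(zero) <= 9:
            out.append(chr(ord(ch) + offset))  # single additive constant
        else:
            out.append(ch)
    return "".join(out)
```

With DEVANAGARI DIGIT ZERO (U+0966) as the zero, the digits U+0967 U+0968 U+0969 come out as "123". The digit offset works because decimal digits are encoded as contiguous runs of ten; the case bit works only because of how ASCII happened to be laid out.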
A./

Bit arithmetic on Unicode characters?

2016-10-06 Thread Garth Wallace

Other than converting between UTFs, is bit arithmetic commonly performed on
Unicode characters? I was under the impression that it's a rarity if it is
done at all.

I've been working on a proposal for additional chess symbols used in chess
problems and variant games, and I've been in communication with the World
Federation for Chess Composition, which is the international organization
in charge of chess problems. We have agreement on the repertoire and the
text of the proposal, but the arrangement of the proposed characters within
the new block is a sticking point. Some representatives of the WFCC have
proposed alternate arrangements that assume there will be a need for
bitwise operations to convert between the existing chess symbols in the
Miscellaneous Symbols block and related symbols in the new block. I don't
see the need but maybe I'm missing something.
