Re: [ProofPower] Unicode and ProofPower

Phil Clayton Fri, 27 Mar 2015 07:43:19 -0700

Rob,

12/03/15 12:31, Rob Arthan wrote:

I am currently working on support for input and output using Unicode
into ProofPower. I would appreciate any feedback on the translation
scheme I am proposing to use, which is described here:


http://www.lemma-one.com/ProofPower/unicode/pp-unicode.html


I doubt %xNNNNNN% occurs in any existing scripts, so I think that's fine.

Your Unicode values for %lseq% and %rseq% have not taken into accountthe correction to the Z Standard in Technical Corrigendum 1:

Page 24, in 6.4.6.5

In line 3, replace “0000 3008 LEFT ANGLE BRACKET” by “0000 27E8MATHEMATICAL LEFT ANGLE BRACKET”.In line 4, replace “0000 3009 RIGHT ANGLE BRACKET” by “0000 27E9MATHEMATICAL RIGHT ANGLE BRACKET”.

So
   - 3008 should be 27E8
   - 3009 should be 27E9

You currently have
2588 FULL BLOCK

for %EFT%. This renders quite large. I'm guessing that thedescription 'full block' means that the intention is for the glyph to beentirely black. This is larger that the square we're used to and it maynot even be square. I think

25A0 BLACK SQUARE
could be more suitable (and would be square!)
To compare in your email client:  █ (2588) and ■ (25A0)
What do you think?

You currently have
208D SUBSCRIPT LEFT PARENTHESIS     for %ulbegin%
208E SUBSCRIPT RIGHT PARENTHESIS    for %ulend%

but these parentheses don't have underscores like the ProofPower glyphs.One option is

298B LEFT SQUARE BRACKET WITH UNDERBAR
298C RIGHT SQUARE BRACKET WITH UNDERBAR

They appear as ⦋ (298B) and ⦌ (298C) but I think they're too close torefinement symbols. I think this could be a case where you probablywant to combine another glyph with the existing parenthesis using thecombining diacritics. I think the best options are

0331 COMBINING MACRON BELOW
which gives ̱( ̱)
These work in my email client and terminal.  Alternatively
0332 COMBINING LOW LINE
which gives (̲ )̲

These don't overlay correctly in my email client or terminal but are asingle wide character.


For epsilon, I can see why
03F5 GREEK LUNATE EPSILON SYMBOL
has been used instead of
03B5 GREEK SMALL LETTER EPSILON

- it is the correct shape for the equivalent LaTeX character. I wasfinding that the lunate epsilon is very similar to the membershipoperator. The lower case LaTeX mathematical greek letters appearitalicized so I wonder whether the standard should have used e.g.

𝜃 (1D703) instead of θ (03B8)
𝜆 (1D706) instead of λ (03BB)
𝜇 (1D707) instead of μ (03BC)
so you would have chosen
𝜖 (1D716) instead of ϵ (03F5)

That would have meant also that any Greek authors of Z don't findthemselves deprived of certain letters. As things stand, I agree withyour choice.

There are some multi-character operators that could be translated to asingle character operator that would result in Unicode documents more inline with ISO Z.

\%down%s        ---> ⧹ (29F9) for schema hiding
%filter%%down%s ---> ⨡ (2A21) for schema projection
%rcomp%%down%s  ---> ⨟ (2A1F) for schema composition

This would require ProofPower-Z to be updated to add the new symbols. Imention it now because the transition to Unicode could be used as anopportunity to eliminate the non-ISO forms from Unicode documents, ifyou wished to transition to the ISO symbol.

Either way, you probably want to add schema projection and compositionto the ML function utf8_to_pp_replacement_data.

The situation with hyphen minus is unfortunate. I presume the issue isthat you can't identify which instances of ASCII hyphen-minus are Zsubtraction operators.


The following are not well supported on my system:
01F13C SQUARED LATIN CAPITAL LETTER M    for %SML%
01F143 SQUARED LATIN CAPITAL LETTER T    for %<:%
01F149 SQUARED LATIN CAPITAL LETTER Z    for %SZT%
What do you consider converting them to the expanded form, e.g.
  %SZT% ---> ⌜↘Z↕
?

I think a single character is preferable and ideally it would be somehowbased on the opening corner '⌜'. I wonder if there is a combining trickpossible here.

I am particularly interested to know how good the coverage for the
Unicode characters needed is on different systems and on different
applications. You can see how good your web browser is just by looking
at the above Web page, which includes GIFs that show how LaTeX renders
the ProofPower symbols alongside the Unicode translation as rendered by
your browser. To see how good your mail client is, the following should
look something like the HOL definition of distributed union:

⌜∀ x a⦁ x ∈ ⋃ a ⇔ (∃ s⦁ x ∈ s ∧ s ∈ a)⌝

This is readable on my system - all glyphs are available. The sizes area little odd but that could be due to font substitution.

(There is a complete list of the symbols used at the end of this e-mail.)

Current indications are that Mac OS can do everything that is required,
but that recent Linux systems and MS Windows tend to miss one or two
glyphs: the squared upper case letters I am using for some of the
quotation symbols seem to be poorly supported, for example. Most text
editors and word processors on the systems I have tried do support
UTF-8. If anyone can advise on how to extend the range of Unicode glyphs
available on Linux or on how to get font substitution working nicely
with OpenMotif, I would be very grateful.


On Linux, perhaps it helps to install or upgrade the package
  stix-fonts (Fedora)
  fonts-stix (Ubuntu)
or, without using packages, follow these instructions:
http://www.stixfonts.org/install.html

With fontconfig, I suspect a user installation just requires the font tobe dropped into ~/.fonts and it just works.

There is a prototype implementation of the translation as two
free-standing programs pputf8 and utf8pp here:

https://github.com/RobArthan/pp-contrib

(These have been around for a while, but I have just changed them to
implement the somewhat simplified and extended scheme described in the
Web page cited above.)

My immediate plan is to integrate the translation into ProofPower
itself, so that Unicode input and output are options you can select with
set_flag (similar to the way that you can currently select ASCII output,
by calling set_flag(“use_extended_chars”, false)). I will provide
separate flags for input and output.

Great - I look forward to seeing Z fly past in my shell!! Now I need towork out how to get the same key mapping working in Linux generally.

Below, as a further test of your e-mail client, you will find the ASCII
encoding and UTF-8 encoding of the corresponding Unicode for all the
characters in the ProofPower extended character set.

I have checked the Unicode support with a few programs. I suspect thatthe user experience will also be affected by which fonts are available,in particular STIX, because fontconfig performs glyph substitution.



Thunderbird 17.0.2, this email

The following have no glyph available:
 >       2  %<:%: 🅃
 >      20  %SML%: 🄼
 >      24  %SZT%: 🅉

Also:
 >      27  %Upsilon%: Υ    -- glyph appears identical to capital Y
 >     108  %select%: ϵ     -- very similar to ∈


Firefox 18.0.2, pp-unicode.html

The following have no glyph available:
0x01F13C
0x01F143
0x01F149

0x0003A5 is identical to capital Y


I'm using Fedora 16 (still!) and have package versions as follows:

$ rpm -q stix-fonts fontconfig thunderbird firefox
stix-fonts-1.0.0-2.fc15.noarch
fontconfig-2.8.0-8.fc16.x86_64
fontconfig-2.8.0-8.fc16.i686
thunderbird-17.0.2-1.fc16.x86_64
firefox-18.0.2-1.fc16.x86_64


Regards,
Phil

_______________________________________________
Proofpower mailing list
[email protected]
http://lemma-one.com/mailman/listinfo/proofpower_lemma-one.com

Re: [ProofPower] Unicode and ProofPower

Reply via email to