# Re: [ProofPower] Unicode and ProofPower

Phil,

Many thanks for your feedback. I will update the web page and the pputf8
contrib source shortly.

On 27 Mar 2015, at 14:41, Phil Clayton <phil.clay...@lineone.net> wrote:

> Rob,
>
> 12/03/15 12:31, Rob Arthan wrote:
>> I am currently working on support for input and output using Unicode
>> into ProofPower. I would appreciate any feedback on the translation
>> scheme I am proposing to use, which is described here:
>>
>> http://www.lemma-one.com/ProofPower/unicode/pp-unicode.html
>
> I doubt %xNNNNNN% occurs in any existing scripts, so I think that's fine.

Me too!

>
> Your Unicode values for %lseq% and %rseq% have not taken into account the
> correction to the Z Standard in Technical Corrigendum 1:
> Page 24, in 6.4.6.5
> In line 3, replace “0000 3008 LEFT ANGLE BRACKET” by “0000 27E8 MATHEMATICAL
> LEFT ANGLE BRACKET”.
> In line 4, replace “0000 3009 RIGHT ANGLE BRACKET” by “0000 27E9 MATHEMATICAL
> RIGHT ANGLE BRACKET”.
> So
>   - 3008 should be 27E8
>   - 3009 should be 27E9

> You currently have
> 2588 FULL BLOCK
> for %EFT%.  This renders quite large.   I'm guessing that the description
> 'full block' means that the intention is for the glyph to be entirely black.
> This is larger that the square we're used to and it may not even be square.
> I think
> 25A0 BLACK SQUARE
> could be more suitable (and would be square!)
> To compare in your email client:  █ (2588) and ■ (25A0)
> What do you think?
>
Agreed.

> You currently have
> 208D SUBSCRIPT LEFT PARENTHESIS     for %ulbegin%
> 208E SUBSCRIPT RIGHT PARENTHESIS    for %ulend%
> but these parentheses don't have underscores like the ProofPower glyphs.  One
> option is
> 298B LEFT SQUARE BRACKET WITH UNDERBAR
> 298C RIGHT SQUARE BRACKET WITH UNDERBAR
> They appear as ⦋ (298B) and ⦌ (298C) but I think they're too close to
> refinement symbols.  I think this could be a case where you probably want to
> combine another glyph with the existing parenthesis using the combining
> diacritics.  I think the best options are
> 0331 COMBINING MACRON BELOW
> which gives ̱( ̱)
> These work in my email client and terminal.  Alternatively
> 0332 COMBINING LOW LINE
> which gives (̲ )̲
> These don't overlay correctly in my email client or terminal but are a single
> wide character.

I am chary about using the COMBINING characters as it seems likely to increase
the chances that any given text-processing system will go wrong in some way.
The use of underlined brackets in the ProofPower font was a bit of an arbitrary
choice. So I have changed to the following which look like the beginning and
end of underlining.

2A3D RIGHTHAND INTERIOR PRODUCT: ⨽
2A3C INTERIOR PRODUCT: ⨼

>
> For epsilon, I can see why
> 03F5 GREEK LUNATE EPSILON SYMBOL
> has been used instead of
> 03B5 GREEK SMALL LETTER EPSILON
> - it is the correct shape for the equivalent LaTeX character.

I don’t know why we used \varepsilon in LaTeX rather than \epsilon for this. (I
think there is slight tendency to use the lunate epsilon for the Hilbert choice
operator in the literature, but the ordinary is often used too and seems to be
more common these days, and in Z you expect it just to be the ordinary greek
letter.)
l
>  I was finding that the lunate epsilon is very similar to the membership
> operator.  The lower case LaTeX mathematical greek letters appear italicized
> so I wonder whether the standard should have used e.g.
> 𝜃 (1D703) instead of θ (03B8)
> 𝜆 (1D706) instead of λ (03BB)
> 𝜇 (1D707) instead of μ (03BC)
> so you would have chosen
> 𝜖 (1D716) instead of ϵ (03F5)
> That would have meant also that any Greek authors of Z don't find themselves
> deprived of certain letters.  As things stand, I agree with your choice.
>

I think it is an excellent idea to use the mathematical greek letters. I have
done that, fixing a mistake with 𝜓 in passing and treating epsilon the same as
all the others now.

> There are some multi-character operators that could be translated to a single
> character operator that would result in Unicode documents more in line with
> ISO Z.
> \%down%s        ---> ⧹ (29F9) for schema hiding
> %filter%%down%s ---> ⨡ (2A21) for schema projection
> %rcomp%%down%s  ---> ⨟ (2A1F) for schema composition
> This would require ProofPower-Z to be updated to add the new symbols.  I
> mention it now because the transition to Unicode could be used as an
> opportunity to eliminate the non-ISO forms from Unicode documents, if you
> wished to transition to the ISO symbol.
>
> Either way, you probably want to add schema projection and composition to the
> ML function utf8_to_pp_replacement_data.
>

I don’t think you had updated your clone of the pputf8 contrib.
utf8_to_pp_replacement_data was a half-hearted attempt to introduce a bit more
compatibility with ISO Z that I decided to abandon. A proper translation
between ProofPower-Z and ISO Z needs to be on parse trees rather than on
streams of lexical tokens.

I agree that it would be good to start accepting keywords with single-character
Unicode equivalents for the Z operators as variant syntax for the forms with
subscripts.

> The situation with hyphen minus is unfortunate.  I presume the issue is that
> you can't identify which instances of ASCII hyphen-minus are Z subtraction
> operators.

No that’s not really the issue. I just disagree with the choices made in ISO Z.

>
> The following are not well supported on my system:
> 01F13C SQUARED LATIN CAPITAL LETTER M    for %SML%
> 01F143 SQUARED LATIN CAPITAL LETTER T    for %<:%
> 01F149 SQUARED LATIN CAPITAL LETTER Z    for %SZT%
> What do you consider converting them to the expanded form, e.g.
>  %SZT% ---> ⌜↘Z↕
> ?

That is what the original pputf8 scheme did, but I didn’t think it would be
very popular in the long term.

> I think a single character is preferable and ideally it would be somehow
> based on the opening corner '⌜'.  I wonder if there is a combining trick
> possible here.
>

I think it is quite important for these to work, so I have changed to using
circled lower-case letters.

>
>> I am particularly interested to know how good the coverage for the
>> Unicode characters needed is on different systems and on different
>> at the above Web page, which includes GIFs that show how LaTeX renders
>> the ProofPower symbols alongside the Unicode translation as rendered by
>> your browser. To see how good your mail client is, the following should
>> look something like the HOL definition of distributed union:
>>
>> ⌜∀ x a⦁ x ∈ ⋃ a ⇔ (∃ s⦁ x ∈ s ∧ s ∈ a)⌝
>
> This is readable on my system - all glyphs are available.  The sizes are a
> little odd but that could be due to font substitution.
>
It does seem to be very common for the quantifiers to be displayed not much
taller than lower-case letters.

>
>> (There is a complete list of the symbols used at the end of this e-mail.)
>>
>> Current indications are that Mac OS can do everything that is required,
>> but that recent Linux systems and MS Windows tend to miss one or two
>> glyphs: the squared upper case letters I am using for some of the
>> quotation symbols seem to be poorly supported, for example. Most text
>> editors and word processors on the systems I have tried do support
>> UTF-8. If anyone can advise on how to extend the range of Unicode glyphs
>> available on Linux or on how to get font substitution working nicely
>> with OpenMotif, I would be very grateful.
>
> On Linux, perhaps it helps to install or upgrade the package
>  stix-fonts (Fedora)
>  fonts-stix (Ubuntu)
> or, without using packages, follow these instructions:
> http://www.stixfonts.org/install.html
> With fontconfig, I suspect a user installation just requires the font to be
> dropped into ~/.fonts and it just works.
>
>
>> There is a prototype implementation of the translation as two
>> free-standing programs pputf8 and utf8pp here:
>>
>> https://github.com/RobArthan/pp-contrib
>>
>> (These have been around for a while, but I have just changed them to
>> implement the somewhat simplified and extended scheme described in the
>> Web page cited above.)
>>
>> My immediate plan is to integrate the translation into ProofPower
>> itself, so that Unicode input and output are options you can select with
>> set_flag (similar to the way that you can currently select ASCII output,
>> by calling set_flag(“use_extended_chars”, false)). I will provide
>> separate flags for input and output.
>
> Great - I look forward to seeing Z fly past in my shell!!  Now I need to work
> out how to get the same key mapping working in Linux generally.
>
>
Please let us know how you get on with that.

>> Below, as a further test of your e-mail client, you will find the ASCII
>> encoding and UTF-8 encoding of the corresponding Unicode for all the
>> characters in the ProofPower extended character set.
>
> I have checked the Unicode support with a few programs.  I suspect that the
> user experience will also be affected by which fonts are available, in
> particular STIX, because fontconfig performs glyph substitution.
>
>
> Thunderbird 17.0.2, this email
>
> The following have no glyph available:
> >       2  %<:%: 🅃
> >      20  %SML%: 🄼
> >      24  %SZT%: 🅉
>
> Also:
> >      27  %Upsilon%: Υ    -- glyph appears identical to capital Y
> >     108  %select%: ϵ     -- very similar to ∈
>
>
> Firefox 18.0.2, pp-unicode.html
>
> The following have no glyph available:
> 0x01F13C
> 0x01F143
> 0x01F149
>
> 0x0003A5 is identical to capital Y
>
>
> I'm using Fedora 16 (still!) and have package versions as follows:
>
> \$ rpm -q stix-fonts fontconfig thunderbird firefox
> stix-fonts-1.0.0-2.fc15.noarch
> fontconfig-2.8.0-8.fc16.x86_64
> fontconfig-2.8.0-8.fc16.i686
> thunderbird-17.0.2-1.fc16.x86_64
> firefox-18.0.2-1.fc16.x86_64
>
>
> Regards,
> Phil
>
> _______________________________________________
> Proofpower mailing list
> Proofpower@lemma-one.com
> http://lemma-one.com/mailman/listinfo/proofpower_lemma-one.com

Thanks again!

Rob.


_______________________________________________
Proofpower mailing list
Proofpower@lemma-one.com
http://lemma-one.com/mailman/listinfo/proofpower_lemma-one.com