[ProofPower] Support for Unicode and UTF-8

Rob Arthan Sun, 07 Feb 2016 08:11:14 -0800

As it says in the CHANGES entry for OpenProofPower 3.1w5:

>  The Reader/Writer now supports input and output in the UTF-8 encoding
> of a Unicode representation of the mathematical symbols. This gives an
> alternative to the ProofPower extended character set that is more
> portable and convenient for many purposes. (This is not yet supported by
> xpp or the document preparation system.)



I would welcome feedback on the following thoughts about what to
do next with xpp and the document preparation system.

For the document preparation system, we want to support
documents that use the UTF-8 encoding.  Some systems try to
distinguish between UTF-8 and other encodings automatically
or by including the UTF-8 byte order mark (BOM) as a magic
number at the the head of the file. I don’t think either of these
approaches is promising for the ProofPower document preparation
system. So I propose using a new filename extension “.ppd”
for UTF-8 encoded ProofPower documents that would be recognised
by docsml and doctex.

docsml would simply pass UTF-8 encodings on unchanged and
the pp scripts would now default to running
the ProofPower database with input and output in UTF-8. 
This will mean that if you run docsml and then run pp
in a terminal window on the resulting “.sml” file, you will see the proper
mathematical symbols, provided your system has reasonable support
for UTF-8 and Unicode. It also means that text-processing utilities
like “grep” will work properly on “.sml” files without you having to set
LANG=C. Likewise, off-the-shelf text editors should work well on
all the files involved.

doctex would map the UTF-8 encodings for ProofPower symbols
into the appropriate LaTeX commands. I propose that it will leave
UTF-8 encodings that are not recognised as ProofPower symbols
unchanged (so if you you are using the utf8 package in LaTeX for
purposes of internationalization, then things will work as you would
expect). The sieve keyword file format would be extended to allow
Unicode code points to be given as equivalents for keywords,
allowing you to extend the range of supported mathematical
symbols if you desire.

The sieve view file and sieve keyword file format would support 
a keyword indicating that the file uses the UTF-8 encoding.
This is just for backwards compatibility for any users who have
their own versions of these files and don’t want to upgrade
them to UTF-8 just yet - the files supplied with the ProofPower
distribution would have this keyword.

I have been investigating the options for xpp and have concluded that
making it use UTF-8 internally would require either some substantial
work on the OpenMotif GUI toolkit that it uses or porting it to a more
modern GUI toolkit. These aren’t very attractive options in the short
term. So my proposal is that xpp would use the existing extended
character set internally and convert to and from UTF-8 as required
(as inferred from the filename or by selecting an option). To allow
xpp to work this way, there will be a way of invoking pp so that
it does input and output in the ProofPower encoding.

The intention of the above is that all the text files in the ProofPower
distribution will use the UTF-8 encoding from now on. Your existing
documents will still work and you can convert to using UTF-8 gradually
if you wish. If for some reason you maintain “.sml” files rather than just
generating them as needed from “.doc” (or now “.ppd” files), then you
can use xpp (or the pputf8 and utf8pp programs in the pputf8 contrib)
to convert old “.sml” files to UTF-8 if you wish.

Regards,

Rob.



_______________________________________________
Proofpower mailing list
[email protected]
http://lemma-one.com/mailman/listinfo/proofpower_lemma-one.com

[ProofPower] Support for Unicode and UTF-8

Reply via email to