Re: Unicode Library

Konstantin Khomoutov Wed, 12 Apr 2006 02:52:19 -0700

On Tue, Apr 11, 2006 at 01:18:59PM -0700, Jeremy Brown wrote:

[...]
> 3. Do codes already exist in the Plucker file format to say,
> "this page is stored as UTF8 / UCS-16 / KOI8-R / ISO-8859-6 /
> etc."?  Or would this be something that might need to be
> decided on and included?
PyPlucker has excellent support for charsets, so you should
probably concentrate only on the reader.


Since I'm the one who (partially) implemented support for charsets
in Vade-Mecum (a PPC Plucker reader), I can explain how to extract
charset information from the database. Besides, Vade-Mecum
is free, so you can study its source code.
Feel free to ask questions -- I will be glad to help.

P.S.
"UCS-16" (which is actually UTF-16, since UCS-* can only be
UCS-2 or UCS-4) cannot be used in Plucker documents, because
byte 0 is used as an escape code that introduces "text-embedded
functions" used for markup (akin to tags in HTML). So only
ASCII-compatible SBCSs and MBCSs can be used (including UTF-8,
of course). But after all, the WWW uses ASCII-based
charsets/encodings AFAIK, so a distiller probably should not be
concerned about UTF-16/32.
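
To illustrate the byte 0 problem, here is a minimal Python sketch
(not taken from PyPlucker or Vade-Mecum; the helper name
contains_nul is made up for this example). It only checks whether
encoding a string in a given charset yields any 0x00 bytes, which
would collide with the escape code described above.

```python
def contains_nul(text, encoding):
    """Return True if encoding `text` yields at least one 0x00 byte.

    Hypothetical helper for illustration only; it says nothing about
    how a real Plucker distiller or reader processes its input.
    """
    return b"\x00" in text.encode(encoding)


# ASCII-only sample text; every charset below can encode it.
sample = "Plucker"

# ASCII-compatible encodings keep each ASCII character as one
# non-zero byte. UTF-16 and UTF-32 pad ASCII characters with 0x00.
for enc in ("utf-8", "koi8-r", "iso8859-6", "utf-16-le", "utf-32-le"):
    print(enc, contains_nul(sample, enc))
```

Any ASCII character encoded as UTF-16 or UTF-32 carries at least
one zero byte, so ordinary Latin text would be full of bytes that
look like escape codes. UTF-8 produces a zero byte only for
U+0000 itself.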

_______________________________________________
plucker-dev mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
