On Tue, Apr 11, 2006 at 01:18:59PM -0700, Jeremy Brown wrote:
[...]
> 3. Do codes already exist in the Plucker file format to say,
> "this page is stored as UTF8 / UCS-16 / KOI8-R / ISO-8859-6 /
> etc."? Or would this be something that might need to be
> decided on and included?

PyPlucker has excellent support for charsets, so you should probably concentrate only on the reader.
Since I am the one who (partially) implemented charset support in Vade-Mecum (the PPC Plucker reader), I can explain how to extract charset information from the database. Besides, Vade-Mecum is free, so you can study its source code. Feel free to ask questions -- I will be glad to help.

P.S. UCS-16 (which is actually UTF-16, since UCS-* can only be UCS-2 or UCS-4) cannot be used in Plucker documents, because byte 0 is used as an escape code that introduces the "text-embedded functions" used for markup (akin to tags in HTML). So only ASCII-compatible SBCSs and MBCSs can be used (including UTF-8, of course). In any case, the WWW uses ASCII-based charsets/encodings AFAIK, so a distiller probably should not be concerned about UTF-16/32.

_______________________________________________
plucker-dev mailing list
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
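
To illustrate the P.S. point about byte 0, here is a minimal Python sketch (not taken from PyPlucker or Vade-Mecum). It only checks whether a given encoding of a plain ASCII string produces any 0x00 bytes, which is the byte the message says is reserved as the escape code. The names `ESCAPE` and `contains_escape_byte` are made up for this example.

```python
# Hypothetical illustration: which encodings emit the 0x00 byte that the
# message says Plucker reserves as its escape code?
ESCAPE = 0x00


def contains_escape_byte(text: str, encoding: str) -> bool:
    """Return True if encoding `text` yields at least one 0x00 byte."""
    return ESCAPE in text.encode(encoding)


sample = "Plucker"  # plain ASCII, encodable in every charset listed below
for enc in ("utf-8", "koi8-r", "iso-8859-6", "utf-16-le", "utf-32-le"):
    print(enc, contains_escape_byte(sample, enc))
```

For this ASCII sample, the ASCII-compatible encodings (UTF-8, KOI8-R, ISO-8859-6) never produce a zero byte, while UTF-16 and UTF-32 pad every character with zero bytes, so a reader scanning for the escape byte would misread ordinary text as markup.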
