Hi again, sorry for dropping out of the discussion, but I have been away for a week. Going through the discussion once more and doing some experiments with the attached sample page led me to these conclusions:
- characters with UTF-8 encoding are copied verbatim to the pdb file
- the encoding of the pdb is set to UTF-8 correctly
- the V1.2 plucker viewer cannot handle UTF-8 and interprets the byte sequence C3 A4 (a umlaut) as A tilde and Euro, as if it were a Latin encoding

Is this correct?

The solution David suggested was a preprocessing step that replaces UTF-8 characters with HTML entities. However, this won't work for pages that plucker-build downloads automatically, as there is currently no way to modify such a page on the fly.

Would patches to the CVS head revision of plucker-build be accepted? I'd like to add:

- the possibility to write filters for each downloaded page in Python and external programs (I'd use this to strip down pages a bit, too)
- conversion from UTF-8 to HTML entities

-- 
Best Regards
Patrick Ohly
Senior Software Engineer
Pallas GmbH / Hermuelheimer Str. 10 / 50321 Bruehl / Germany
www.pallas.com
Tel +49-2232-1896-30 / Fax +49-2232-1896-29

Attached sample page, title: Umlaute
Umlaute
Entity: ä
UTF-8: ä
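
For illustration, here is a minimal Python sketch of the UTF-8 to HTML entity conversion proposed above, run on the UTF-8 line of the sample page. The function name utf8_to_entities is hypothetical and not part of plucker-build. ISO-8859-15 is only an assumption for the "Latin encoding" the viewer falls back to, chosen because it maps byte A4 to the Euro sign.

```python
def utf8_to_entities(raw: bytes) -> bytes:
    """Hypothetical filter: decode UTF-8 input and replace every
    non-ASCII character with a numeric HTML character reference."""
    text = raw.decode("utf-8")
    # The standard "xmlcharrefreplace" error handler emits &#NNN; for
    # every character that cannot be encoded as ASCII.
    return text.encode("ascii", errors="xmlcharrefreplace")


# The UTF-8 line of the sample page: a umlaut stored as bytes C3 A4.
sample = b"UTF-8: \xc3\xa4"

# What a viewer shows if it reads the same bytes as ISO-8859-15
# (assumed encoding): A tilde followed by the Euro sign.
misread = sample.decode("iso-8859-15")
print(ascii(misread))

# After conversion the page is pure ASCII, so the viewer's choice of
# encoding no longer matters.
converted = utf8_to_entities(sample)
print(converted)  # b'UTF-8: &#228;'
```

ASCII markup passes through unchanged, so something like this could run as one of the per-page filters on each downloaded page before it is parsed.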

