Re: utf-8 encoding

Patrick Ohly Tue, 25 Feb 2003 01:07:37 -0800

Hi again,

sorry for dropping out of the discussion, but I have been away for
a week. Going through the discussion once more and doing some
experiments with the attached sample page led me to these conclusions:


- characters with utf-8 encoding are copied verbatim to the pdb file
- the encoding of the pdb is set to utf-8 correctly
- the V1.2 plucker viewer cannot handle utf-8 and interprets the
  byte sequence C3 A4 (a umlaut) as A tilde and Euro, as if it
  were in a single-byte Latin encoding

Is this correct?
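
The byte-level effect described in the last point can be reproduced outside the viewer. The following is only a sketch in present-day Python, not plucker code; which single-byte code page the viewer actually assumes is not established here. It shows that C3 A4 decodes to one character as utf-8 but to two characters under a Latin code page, and that ISO-8859-15 in particular maps A4 to the Euro sign:

```python
# The two bytes that utf-8 uses for "a umlaut" (U+00E4).
raw = bytes([0xC3, 0xA4])

# Decoded correctly as utf-8: a single character.
as_utf8 = raw.decode("utf-8")

# Decoded byte by byte as ISO-8859-1: A tilde + currency sign.
as_latin1 = raw.decode("iso-8859-1")

# Decoded byte by byte as ISO-8859-15: A tilde + Euro sign.
as_latin9 = raw.decode("iso-8859-15")

for label, text in [("utf-8", as_utf8),
                    ("iso-8859-1", as_latin1),
                    ("iso-8859-15", as_latin9)]:
    # Print the code points so the result does not depend on the terminal.
    print(label, [hex(ord(ch)) for ch in text])
```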

David's suggested solution was a preprocessing step that replaces
utf-8 characters with html entities. However, this won't work for
pages downloaded automatically by plucker-build, as there is
currently no way to modify such pages on the fly.

Would patches to the CVS head revision of plucker-build be accepted?

I'd like to add:
- the possibility to write filters for each downloaded page
  in Python and external programs (I'd use this to strip down
  pages a bit, too)
- conversion from utf-8 to html entities
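
To make the two items concrete, here is a rough sketch of how a per-page filter and the entity conversion could look. It is written in present-day Python and is not based on the plucker-build sources: the names `utf8_to_entities` and `apply_filters` are hypothetical, and a filter is simply assumed to be a callable that takes the page as bytes and returns bytes.

```python
from html.entities import codepoint2name


def utf8_to_entities(page: bytes) -> bytes:
    """Hypothetical filter: replace every non-ASCII character in a
    utf-8 encoded page with an html entity, returning pure ASCII."""
    out = []
    for ch in page.decode("utf-8"):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                           # plain ASCII, keep
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])  # named, e.g. &auml;
        else:
            out.append("&#%d;" % cp)                 # numeric fallback
    return "".join(out).encode("ascii")


def apply_filters(page: bytes, filters) -> bytes:
    """Hypothetical hook: run each filter in order on a downloaded page."""
    for f in filters:
        page = f(page)
    return page


sample = "UTF-8: \u00e4".encode("utf-8")
print(apply_filters(sample, [utf8_to_entities]))
```

A filter that calls an external program would fit the same bytes-in, bytes-out shape, for example by piping the page through the program's standard input and output.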

-- 
Freundliche Gruesse / Best Regards

Patrick Ohly
Senior Software Engineer
--------------------------------------------------------------------
//// pallas 
Pallas GmbH / Hermuelheimer Str. 10 / 50321 Bruehl / Germany
[EMAIL PROTECTED] / www.pallas.com
Tel +49-2232-1896-30 / Fax +49-2232-1896-29
--------------------------------------------------------------------

Title: Umlaute

Umlaute

Entity: ä

UTF-8: ä