On Fri, 2003-02-14 at 17:12, David A. Desrosiers wrote:
> 
> > >   I agree as well, and unfortunately, that page isn't properly using
> > > any encoding, thus the failure.
> 
> > Just to make sure I get it right: you think that the a umlaut is not
> > encoded correctly according to utf-8?
> 
>       No, it is encoded right, assuming the page properly declares the
> charset as UTF-8, which the original page does not.

So Content-Type: text/html; charset=utf-8
in the HTTP header is not sufficient? I verified with Ethereal
that my web browser really receives this header, and in
StructuredHTMLParser/__init__ this information is also still
available (line 847 in the head revision of TextParser.py).
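
To illustrate what I would expect to happen with that header, here is
a minimal sketch. It is not Plucker's actual code: the helper name
charset_from_content_type and the iso-8859-1 fallback are my own
assumptions, and it only uses the Python standard library. It shows
that the charset parameter alone is enough to decode the a umlaut
correctly, and what the breakage looks like when it is ignored.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper for illustration; the default is an assumption
    about what a parser falls back to when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset or None
    return msg.get_content_charset() or default


# The two bytes that UTF-8 uses for a umlaut
raw = "\u00e4".encode("utf-8")

declared = charset_from_content_type("text/html; charset=utf-8")
fallback = charset_from_content_type("text/html")

good = raw.decode(declared)    # decoded with the declared charset
broken = raw.decode(fallback)  # decoded as if no charset were declared
```

If the viewer shows two odd characters in place of the umlaut, that
would match the second case, i.e. the declared charset getting lost
somewhere between the header and the conversion.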

> > Well, I still worry :-/ I cannot change how the site generates the page
> > and I cannot filter the pages while plucker-build downloads them.
> > Overriding the encoding with --charset=utf-8 also had no effect, the
> > characters are still broken in the Plucker viewer.
> 
>       Parse it down locally and regex it out with before_command and some
> sed or perl,

But before_command is not executed for each page, is it? The reason I'm
asking is that I am following links on this site, and the URL I
gave is just one example of the pages that are downloaded and
converted automatically by plucker-build.

-- 
Freundliche Gruesse / Best Regards

Patrick Ohly
Senior Software Engineer
--------------------------------------------------------------------
//// pallas 
Pallas GmbH / Hermuelheimer Str. 10 / 50321 Bruehl / Germany
[EMAIL PROTECTED] / www.pallas.com
Tel +49-2232-1896-30 / Fax +49-2232-1896-29
--------------------------------------------------------------------

_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
