On Fri, 2003-02-14 at 17:12, David A. Desrosiers wrote:
>
> > > I agree as well, and unfortunately, that page isn't properly using
> > > any encoding, thus the failure.
>
> > Just to make sure I get it right: you think that the a umlaut is not
> > encoded correctly according to utf-8?
>
> No, it is encoded right, assuming the page properly declares the
> charset as UTF-8, which the original page does not.

So Content-Type: text/html; charset=utf-8 in the HTTP header is not
sufficient? I verified with ethereal that my web browser really gets
this information, and in StructuredHTMLParser/__init__ this information
is also still available (line 847 in head revision of TextParser.py).

> > Well, I still worry :-/ I cannot change how the page generates the page
> > and I cannot filter the pages while plucker-build downloads them.
> > Overriding the encoding with --charset=utf-8 also had no effect, the
> > characters are still broken in the Plucker viewer.
>
> Parse it down locally and regex it out with before_command and some
> sed or perl,

But before_command is not executed for each page, is it? The reason I'm
asking is that I am following links on this site, and the url I gave is
just one example of the pages that are downloaded and converted
automatically by plucker-build.

-- 
Freundliche Gruesse / Best Regards

Patrick Ohly
Senior Software Engineer
--------------------------------------------------------------------
//// pallas
Pallas GmbH / Hermuelheimer Str. 10 / 50321 Bruehl / Germany
[EMAIL PROTECTED] / www.pallas.com
Tel +49-2232-1896-30 / Fax +49-2232-1896-29
--------------------------------------------------------------------
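
For reference, the charset question discussed in this message can be checked outside Plucker with a few lines of standard-library Python. This is only a sketch: the helper name `charset_from_content_type` and the ISO-8859-1 fallback are assumptions made for illustration and are not taken from TextParser.py or any other Plucker code. It shows how a charset parameter is read from a Content-Type header value, and what an a umlaut looks like when UTF-8 bytes are decoded with the wrong charset.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset named in a Content-Type header value.

    Hypothetical helper for illustration; the fallback charset is an
    assumption, not what Plucker itself does.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset() or default


# An a umlaut (U+00E4) as it travels over the wire in UTF-8: two bytes.
raw = "\u00e4".encode("utf-8")

declared = charset_from_content_type("text/html; charset=utf-8")
missing = charset_from_content_type("text/html")

good = raw.decode(declared)   # decoded with the declared charset
broken = raw.decode(missing)  # decoded with the fallback: two characters
```

If the declared charset is lost anywhere between the HTTP header and the decoder, the second case applies, and the single a umlaut turns into two unrelated characters. That may be one way to end up with the kind of broken characters described above, though this sketch cannot say whether it is what happens inside plucker-build.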

