Hi Yury,
So we agree? The bug is in HTMLGenerator, but the expected encoding
isn't UTF-8 (reading from http://www.w3.org/ doesn't work for me; I get a
NullPointerException). It is ISO-8859-1, or maybe the default encoding of
the JVM. Can you file a bug in Bugzilla?
Regards,
Joerg
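
A minimal sketch of the "expected encoding" point, not Cocoon's actual HTMLGenerator code: when bytes are turned into characters, the charset is either passed explicitly or falls back to the JVM default. The class name ExplicitCharsetRead and the helper readAll are hypothetical, and the sketch assumes the JDK in use ships a KOI8-R charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ExplicitCharsetRead {
    /** Reads the whole stream, decoding its bytes with the given charset. */
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: KOI8-R is available in this JDK.
        Charset koi8r = Charset.forName("KOI8-R");
        // The Russian greeting from the test page, written with Unicode escapes.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442!";
        byte[] bytes = text.getBytes(koi8r);

        // What a reader created without an explicit charset would fall back to.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // With the charset stated explicitly, the text survives the read.
        String decoded = readAll(new ByteArrayInputStream(bytes), koi8r);
        System.out.println("read back intact: " + decoded.equals(text));
    }
}
```

If the generator took the charset from the source (or from a parameter) instead of assuming one, a read like the above would presumably keep the text intact.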
Yury Mikhienko wrote:
Hi Joerg!
Thanks for your reply.
Pure Tidy works properly (the output stream encoding is the same as the input stream
encoding).
The problem, from my point of view, is in the input stream encoding of the transformer
(or of the streamer, if the xpath value is null) (HTMLGenerator),
because Tidy DOM parser returns KOI
Hello Yury,
I can only confirm the bug in the HTML generator. It seems it cannot read
the KOI8-R encoded file correctly. I tested it with your HTML snippet
saved to a static file.
Calling serializer.setOutputProperty(OutputKeys.ENCODING, "KOI8-R"); of course
does not help, because that only sets the output encoding.
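
To illustrate why the output setting cannot help, here is a small sketch under stated assumptions. It does not use Cocoon, Tidy, or the serializer; it only models "decode the input with an assumed charset, then encode the output". The class WrongInputCharset and the method roundTrip are hypothetical names, and KOI8-R is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WrongInputCharset {
    // Assumption: this JDK provides a KOI8-R charset.
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    /** Decodes with the assumed input charset, then re-encodes with the output charset. */
    static byte[] roundTrip(byte[] input, Charset assumedInput, Charset output) {
        String decoded = new String(input, assumedInput);
        return decoded.getBytes(output);
    }

    /** Shows what KOI8-R bytes look like when read as ISO-8859-1. */
    static String misread(byte[] koi8rBytes) {
        return new String(koi8rBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Russian greeting from the test page, written with Unicode escapes.
        byte[] source = "\u041f\u0440\u0438\u0432\u0435\u0442!".getBytes(KOI8_R);

        // Input misread as ISO-8859-1: the characters are already wrong here.
        System.out.println(misread(source));

        // Choosing KOI8-R for the output afterwards does not restore the bytes.
        byte[] broken = roundTrip(source, StandardCharsets.ISO_8859_1, KOI8_R);
        System.out.println("wrong input, KOI8-R output intact: " + Arrays.equals(source, broken));

        // Only the right input charset gives the original bytes back.
        byte[] ok = roundTrip(source, KOI8_R, KOI8_R);
        System.out.println("right input, KOI8-R output intact: " + Arrays.equals(source, ok));
    }
}
```

The damage happens at decode time, so any fix has to be on the input side of the generator.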
Hi all!
Can anyone help me with the following problem:
I have a KOI8-R encoded HTML document. After processing this document with
HTMLGenerator, the output is an ISO-8859-1 encoded document :((
For example:
The source document:
(from URL: /test)
ðÒÉ×ÅÔ!
ðÒÉ×ÅÔ!
(in sitemap.xmap):