On 29.10.2004 08:44, Tuomo L wrote:
We're having some serious encoding problems. This happens only with the @href attributes in HTML, when using characters like ä, ö and å (in the Finnish alphabet). Form encoding works just fine. I've gone through all the threads concerning encoding (other people having encoding problems too). No luck so far. Is this still an issue in Cocoon? Could someone please tell what's wrong?
What's the page encoding? Forms work as expected? Just the links don't work? This normally points to a page encoding other than UTF-8, as link requests are encoded in UTF-8 while form requests are encoded in the page encoding. I don't think it is a Cocoon issue.
First a link about all the encodings: http://wiki.apache.org/cocoon/RequestParameterEncoding (mostly written by Bruno).
According to IE, the page encoding is set to UTF-8. The container-encoding and form-encoding in web.xml (Tomcat) are set to UTF-8.
The container-encoding should not be touched at all and remain ISO-8859-1.
HTMLSerializer is set to use UTF-8 (mime-type="text/html; charset=utf-8") and has the parameter <encoding>UTF-8</encoding>.
This should result in <meta http-equiv="Content-Type" content="text/html;charset=utf-8">. The request encoding header should have the same value ... which is not that easy when using a recent Tomcat: http://issues.apache.org/bugzilla/show_bug.cgi?id=26997
The xsl stylesheets use ISO-8859-1, though.
That's not a problem.
I've also tried setting everything to ISO-8859-1, but the problem with the href-attributes in html remains. Mozilla Firefox shows the characters correctly when doing "view source", but if I save the document on disk and open with ASCII-editor, the encoding is wrong there with both IE and Mozilla. So maybe it's not a browser problem?
Here's an example:
<a href="äö" foo="äö">äö</a>
becomes:
<a href="%C3%A4%C3%B6" foo="äö">äö</a>
when it should read (I think):
<a href="äö" foo="äö">äö</a>
... follow-up mail:
The URL-encoding is done wrong when serializing to HTML. According to specs "äö" should become "%E4%F6" when encoded, not "%C3%A4%C3%B6". This seems to be the problem. So far I've noticed this problem with the HREF-attribute only.
For a test I made a stylesheet that substitutes "ä" with "%E4" before serializing to HTML. This works, but it should be done by the serializer, right?
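For reference, the two escaped forms mentioned here differ only in the character encoding applied before percent-escaping. The following is a minimal Python sketch, not Cocoon or Xalan code; the variable names are made up for illustration. It only shows where each byte sequence comes from and takes no position on which one the serializer should emit.

```python
from urllib.parse import quote

text = "äö"

# Percent-escape the UTF-8 bytes of the string (what the serialized href contains).
utf8_escaped = quote(text, encoding="utf-8")

# Percent-escape the ISO-8859-1 bytes of the same string (what was expected instead).
latin1_escaped = quote(text, encoding="iso-8859-1")

print(utf8_escaped)    # %C3%A4%C3%B6
print(latin1_escaped)  # %E4%F6
```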
Seems like a Cocoon issue.
If it were an error at all, it would be a Xalan serializer problem, I think. But bugs on this topic were reported and rejected because of the specs (I think the reporters had the same problem as you):
http://nagoya.apache.org/jira/browse/XALANJ-1412 http://nagoya.apache.org/jira/browse/XALANJ-1548
As I wrote: you simply get different request encodings when sending a form or just clicking <a href=""/>.
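A rough illustration of that mismatch, as a Python sketch rather than anything Cocoon or Tomcat actually runs: `decode_param` is a hypothetical helper standing in for whatever decodes request parameters on the server side, and the sample values are the escaped forms of "äö" quoted earlier in this thread.

```python
from urllib.parse import unquote

# Hypothetical helper: decode a raw, percent-escaped parameter value
# using the encoding the server assumes for the request.
def decode_param(raw: str, assumed_encoding: str) -> str:
    return unquote(raw, encoding=assumed_encoding, errors="replace")

link_value = "%C3%A4%C3%B6"  # "äö" escaped from its UTF-8 bytes (link request)
form_value = "%E4%F6"        # "äö" escaped from its ISO-8859-1 bytes (form on an ISO-8859-1 page)

# Decoding with the matching encoding restores the original text.
link_ok = decode_param(link_value, "utf-8")
form_ok = decode_param(form_value, "iso-8859-1")

# Decoding the link value with the wrong encoding yields mojibake.
link_wrong = decode_param(link_value, "iso-8859-1")

print(link_ok, form_ok, link_wrong)
```

If the server assumes a single encoding for all requests, one of the two request kinds ends up decoded with the wrong one, which would match the symptom of forms working while links do not.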
Joerg
