On 29.10.2004 08:44, Tuomo L wrote:

We're having some serious encoding problems. This happens only with the @href attributes in html, when using characters like ä, ö and å (in the Finnish alphabet). Form encoding works just fine. I've gone through all the threads concerning encoding (other people having encoding problems too). No luck so far. Is this still an issue in Cocoon? Could someone please tell me what's wrong?


What's the page encoding? Forms work as expected? Just the links don't work? This normally points to a page encoding other than UTF-8, as link requests are encoded in UTF-8 while form requests are encoded in the page encoding. I don't think it is a Cocoon issue.
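
To illustrate the difference between the two request encodings, here is a minimal sketch in Python (chosen only for illustration, it is not part of Cocoon). It assumes a page encoding of ISO-8859-1 for the form case; the variable names are made up for this example.

```python
from urllib.parse import quote

text = "äö"

# Link (href) requests: non-ASCII characters are percent-encoded
# from their UTF-8 bytes.
link_encoded = quote(text, encoding="utf-8")

# Form requests: the browser uses the page encoding
# (assumed here to be ISO-8859-1).
form_encoded = quote(text, encoding="iso-8859-1")

print(link_encoded)
print(form_encoded)
```

The same two characters end up as different byte sequences on the wire, so the server has to decode each kind of request with the matching encoding.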

First a link about all the encodings: http://wiki.apache.org/cocoon/RequestParameterEncoding (mostly written by Bruno).

According to IE, the page encoding is set to UTF-8. The
container-encoding and form-encoding in web.xml (Tomcat) are set to UTF-8.

The container-encoding should not be touched at all and remain ISO-8859-1.
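
A rough sketch of why this matters, assuming (as the wiki page above describes it) that the container always decodes parameters as ISO-8859-1 and Cocoon then recodes them from container-encoding to form-encoding. The function names are hypothetical and the code is Python only for illustration.

```python
def container_decode(raw: bytes) -> str:
    # Assumption: the servlet container decodes request bytes as ISO-8859-1.
    return raw.decode("iso-8859-1")


def recode(value: str, container_encoding: str, form_encoding: str) -> str:
    # Assumption: undo the container's decoding, then decode again
    # with the encoding the browser really used.
    return value.encode(container_encoding).decode(form_encoding)


raw = "äö".encode("utf-8")              # bytes sent by the browser
seen_by_container = container_decode(raw)

# container-encoding left at ISO-8859-1: the original bytes are recovered.
ok = recode(seen_by_container, "iso-8859-1", "utf-8")

# container-encoding changed to UTF-8: the misdecoded text is kept as-is.
broken = recode(seen_by_container, "utf-8", "utf-8")

print(ok, broken)
```

Under these assumptions, setting container-encoding to anything other than what the container really uses makes the recoding step a no-op or worse.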

HTMLSerializer is set to use UTF-8 (mime-type="text/html; charset=utf-8")
and has the parameter <encoding>UTF-8</encoding>.

This should result in <meta http-equiv="Content-Type" content="text/html;charset=utf-8">. The request encoding header should have the same value ... which is not that easy when using a recent Tomcat: http://issues.apache.org/bugzilla/show_bug.cgi?id=26997

The xsl stylesheets use ISO-8859-1, though.

That's not a problem.

I've also tried setting everything to ISO-8859-1, but
the problem with the href-attributes in html remains. Mozilla Firefox
shows the characters correctly when doing "view source", but if I save the
document on disk and open with ASCII-editor, the encoding is wrong there
with both IE and Mozilla. So maybe it's not a browser problem?

Here's an example:

<a href="äö" foo="äö">äö</a>

becomes:

<a href="%C3%A4%C3%B6" foo="&auml;&ouml;">&auml;&ouml;</a>

when it should read (I think):

<a href="&auml;&ouml;" foo="&auml;&ouml;">&auml;&ouml;</a>

... follow-up mail:
The URL-encoding is done wrong when serializing to HTML. According to
specs "äö" should become "%E4%F6" when encoded, not "%C3%A4%C3%B6".
This seems to be the problem. So far I've noticed this problem with
the HREF-attribute only.

For a test I made a stylesheet that substitutes "ä" with "%E4"
before serializing to HTML. This works, but it should be done by the
serializer, right?

Seems like a Cocoon issue.

If it were an error at all, it would be a Xalan serializer problem, I think. But bugs were reported on this topic and rejected because of the specs (I think they have the same problem as you):

http://nagoya.apache.org/jira/browse/XALANJ-1412
http://nagoya.apache.org/jira/browse/XALANJ-1548

As I wrote: you simply get different request encodings when sending a
form or just clicking <a href=""/>.
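
A small sketch of the decoding side, again in Python just for illustration: the same parameter decodes correctly only when the server uses the encoding the request was sent in. The ISO-8859-1 form request is an assumed example, and the variable names are hypothetical.

```python
from urllib.parse import unquote

link_request = "%C3%A4%C3%B6"   # href request, UTF-8 bytes
form_request = "%E4%F6"         # form request, assuming an ISO-8859-1 page

# Decoding with the matching encoding gives the original text.
link_ok = unquote(link_request, encoding="utf-8")
form_ok = unquote(form_request, encoding="iso-8859-1")

# Decoding with the other encoding gives garbage.
link_wrong = unquote(link_request, encoding="iso-8859-1")
form_wrong = unquote(form_request, encoding="utf-8", errors="replace")

print(link_ok, form_ok, link_wrong, form_wrong)
```

So a setup that decodes every request with a single encoding can handle forms or links correctly, but not both, unless the page encoding is UTF-8 as well.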

Joerg
