I have a problem concerning special German characters occurring in URLs.
The HTML serializer encodes the URLs as follows (source code of the HTML file):
<a href="%C3%9CTest.html">ÜTest</a> <a href="%C3%84Test.html">ÄTest</a>
So Ü is encoded to %C3%9C and Ä to %C3%84, but I need %DC for Ü and %C4 for Ä.
The java.net.URLEncoder.encode method gives the following:
System.out.print(java.net.URLEncoder.encode("ÜÄ", "UTF-8")); Result: %C3%9C%C3%84
System.out.print(java.net.URLEncoder.encode("ÜÄ", "ISO-8859-1")); Result: %DC%C4
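For anyone who wants to reproduce the two calls above, here is a minimal self-contained sketch. The class name UrlEncodingDemo and the helper encode are made up for illustration; the umlauts are written as Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Cocoon.
final class UrlEncodingDemo {

    // Percent-encodes text using the bytes of the named charset.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String umlauts = "\u00DC\u00C4"; // U-umlaut followed by A-umlaut

        // UTF-8 uses two bytes per umlaut, hence two escapes per character.
        System.out.println(encode(umlauts, "UTF-8"));      // %C3%9C%C3%84

        // ISO-8859-1 uses one byte per umlaut, hence one escape per character.
        System.out.println(encode(umlauts, "ISO-8859-1")); // %DC%C4
    }
}
```

The difference is purely which byte sequence gets percent-escaped; the escaping step itself is identical in both cases.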
So why does the serializer do this UTF-8 URL encoding?
AFAIK this is the correct behaviour; URLs are UTF-8 encoded.
In web.xml I set the container-encoding and form-encoding parameters to ISO-8859-1, but nothing changed.
This has no influence on the serialization at all.
The serializer is defined the following way in the sitemap:
<map:serializer logger="sitemap.serializer.html" mime-type="text/html" name="html" pool-grow="4" pool-max="32" pool-min="4" src="org.apache.cocoon.serialization.HTMLSerializer">
  <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
  <encoding>ISO-8859-1</encoding>
</map:serializer>
This influences the encoding of the pages, as you would expect, but not that of the URLs. I would also like to have this confirmed by a spec, but so far I have not found one.
Can you give me any hints on how to get the URL correctly encoded? (I need it for further database lookups.)
Don't use URLs; use forms instead. They are encoded as expected and can be read correctly. This is why the parameter in cocoon.xconf is also called form-encoding.
Joerg
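If the links have to stay as they are, one possible workaround for the database lookup is to decode the UTF-8 escapes on the receiving side instead of changing what the serializer emits. The following is only a sketch under that assumption: the class HrefDecodingDemo and the helper decodeUtf8 are hypothetical names, and it assumes the code receives the raw, still-escaped href value (a servlet container may already have decoded it).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; not part of Cocoon.
final class HrefDecodingDemo {

    // Turns a UTF-8 percent-escaped string back into plain text.
    // Note: URLDecoder follows form-encoding rules, so '+' becomes a space.
    static String decodeUtf8(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // The href exactly as the serializer wrote it.
        String lookupKey = decodeUtf8("%C3%9CTest.html");

        // The key now starts with the real U-umlaut character (U+00DC).
        System.out.println(lookupKey.equals("\u00DCTest.html")); // true
    }
}
```

The decoded string can then be used as the lookup key regardless of which charset the page itself is served in.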
