I have a problem with special German characters occurring in URLs. I made a minimal example to show the problem. Assume the following pipeline snippet:

<map:match pattern="SpecialCharacters.html">
  <map:generate type="file" src="context://content/test1.xml"/>
  <map:serialize type="html"/>
</map:match>

The test1.xml looks like this. Please note the special German characters in the URLs (I hope they are displayed correctly in your mail client):

<?xml version="1.0" encoding="iso-8859-1" ?>
<html>
   <head>
     <title>Test</title>
   </head>
   <body>
      <a href="ÜTest.html">ÜTest</a>
      <a href="ÄTest.html">ÄTest</a>
   </body>
</html>

The HTML serializer encodes the URLs like this (HTML source of the output):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Test</title>
</head>
<body>
<a href="%C3%9CTest.html">&Uuml;Test</a>
<a href="%C3%84Test.html">&Auml;Test</a>
</body>
</html>


So Ü is encoded as %C3%9C and Ä as %C3%84, but I need %DC for Ü and %C4 for Ä.

The java.net.URLEncoder.encode method gives the following results:

System.out.print(java.net.URLEncoder.encode("ÜÄ","UTF-8"));
Result: %C3%9C%C3%84

System.out.print(java.net.URLEncoder.encode("ÜÄ","ISO-8859-1"));
Result: %DC%C4
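The two URLEncoder results differ only in which bytes get escaped: UTF-8 represents Ü (U+00DC) as the two bytes C3 9C, while ISO-8859-1 uses the single byte DC. HTML 4.01 (Appendix B.2.1) recommends converting non-ASCII characters in URI attribute values to UTF-8 and then %-escaping the bytes, which is likely why the HTML serializer produces %C3%9C regardless of the configured output encoding. The minimal sketch below prints the raw bytes for both charsets; the class and method names are illustrative, and it sticks to APIs available in Java 1.4.

```java
import java.io.UnsupportedEncodingException;

public class CharsetBytesDemo {

    // Formats one byte value (0..255) as two upper-case hex digits.
    static String hex2(int b) {
        // Mask to 0..255 so bytes >= 0x80 are not printed as negative values.
        String h = Integer.toHexString(b & 0xFF).toUpperCase();
        return h.length() < 2 ? "0" + h : h;
    }

    // Encodes s in the named charset and returns the resulting bytes
    // as space-separated hex pairs, e.g. "C3 9C".
    static String hexBytes(String s, String charset) {
        byte[] bytes;
        try {
            bytes = s.getBytes(charset);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 and ISO-8859-1 are required on every Java platform,
            // so this only triggers for an unknown charset name.
            throw new IllegalArgumentException("Unknown charset: " + charset);
        }
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(hex2(bytes[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of its own source encoding.
        String[] words = { "\u00DC", "\u00C4" }; // U-umlaut, A-umlaut
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            // Both characters are below U+0100, so "U+00" + two digits is exact.
            System.out.println("U+00" + hex2(w.charAt(0))
                    + "  UTF-8: " + hexBytes(w, "UTF-8")
                    + "  ISO-8859-1: " + hexBytes(w, "ISO-8859-1"));
        }
    }
}
```

Running main prints "U+00DC  UTF-8: C3 9C  ISO-8859-1: DC" and "U+00C4  UTF-8: C3 84  ISO-8859-1: C4", the same byte values that appear %-escaped in the two URLEncoder results above.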

So why does the serializer apply UTF-8 URL encoding? In web.xml I set the container-encoding and form-encoding parameters to ISO-8859-1, but that made no difference. The serializer is defined in the sitemap as follows:

<map:serializer logger="sitemap.serializer.html" mime-type="text/html"
    name="html" pool-grow="4" pool-max="32" pool-min="4"
    src="org.apache.cocoon.serialization.HTMLSerializer">
 <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
 <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
 <encoding>ISO-8859-1</encoding>
</map:serializer>

Can you give me any hints on how to get the URL encoded correctly? (I need it for further database lookups.)

Cocoon: Dev-Snapshot from 2004-03-29
Java: 1.4.2_03
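If the generated links cannot be changed, one possible workaround for the database lookups is to undo the serializer's UTF-8 escaping on the receiving side and, if really needed, re-escape the value with ISO-8859-1. The sketch below assumes the link target arrives as the raw escaped string (for example %C3%9CTest.html); the class HrefRecoder and its methods are hypothetical names, not part of Cocoon, and only java.net.URLDecoder and java.net.URLEncoder (both available since Java 1.4) are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class HrefRecoder {

    // Hypothetical helper: undoes the UTF-8 %-escaping that the HTML
    // serializer applied and returns the plain string (e.g. for a DB lookup).
    static String decodeUtf8Escapes(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform must support UTF-8.
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    // Hypothetical helper: re-escapes a single path segment using
    // ISO-8859-1 bytes, so U-umlaut becomes %DC instead of %C3%9C.
    static String toLatin1Escapes(String escapedUtf8) {
        String plain = decodeUtf8Escapes(escapedUtf8);
        try {
            return URLEncoder.encode(plain, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform must support ISO-8859-1.
            throw new IllegalStateException("ISO-8859-1 not supported");
        }
    }

    public static void main(String[] args) {
        // The two href values emitted by the HTML serializer in the example.
        String[] hrefs = { "%C3%9CTest.html", "%C3%84Test.html" };
        for (int i = 0; i < hrefs.length; i++) {
            System.out.println(hrefs[i] + " -> " + toLatin1Escapes(hrefs[i]));
        }
    }
}
```

Running main prints "%C3%9CTest.html -> %DCTest.html" and "%C3%84Test.html -> %C4Test.html". Because URLEncoder and URLDecoder implement application/x-www-form-urlencoded rather than path escaping, a space becomes "+" and "/" becomes %2F, so the re-escaping should only be applied to single path segments such as the file name. If the lookup only needs the plain name, decodeUtf8Escapes alone is enough.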

Thanks for your help

Harald

--
Institut für Tourismus- und Geo-Informationssysteme GmbH
Registered office: Friedrichstrasse 57-59 38855 Wernigerode

Office: Gießerweg 5
        38855 Wernigerode          Web:     http://www.itgis.com
                                   Tel:     03943/557807
                                   Fax:     03943/557808

Das Internet-Lexikon, a service of ITGIS GmbH:
http://www.knowlex.org

Personal: http://www.harald-wehr.de


