On 30.10.2004 02:42, Marc Portier wrote:

That late? ;-)

But then in the bug report for Xalan (someone having this same problem) it says:

"According to section 16.2 of the XSLT Recommendation [1], non-ASCII characters in URI attribute values should be escaped using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation [2]. The latter recommends that non-ASCII characters be represented in UTF-8 prior to applying the "%HH" escaping described by the URI RFC, regardless of the output encoding."
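
To make the quoted rule concrete, here is a minimal Python sketch of "UTF-8 first, then %HH". It is not Xalan's code; the function name escape_non_ascii is made up for illustration, and it assumes that only non-ASCII characters are escaped while ASCII characters pass through unchanged.

```python
def escape_non_ascii(uri: str) -> str:
    """Percent-escape every non-ASCII character as its UTF-8 bytes.

    Hypothetical helper, written only to illustrate the quoted rule.
    """
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged
            out.append(ch)
        else:
            # one %HH triplet per UTF-8 byte, whatever the page encoding is
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    # "J\u00f6rg" is "Joerg" spelled with o-umlaut (U+00F6)
    print(escape_non_ascii("search?name=J\u00f6rg"))  # search?name=J%C3%B6rg
```

The output encoding of the page never enters the function, which is the point of the quoted passage: a page served as ISO-8859-1 would still carry %C3%B6 in its links, not %F6.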


nifty, didn't know... so whatever output encoding you set, the URIs will be UTF-8 encoded and then URL-encoded?

Yes, that's how I understand it, and it's what I wrote in my first reply to Tuomo's question.


haven't ever seen this. I was under the impression that, to Xalan, attributes were just attributes, and would have expected characters to be replaced by character entity references depending on whether or not the applied output encoding supports them

No, Xalan handles href attributes differently.

This is what Xalan does (HTML serialization), so it obeys the spec.

Correct me if I'm wrong, but during serialization, if there are special characters (above 127) in a URL's request parameters (href attributes etc.), they are first encoded in UTF-8 by Xalan. Even if the browser

apparently; would like to see some test evidence to be on the safe side, though

I can confirm this behaviour for old versions of Xalan coming with Cocoon 2.0 RC 1. At that time we tried to produce links with request params and they did not work because of encoding. We had to change the links to some form.submit() javascript stuff.


detects the page as ISO-8859-1 or anything else, these URLs in the HTML source contain parameters in UTF-8. Now, when the user clicks on this link,

but it is not about request parameters, is it?

It is as far as I understand.

it is about the proper URL part, no?

Don't know exactly. Had no tests for the URL part and the request param part.

as in:

http://server:port/path/more-path?request-param=value
---------------------------------|-------------------
 >>  area-not-fixed-by-cocoon << |  >> area fixed by cocoon <<

(in fact I'm even doubting whether we are fixing the names of the request params; my guess would be we're only doing the values)
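
The split drawn above can be reproduced with a short Python sketch. It only illustrates where the path ends and the request parameters begin, and how the parameter values depend on the encoding used to decode them; it says nothing about what Cocoon or the container actually do. The helper name split_request and the port number 8888 are made up for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def split_request(url: str, encoding: str = "utf-8"):
    """Return (path, [(name, value), ...]) for a URL.

    Hypothetical helper: the path is returned as-is (still escaped),
    only the query part is percent-decoded with the given encoding.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True, encoding=encoding)
    return parts.path, params


if __name__ == "__main__":
    url = "http://server:8888/path/more-path?name=J%C3%B6rg"
    # Decoding the %HH bytes as UTF-8 gives back the original value ...
    print(split_request(url, "utf-8"))
    # ... decoding the same bytes as ISO-8859-1 gives two wrong characters.
    print(split_request(url, "iso-8859-1"))
```

Note that parse_qsl decodes names and values with the same encoding, so the question of whether the names are handled differently from the values does not show up in this sketch.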

see http://cvs.apache.org/viewcvs.cgi/cocoon/trunk/src/java/org/apache/cocoon/environment/http/HttpRequest.java?rev=55600&root=Apache-SVN&view=auto

there is the internal decode() method. it only gets called from areas that deal with request-parameter values (and, as I started to think, not even the names)

Cocoon reads the request parameters in as ISO-8859-1, and converts them to UTF-8, without knowing that these parameters were already UTF-8!

That's how I understand it (just the first part is not done by Cocoon, but by the container, as Marc wrote below too).


nope, don't think so... first nuance (see above): the container reads and applies (typically) ISO-8859-1,...

and Cocoon correctly re-encodes request-parameter values based on its 'form-encoding', but isn't (at least to my knowledge) touching the URL part of things

But if you convert values from ISO-8859-1 to UTF-8 even though they were already UTF-8 and not ISO-8859-1, you are in trouble like Tuomo, aren't you?
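
A minimal Python sketch of the two steps discussed here may help to pin down where things can go wrong. It assumes the container decodes the raw bytes as ISO-8859-1 and that the re-encoding step takes those same bytes and reads them again in the configured form-encoding. The names container_decode and recode are hypothetical, and this is not the code of HttpRequest.decode().

```python
def container_decode(raw: bytes, container_encoding: str = "iso-8859-1") -> str:
    """What the container is assumed to do with the unescaped parameter bytes."""
    return raw.decode(container_encoding)


def recode(value: str,
           container_encoding: str = "iso-8859-1",
           form_encoding: str = "utf-8") -> str:
    """Undo the container's decoding and read the bytes in form_encoding."""
    return value.encode(container_encoding).decode(form_encoding)


if __name__ == "__main__":
    sent = "\u00e4".encode("utf-8")   # a-umlaut as UTF-8 bytes: C3 A4
    seen = container_decode(sent)     # two characters: U+00C3 U+00A4

    # form-encoding matches what the link contained: value is restored
    print(ascii(recode(seen, form_encoding="utf-8")))       # '\xe4'
    # form-encoding follows the page (ISO-8859-1): value stays garbled
    print(ascii(recode(seen, form_encoding="iso-8859-1")))  # '\xc3\xa4'
```

Under these assumptions the round trip itself loses nothing; the result is only right when the form-encoding matches the encoding that was actually used to escape the link.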


Joerg
