Joerg Heinicke wrote:
On 30.10.2004 02:42, Marc Portier wrote:
That late? ;-)
ugh
But then in the bug report for Xalan (someone having this same problem) it says:
"According to section 16.2 of the XSLT Recommendation [1], non-ASCII characters in URI attribute values should be escaped using the method recommended in Section B.2.1 of the HTML 4.0 Recommendation [2]. The latter recommends that non-ASCII characters be represented in UTF-8 prior to applying the "%HH" escaping described by the URI RFC, regardless of the output encoding."
nifty, didn't know... so whatever output encoding you set the uri's will be utf-8 encoded, and then url-encoded?
Yes, that's how I understand it and wrote it in my first reply to Tuomo's question.
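A minimal sketch of the rule quoted above, assuming nothing about Xalan's internals: the character is turned into bytes first, and each byte is then written as %HH. The class and method names are made up for illustration, and java.net.URLEncoder is only a stand-in (it implements form encoding, so it treats spaces and some punctuation differently from a URI escaper), but for non-ASCII characters the %HH output shows the point.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Xalan or Cocoon.
class UriEscapeSketch {

    // Turn the characters into bytes using the given charset,
    // then write every non-safe byte as %HH.
    static String escape(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00E4"; // a with diaeresis, U+00E4

        // What the quoted rule asks for: always UTF-8 first.
        System.out.println(escape(aUmlaut, "UTF-8"));      // %C3%A4

        // What one might naively expect for an ISO-8859-1 page.
        System.out.println(escape(aUmlaut, "ISO-8859-1")); // %E4
    }
}
```

The same character ends up as two different escape sequences depending on the charset used before escaping, which is why the receiving side has to know which one was applied.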
haven't ever seen this. I was under the impression that to xalan attributes were just attributes, and would have expected characters to be replaced by character-entity-refs depending on whether or not the applied output-encoding supports them
No, Xalan handles href attributes differently.
thx for boosting my knowledge, now, this isn't actually making things easier, is it?
This is what Xalan does (HTML serialization), so it obeys the spec.
Correct me if I'm wrong, but during serialization if there are special characters (above 128) in a URL's request parameters (href-attributes etc.), they are first encoded in UTF-8 by Xalan. Even if the browser
apparently, would like to see some test evidence to be on the safe side though
I can confirm this behaviour for old versions of Xalan coming with Cocoon 2.0 RC 1. At that time we tried to produce links with request params and they did not work because of encoding. We had to change the links to some form.submit() javascript stuff.
detects the page as ISO-8859-1 or anything else, these URLs in the HTML source contain parameters in UTF-8. Now, when user clicks on this link,
but it is not about request-parameters is it?
It is as far as I understand.
well, then I missed some question-mark somewhere ;-)
scanning back through the history I did find this:
<a href="��" foo="��">��</a>
this is NOT about request-parameter values IMHO
it is about the proper URL part, no?
Don't know exactly. Had no tests for URL part and request param part.
as in:
http://server:port/path/more-path?request-param=value
---------------------------------|-------------------
  >> area-not-fixed-by-cocoon << | >> area fixed by cocoon <<
(in fact I'm even doubting whether we are fixing the names of the request-params (actually my guess would be we're only doing the values))
see http://cvs.apache.org/viewcvs.cgi/cocoon/trunk/src/java/org/apache/cocoon/environment/http/HttpRequest.java?rev=55600&root=Apache-SVN&view=auto
there is the internal decode() method. it only gets called from areas that deal with request-parameter-values (as I started to think: not even the names)
Cocoon reads the request parameters in as ISO-8859-1, and converts them to UTF-8, without knowing that these parameters were already UTF-8!
That's how I understand it (just the first part is not done by Cocoon, but by the container as Marc wrote below too).
nope, don't think so... first nuance (see above) the container reads and applies (typically) ISO-8859-1,...
and cocoon correctly re-encodes request-parameter-values based on its 'form-encoding', but isn't (at least to my knowledge) touching the url part of things
But if you convert values from ISO-8859-1 to UTF-8 though they already have been UTF-8 and not ISO-8859-1 you are in trouble like Tuomo, aren't you?
you get me doubting :-)
first reading said yes, but I'm not convinced, as long as it is about the values and not the names or the @action part we're in good shape, no?
taking one step at a time (what am I not seeing?):
- suppose a sax stream (producing xhtml) before serialization has a @href holding a euro sign (\u20AC unicode char)
- I hear you guys saying that xalan will recognize the uri-type attribute and serialize this character out as %E2%82%AC regardless of the chosen output encoding (didn't catch it but I am assuming that the output-encoding is set to UTF-8 anyways, and matches the form-encoding setting)
- so we get an html page out telling the browser it is utf-8 encoded
- so the browser will apply utf-8 encoding to form-values (and names) if this were about a form, but it's about this ready @href
- now this @href already has this same encoding (thx xalan) in place: so things should work the same as for the form (as long as the mentioned eurosign is strictly in the parameter-values)
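The steps above can be replayed outside of Cocoon. This is only a sketch under stated assumptions: the container decodes the query string as ISO-8859-1, and the re-decoding step turns the string back into bytes with the container encoding and reads them again with the form-encoding. The class and method names are invented for the example and are not Cocoon's API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical walk-through, not Cocoon code.
class RoundTripSketch {

    // Serializer side as described in the thread: UTF-8 bytes, then %HH.
    static String serializerEscape(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Container side (assumed default): %HH back to bytes, bytes read as ISO-8859-1.
    static String containerDecode(String escaped) {
        try {
            return URLDecoder.decode(escaped, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-decoding step: recover the raw bytes, then read them as the form-encoding.
    static String recode(String value, String containerEncoding, String formEncoding) {
        byte[] raw = value.getBytes(Charset.forName(containerEncoding));
        return new String(raw, Charset.forName(formEncoding));
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                                 // euro sign, U+20AC
        String escaped = serializerEscape(euro);                // %E2%82%AC
        String seen = containerDecode(escaped);                 // three Latin-1 chars
        String fixed = recode(seen, "ISO-8859-1", "UTF-8");     // euro sign again

        System.out.println(escaped);
        System.out.println(seen.length());
        System.out.println(fixed.equals(euro));
    }
}
```

ISO-8859-1 maps every byte to exactly one character, so the container's wrong guess loses nothing and the re-decoding step can undo it. That is why the pre-built @href and a form submitted from a UTF-8 page end up with the same value.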
So assuming all this reasoning is ok, what could never work is this:
- change your form-encoding (and matching setting of serialization) to anything other than UTF-8, because then request-params in forms and pre-built ones in URLs get encoded differently and we have no way to make a distinction over at cocoon's side
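To illustrate why that combination cannot work, here is a small sketch assuming a form-encoding of ISO-8859-1, a container that decodes as ISO-8859-1, and a serializer that still escapes @href values as UTF-8. The names are hypothetical and the recoding line only mirrors the idea discussed in this thread, not actual Cocoon source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Cocoon code.
class MismatchSketch {

    static String escape(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Container decodes as ISO-8859-1 (assumption), then the value is
    // re-read with the configured form-encoding.
    static String receive(String escaped, String formEncoding) {
        try {
            String seen = URLDecoder.decode(escaped, "ISO-8859-1");
            byte[] raw = seen.getBytes(StandardCharsets.ISO_8859_1);
            return new String(raw, Charset.forName(formEncoding));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent, U+00E9

        // A form on an ISO-8859-1 page submits the value in the page encoding.
        String fromForm = escape(eAcute, "ISO-8859-1"); // %E9
        // A pre-built link gets UTF-8 escaping from the serializer.
        String fromHref = escape(eAcute, "UTF-8");      // %C3%A9

        // Same form-encoding applied to both: only one can come out right.
        System.out.println(receive(fromForm, "ISO-8859-1").equals(eAcute)); // true
        System.out.println(receive(fromHref, "ISO-8859-1").equals(eAcute)); // false
    }
}
```

The receiving side sees only a parameter value and has no hint whether it came from a form or from a pre-built link, so a single form-encoding setting cannot serve both unless that setting is UTF-8.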
It's sad news for Tuomo, but I can't see why it wouldn't be just working if (and only if)
- this is about parameter-values and NOT about URLs or parameter-names (because there we *need* to do some work)
- container-encoding is traditionally set to ISO-8859-1 (unless using a container like jetty where you can modify its internal behaviour)
- form-encoding is strictly kept to 'utf-8' (thx for the lesson) and the serializer follows that (meta http-equiv and all)
regards,
-marc=
--
Marc Portier
http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
