Tuomo L wrote:
taking one step at a time (what am I not seeing?):
- suppose a sax stream (producing xhtml) before serialization has a @href holding a euro sign (\u20AC unicode char)
- I hear you guys saying that xalan will recognize the uri-type attribute and serialize this character out as %E2%82%AC regardless of the chosen output encoding (didn't catch it but I am assuming that the output-encoding is set to UTF-8 anyways, and matches the form-encoding setting)
- so we get an html page out telling the browser it is utf-8 encoded
- so the browser will apply utf-8 encoding to form-values (and names) if this were about a form, but it's about this ready @href
- now this @href already has this same encoding (thx xalan) in place: so things should work the same as for the form (as long as the mentioned eurosign is strictly in the parameter-values)
So assuming all this reasoning is ok, what could never work is this:
- change your form-encoding (and matching setting of serialization) to anything other than UTF-8, because then request-params in forms and pre-built ones in URLs get encoded differently and we have no way to tell them apart over at cocoon's side
You're right.
thx for confirming
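To make the reasoning above concrete, here is a minimal Java sketch of what happens to the euro sign when a parameter value is percent-encoded with two different charsets. It only uses the JDK's `java.net.URLEncoder`, not Xalan or Cocoon; the class name `EuroHrefDemo` and the helper `encodeValue` are made up for illustration, and ISO-8859-15 is just picked as an example of a non-UTF-8 form-encoding that can represent the euro sign.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EuroHrefDemo {

    // Hypothetical helper: percent-encodes one parameter value using the
    // named charset, the way a browser would for a form field.
    static String encodeValue(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String euro = "\u20AC";

        // What a pre-built @href would carry if the serializer always uses UTF-8.
        System.out.println("UTF-8:       " + encodeValue(euro, "UTF-8"));

        // What a form would send if form-encoding were ISO-8859-15 instead.
        System.out.println("ISO-8859-15: " + encodeValue(euro, "ISO-8859-15"));
    }
}
```

The two results differ (three escaped bytes versus one), which is the mismatch described above: once the form-encoding is not UTF-8, the receiving side cannot tell which of the two encodings a given request parameter arrived in.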
It's sad news for Tuomo, but I can't see why it wouldn't be just working if (and only if)
- this is about parameter-values and NOT about URL's or parameter-names (because there we *need* to do some work)
Yes, I was talking about parameter values all the time, but didn't show it clearly enough in the example. It should be:
<a href="someurl?foo=��" foo="��">��</a>
ok, that makes things clear
Where the foo's value gets UTF-8 encoded by Xalan during serialization, no matter what the settings are anywhere else.
- container-encoding is traditionally set to ISO-8859-1 (unless using a container like jetty where you can modify its internal behaviour)
Mine is set to ISO-8859-1.
good, keep it like that
- form-encoding is strictly kept to 'utf-8' (thx for the lesson) and the serializer follows that (meta-equiv and all)
These don't help either, since the UTF-8 encoded parameter values are read in as ISO-8859-1 and the output is invalid. If these parameter
now this I don't understand
they are indeed read in using ISO-8859-1, but then inside cocoon they get re-en-decoded:
1. yourUtf8UrlEncodedValue --> first urldecoded and then interpreted by container using ISO-8859-1
2. this result re-encoded by cocoon using 'container-encoding' (==ISO-8859-1)
3. the bytes coming out of that should equal the bytes of the parameter-value right after url-encoding
4. so decoding these with 'form-encoding' (==UTF-8) should really just work
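The four steps above can be sketched in plain Java, outside Cocoon. This is only an illustration of the byte-level round trip, not Cocoon's actual code: the class `RoundTripDemo` and its methods are hypothetical names, and `java.net.URLDecoder` stands in for whatever the servlet container does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Step 1: the container url-decodes the value and interprets the
    // resulting bytes as ISO-8859-1 (stand-in for the container's behaviour).
    static String containerDecode(String onTheWire) {
        try {
            return URLDecoder.decode(onTheWire, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
    }

    // Steps 2-4: re-encode with the container-encoding to get the original
    // bytes back, then decode those bytes with the form-encoding (UTF-8).
    static String recover(String containerDecoded) {
        byte[] raw = containerDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String onTheWire = "%E2%82%AC"; // UTF-8, url-encoded euro sign
        String misread = containerDecode(onTheWire);
        String recovered = recover(misread);

        System.out.println("chars after container decode: " + misread.length());
        System.out.println("chars after recovery:         " + recovered.length());
        System.out.println("recovered the euro sign:      " + recovered.equals("\u20AC"));
    }
}
```

The round trip is lossless here because Java's ISO-8859-1 maps every byte value to exactly one char and back. If the container actually used a different single-byte charset, that assumption would have to be checked again.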
values are now put for example in database, there are several '?'-marks where those special characters should appear.
well, as a general remark you have to be careful with both
1. databases --> they typically have an encoding set too, and you should consult the settings of your jdbc driver to make sure you're not having a mismatch there
2. interpreting question-marks: I remember spending oodles of time looking at something that had been working all along, just because the tool I used to read the logfiles or sql-output did not support the encoding or was using a font that had no glyph for a certain character (then you can spot these question marks while all is well in fact)
anyways: safest thing to do is some step-by-step debugging (at the level of the 'decode' method mentioned earlier) or inserting java code that counts the length of the string or, even better, compares or dumps the int values of all chars in JVM memory
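A small helper along these lines might do for the dumping part. It is just a sketch: the class `CharDump` and the method `dump` are made-up names, and where you call it (before the database write, inside an action, ...) is up to you.

```java
public class CharDump {

    // Returns the string length followed by the code unit of every char,
    // so the result does not depend on the console font or log viewer.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append("length=").append(s.length()).append(':');
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format(" U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded euro sign: one char.
        System.out.println(dump("\u20AC"));
        // The same value read as ISO-8859-1 instead of UTF-8: three chars.
        System.out.println(dump("\u00E2\u0082\u00AC"));
        // A real question mark, i.e. the character was already lost.
        System.out.println(dump("?"));
    }
}
```

If the dump shows U+003F, the character really was replaced somewhere upstream; if it shows the expected value (or the three-char variant), the '?' you see is only a display problem or a decoding step that has not happened yet.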
best to take it one step at a time...
Maybe I just have to send the parameters within a form (as Joerg had done it), which is not very practical when you only need to do a simple HTTP-GET with parameters. Or then I use an XSL-stylesheet which
I agree
and as argued above this doesn't make sense: the form will be encoding the values exactly in the same way (ie. first utf-8 then url-encode) as xalan prepared things... so things should really just work IMHO
converts all the special characters in parameter values to ISO-8859-1
juk
before Xalan serialization. This works, but is also impractical, since I have to write a long xsl:choose-section. Doing it this way also decreases the performance of my application.
Can we come up with a better solution?
Thank you guys for taking interest in this issue.
I'd like to just understand first, and if we need to then also fix this for sure...
regards,
-marc= (off for 5 days helas, I hope you guys find a nice way out - and let us know)
--
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://blogs.cocoondev.org/mpo/
