Re: Avoiding the escaping UTF-8 unicode text

david_n_bertoni 8 Mar 2004 19:18:55 -0000


> On Mar 8, 2004, at 11:39 AM, [EMAIL PROTECTED] wrote:
>
> >    "The html output method may output a character using a character
> > entity reference, if one is defined for it in the version of HTML
> > that the output method is using."
> >
> > Many XSLT processors do this, not just Xalan-C, so I'm not sure why you
> > think they should NOT be escaped.  There's no way to change this
> > behavior at this time, unless you want to modify the source code.  If
> > you really need this, you can create a Bugzilla report and request an
> > enhancement.
>
> I think you're missing the point of my original email - it was taking
> what I perceived to be 3-byte UTF-8 character sequences and escaping
> each byte as an HTML entity.  Of course, this turned out to be correct,
> because they weren't well-formed UTF-8, but I didn't realize that at
> the time.  I wasn't trying to suggest that Xalan shouldn't do output
> escaping (we're glad that it does), but rather that Xalan should be
> able to tell the difference between escapable and non-escapable
> characters in UTF-8 - which, it turns out, it can, if you don't screw
> up the encoding.. :-)

Yes, I was confused by the fact that you said an XML-to-XML transformation
worked correctly, but XML-to-HTML did not.  Clearly, they must have been
run with different data sets, so the comparison was not relevant.

I'm curious as to how invalid UTF-8 byte sequences would get into a
transformation.  If the parser did not detect these, that's a problem.  Did
you paste these sequences into a document and parse it?  What was the
encoding declaration on the document?
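
For anyone trying to reproduce the symptom, here is a minimal Python sketch
of one common way it can arise.  It is not necessarily what happened in this
case: it assumes the bytes of a 3-byte UTF-8 character were decoded as
ISO-8859-1 somewhere along the way.  The escape_non_ascii helper is
hypothetical and only imitates an output method by writing decimal numeric
character references; a real html output method may use named entities
instead.

```python
def escape_non_ascii(text):
    """Hypothetical stand-in for output escaping: replace every non-ASCII
    character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


euro = "\u20ac"             # EURO SIGN, a 3-byte character in UTF-8
raw = euro.encode("utf-8")  # the bytes E2 82 AC

# Decoded with the right encoding, the three bytes form one character.
right = raw.decode("utf-8")

# Decoded as ISO-8859-1 (assumed mistake), every byte becomes its own
# character: U+00E2, U+0082, U+00AC.
wrong = raw.decode("iso-8859-1")

print(escape_non_ascii(right))  # one reference for the whole character
print(escape_non_ascii(wrong))  # three references, one per original byte
```

The second result is the "each byte escaped separately" output described
above.  The processor is escaping three real characters; the damage was done
when the bytes were decoded.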

Dave