Re: Avoiding the escaping UTF-8 unicode text

david_n_bertoni 8 Mar 2004 20:12:27 -0000


> On Mar 8, 2004, at 2:18 PM, [EMAIL PROTECTED] wrote:
>
> > Yes, I was confused by the fact you said an XML to XML transformation
> > worked correctly, but XML to HTML did not.  Clearly, they must have
> > been with different data sets, so the comparison was not relevant.
>
> Well, *we* didn't think they were different data sets, but the output
> xml went through an unintended change between transformations.

I'm not disputing that, but when you make a claim in a posting, you should
be very careful that claim is true.  Otherwise, you're misleading people,
and it's difficult for them to respond reasonably.

> > I'm curious as to how invalid UTF-8 byte sequences would get into a
> > transformation.  If the parser did not detect these, that's a problem.
> > Did you paste these sequences into a document and parse it?  What was
> > the encoding declaration on the document?
>
> The problem was that we copied the output XML into a copy-buffer, and
> pasted it into a new document in a new application.  This copy-paste
> operation altered the characters such that they were no longer valid
> UTF-8.  The UTF-8 encoding declaration was in the document.  I was
> passing them through the XalanTransform sample program, which I believe
> dumps all errors to STDOUT, and we didn't see anything reported by the
> transformer.

Can you send me a private email with a copy of that document attached?  I'd
like to see why the parser accepts it if it has invalid UTF-8 byte
sequences.  That's a very big problem.
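
For anyone who wants to check a file independently before handing it to the
transformer, here is a minimal sketch in Python.  It is not part of Xalan or
Xerces, and the function name find_invalid_utf8 is hypothetical.  It only
relies on Python's built-in strict UTF-8 decoder, which rejects malformed
byte sequences, and reports the offset of the first one it finds.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (byte offset, reason) for the first invalid UTF-8 sequence.

    Returns None when the whole buffer decodes cleanly as UTF-8.
    """
    try:
        # Strict decoding raises on the first malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_utf8.py document.xml
    with open(sys.argv[1], "rb") as handle:
        problem = find_invalid_utf8(handle.read())
    if problem is None:
        print("valid UTF-8")
    else:
        print("invalid UTF-8 at byte offset %d: %s" % problem)
```

A Latin-1 "é" (the single byte 0xE9) pasted into a document that declares
UTF-8 is the typical case this catches: for the bytes b"caf\xe9 au lait" the
function reports offset 3.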

Thanks,

Dave