Re: Re: Re: re: XML Headaches

David Bovill Mon, 09 Jul 2007 04:48:34 -0700

Is the text actually UTF-8 encoded? You say it contains an accented e (é),
and reading the docs and doing this by hand may be a bit error prone. The
first thing I'd do is check the XML with a validator and make sure that
works, before looking for bugs.
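
A quick way to check the encoding question before reaching for a validator
is to try decoding the raw bytes as UTF-8. This is only a sketch in Python
(not Revolution), and the function name first_invalid_utf8_offset is made up
for illustration:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte
        return err.start
    return None


# "café" saved as UTF-8: é is the two bytes 0xC3 0xA9
good = b"caf\xc3\xa9."
# "café" saved as ISO-8859-1: é is the single byte 0xE9 (decimal 233)
bad = b"caf\xe9."

print(first_invalid_utf8_offset(good))
print(first_invalid_utf8_offset(bad))
```

If this reports an offset, the file is probably in some single-byte encoding
such as ISO-8859-1 rather than UTF-8, whatever the XML declaration says.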


I've got some documentation with links to the best sources I can find here:
http://handlers.rev-co.de/wiki/XML

This bit would seem relevant:

That means that in a UTF-8 XML document, you cannot simply use a single byte
with decimal value 233 to represent "é" (and there is no predefined &eacute;
entity as there is in HTML); instead, you must either enter the UTF-8
multi-byte escape sequence, or use a special kind of XML reference called a
character reference:

<p>That is everyone's favourite caf&#233;.</p>

When your text consists primarily of unaccented Roman characters, this is
often the easiest way to escape the occasional accented or non-Roman
character. Since "é" appears at position 233 in Unicode (as in ISO-8859-1),
the XML parser will read the string correctly as "That is everyone's
favourite café."
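
To see the three cases from that passage side by side, here is a small
sketch using Python's standard xml.etree.ElementTree parser (again Python,
not Revolution; parse_text is a hypothetical helper name). It assumes the
document declares itself as UTF-8:

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'

# Character reference: plain ASCII bytes, so it is safe in any encoding.
with_reference = DECL + b"<p>That is everyone's favourite caf&#233;.</p>"

# Proper UTF-8: the accented e is the two-byte sequence 0xC3 0xA9.
with_utf8 = DECL + b"<p>That is everyone's favourite caf\xc3\xa9.</p>"

# Lone byte 233 (0xE9): fine in ISO-8859-1, but not valid UTF-8.
with_raw_byte = DECL + b"<p>That is everyone's favourite caf\xe9.</p>"


def parse_text(data: bytes):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


for sample in (with_reference, with_utf8, with_raw_byte):
    print(parse_text(sample))
```

The first two inputs should come back as the same string, while the third is
rejected as not well-formed, which is the kind of error a validator reports.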

I also put your XML through this online validation service and found a bunch
of errors:  http://www.xml.com/pub/a/tools/ruwf/check.html

Hope this helps.

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
