I'm seeing some (I think) very strange behaviour from the XML library...

(warning, this is quite long, and won't be of much interest to anyone who isn't using the library...)

This is on an Intel Macintosh running Mac OS X 10.4.11.

In a button, I have the following script:

-----
on mouseUp
   put toXml() into tXml
   put tXml & cr & fromXml(tXml)
end mouseUp

function toXml
   put "<whatshappening></whatshappening>" into tXml
   put revCreateXmlTree(tXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode

   revAddXmlNode tTree, tNode, "name", "fred"

   put revXmlText(tTree) into tText
   revDeleteXmlTree tTree
   return tText
end toXml

function fromXml pXml
   put revCreateXmlTree(pXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   put revXmlFirstChild(tTree, tNode) into tChild

   put revXmlNodeContents(tTree, tChild) into tContent
   revDeleteXmlTree tTree
   return tChild & cr & tContent
end fromXml
-----

The output is:

<?xml version="1.0"?>
<whatshappening><name>fred</name></whatshappening>

/whatshappening/name
fred

So all is good. If I change "fred" in the toXml function to "fréd" (acute accent on the 'e'), I get this:

<?xml version="1.0"?>
<whatshappening><name></name></whatshappening>

/whatshappening/name

The content has simply disappeared, so I guess I need to encode non-ASCII material. OK, but as what (ideally UTF-8), and how do I indicate what I've done in my XML document?
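
One approach that may work here, as an untested sketch rather than a confirmed fix: convert the native-encoded string to UTF-8 with the built-in uniEncode and uniDecode functions before handing it to the library, and declare the encoding in the XML declaration of the starting document. The helper names toUtf8 and fromUtf8 are hypothetical, and whether the library honours or preserves the declaration is an assumption that would need checking.

```livecode
-- hypothetical helper: native (Mac OS Roman here) text -> UTF-8 bytes
function toUtf8 pText
   -- uniEncode produces UTF-16; uniDecode with "UTF8" re-encodes that as UTF-8
   return uniDecode(uniEncode(pText), "UTF8")
end toUtf8

-- hypothetical helper: UTF-8 bytes -> native text
function fromUtf8 pText
   return uniDecode(uniEncode(pText, "UTF8"))
end fromUtf8

function toXml
   -- assumption: declaring the encoding up front makes the library treat content as UTF-8
   put "<?xml version=" & quote & "1.0" & quote & " encoding=" & quote & "UTF-8" & quote & "?>" into tXml
   put "<whatshappening></whatshappening>" after tXml
   put revCreateXmlTree(tXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode

   revAddXmlNode tTree, tNode, "name", toUtf8("fréd")

   put revXmlText(tTree) into tText
   revDeleteXmlTree tTree
   return tText
end toXml
```

On the reading side, the value returned by revXmlNodeContents would then presumably need to go through fromUtf8 before being displayed in a field.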

However, if I now add an accented string as an attribute:

-----
function toXml
   put "<whatshappening></whatshappening>" into tXml
   put revCreateXmlTree(tXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode

   revAddXmlNode tTree, tNode, "name", "fred"
   revSetXmlAttribute tTree, tNode & "/name", "orig", "fréd"

   put revXmlText(tTree) into tText
   revDeleteXmlTree tTree
   return tText
end toXml

function fromXml pXml
   put revCreateXmlTree(pXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   put revXmlFirstChild(tTree, tNode) into tChild

   put revXmlNodeContents(tTree, tChild) into tContent
   put revXmlAttribute(tTree, tChild, "orig") into tAtt
   revDeleteXmlTree tTree
   return tChild & cr & tContent & cr & tAtt
end fromXml
-----

I get:

<?xml version="1.0" encoding="ISO-8859-1"?>
<whatshappening><name orig="fréd">fred</name></whatshappening>

/whatshappening/name
fred
frŽd

An encoding attribute has now been added to the XML declaration, and some version of the "orig" attribute value (not ISO-8859-1, as far as I can tell) has been produced. ????
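
To pin down which encoding actually comes back, the byte values can be inspected rather than judged by eye. The following is a minimal sketch; byteDump is a hypothetical helper name, and it assumes each char of the string is a single byte, as is the case for native strings in this version of Revolution.

```livecode
-- hypothetical diagnostic: return the byte values of a string as hex
function byteDump pText
   put empty into tOut
   repeat for each char tChar in pText
      -- charToNum gives the byte value; baseConvert turns it into hex
      put baseConvert(charToNum(tChar), 10, 16) & space after tOut
   end repeat
   return tOut
end byteDump
```

Calling byteDump(tAtt) in fromXml, and byteDump on the output of toXml, would show what is really stored: the 'é' is E9 in ISO-8859-1, 8E in Mac OS Roman, and the two bytes C3 A9 in UTF-8.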

So, finally, is there a way to encode XML documents as UTF-8 (or whatever) without having to encode each part myself and add the encoding attribute to the header myself?

What is slightly worrying is that it seems the library will add an encoding attribute to the header in some circumstances, but not others.

Ken (if you're reading this), does your library deal with this stuff better?

Best,

Mark
