>> The reason is because certain Strings can't be represented in the Text
>> node of XML documents. We ran across this problem in practice when
>> students started writing programs and copying and pasting content from
>> the web, which introduced characters like vertical tabs and other
>> characters that can't be represented in XML text nodes.
>
> What about CDATA, or escaping xml sequences?

This did not work when I tried it. See:

    https://sourceware.org/bugzilla/show_bug.cgi?id=4462

for an example of the kind of thing that "fixes" the problem, which is to say, it doesn't: the particular "fix" there sanitizes the string instead of preserving its original content.

Let's try the following in Racket using the xml library:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (define msg "\f")
> msg
"\f"
> (require xml)
> (xexpr->string `(message ,msg))
"<message>\f</message>"
> (define test-file (open-output-file "test-bad.xml"))
> (write-xexpr `(message ,msg) test-file)
> (close-output-port test-file)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Huh! Unfortunately, that's a bug in Racket's xml library. You're not allowed to put form feed characters in XML text nodes: it violates the XML 1.0 standard. I'll file a bug when I have time.

Let's try reading this "test-bad.xml" file from another client library just to show what happens:

################################################################
dannyyoo@melchior:~$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.dom.minidom
>>> dom1 = xml.dom.minidom.parse("test-bad.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9
################################################################

Better. Or worse, depending on your perspective. Python's xml.dom.minidom library properly reports that the file is malformed: column 9 is exactly where the form feed sits, right after "<message>".

If we stick with XML 1.0, you can't represent this string directly in a text node. You have to encode it by some scheme external to XML.

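To make the CDATA and escaping question concrete, here is a minimal sketch in Python 3 (not the Python 2.7 session above). The helper names is_xml10_text, parses, and to_base64_element are mine and purely illustrative, and base64 is only one possible out-of-band encoding, not something the XML standard prescribes. The sketch checks a string against the XML 1.0 Char production and then asks the expat-based minidom parser whether the raw form feed, a CDATA section, and a numeric character reference are accepted.

```python
import base64
import re
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml10_text(s):
    """True if every character of s may appear in an XML 1.0 document."""
    return _NOT_XML10_CHAR.search(s) is None


def parses(document):
    """True if minidom (expat underneath) accepts the document as well-formed."""
    try:
        xml.dom.minidom.parseString(document)
        return True
    except ExpatError:
        return False


def to_base64_element(s):
    """Hypothetical workaround: carry the string base64-encoded (an assumption)."""
    payload = base64.b64encode(s.encode("utf-8")).decode("ascii")
    return '<message encoding="base64">' + payload + "</message>"


if __name__ == "__main__":
    msg = "\f"
    print("valid XML 1.0 text?  ", is_xml10_text(msg))
    print("raw form feed parses?", parses("<message>\f</message>"))
    print("CDATA parses?        ", parses("<message><![CDATA[\f]]></message>"))
    print("&#12; parses?        ", parses("<message>&#12;</message>"))
    print("base64 parses?       ", parses(to_base64_element(msg)))
```

The point of the base64 variant is that the reader has to know about the encoding and undo it after parsing, which is exactly the "external to XML" step.
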
So when folks say that XML is just like s-expressions or JSON, I pause.

____________________
  Racket Users list:
  http://lists.racket-lang.org/users