Hi!

I spent last week in the wondrous world of XML parsing, and I found a
mechanism missing from clojure.xml: escaping of the predefined entities.

Example:
(use 'clojure.xml)

;; parse the string <tag greet="Clojure&amp;co"/>
(def xmlelem (parse (new org.xml.sax.InputSource
                         (new java.io.StringReader
                              "<tag greet=\"Clojure&amp;co\"/>"))))

xmlelem   ;; looks like this:
{:tag :tag, :attrs {:greet "Clojure&co"}, :content nil}

(emit-element xmlelem)   ;; prints:
<tag greet='Clojure&co'/>

This output is not standards-compliant XML, and it cannot even be re-parsed
with the code above. The problem is the lone ampersand (&), which has to be
written as &amp; inside an attribute value.
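To make the problem concrete, here is a minimal sketch, assuming clojure.xml behaves as in the example above, that captures what emit-element prints (using with-out-str) and feeds it back through the same kind of parse call. The names `emitted`, `reparse` and `reparse-result` are only illustrative.

```clojure
(use 'clojure.xml)

;; Capture the text emit-element prints instead of letting it go to *out*.
(def emitted
  (with-out-str
    (emit-element {:tag :tag, :attrs {:greet "Clojure&co"}, :content nil})))

;; Parse a string the same way as in the example above.
(defn reparse [s]
  (parse (new org.xml.sax.InputSource (new java.io.StringReader s))))

(def reparse-result
  (try
    (reparse emitted)
    (catch Exception e
      ;; The SAX parser rejects the bare & ("&co" is not a valid entity reference).
      (str "re-parse failed: " (.getMessage e)))))
```

With the unescaped output, reparse-result ends up as the error string rather than an element map.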

Wikipedia [1] says

There are five *predefined entities*:

   - &lt; represents "<" (less-than sign)
   - &gt; represents ">" (greater-than sign)
   - &amp; represents "&" (ampersand)
   - &apos; represents ' (apostrophe)
   - &quot; represents " (quotation mark)


Would it be a good idea to write a small search-and-replace function that
emit-element calls to escape these characters in strings when necessary?
There is no such mechanism in clojure.xml today, but it would be simple to
implement.
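A minimal sketch of such an escaper, assuming Clojure 1.2 or later (where clojure.string/escape is available). The names `predefined-entities` and `escape-xml` are hypothetical; nothing like them exists in clojure.xml today.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper (not part of clojure.xml): maps each character that
;; has a predefined XML entity to its entity reference.
(def predefined-entities
  {\< "&lt;"
   \> "&gt;"
   \& "&amp;"
   \' "&apos;"
   \" "&quot;"})

(defn escape-xml
  "Replace <, >, &, ' and \" in s with the corresponding predefined entities.
  Non-string values (e.g. numbers) are converted with str first."
  [s]
  (string/escape (str s) predefined-entities))
```

For example, (escape-xml "Clojure&co") returns "Clojure&amp;co". Because the parser has already decoded entities, escaping exactly once on output is what makes the result re-parseable, and a value that really contains the text "&amp;" is correctly written as "&amp;amp;". Strictly speaking only & and < (plus the quote character in use) must be escaped inside attribute values, but escaping all five everywhere is the simple, always-safe choice.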

The single quote used around attribute values could easily be made
configurable in emit-element if needed; is anyone interested in an option
for this? The escaper for the predefined entities, on the other hand, would
be of general interest, I think.
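One way this could look, sketched as a hypothetical `emit-element-escaped` that mirrors the structure of clojure.xml's emit-element but runs attribute values and text content through an escaper, and takes an optional quote character (defaulting to the single quote emit-element uses now). This is an assumption about how a patch might be shaped, not existing clojure.xml API; the escaper is repeated so the block stands alone.

```clojure
(require '[clojure.string :as string])

;; Hypothetical escaper: replaces all five predefined-entity characters.
(defn- escape-xml [s]
  (string/escape (str s) {\< "&lt;", \> "&gt;", \& "&amp;", \' "&apos;", \" "&quot;"}))

(defn emit-element-escaped
  "Hypothetical variant of clojure.xml/emit-element: prints element e the same
  way, but escapes attribute values and text content, and lets the caller
  choose the attribute quote character (default: single quote)."
  ([e] (emit-element-escaped e \'))
  ([e quote-char]
   (if (instance? String e)
     ;; Text node: escape it before printing.
     (println (escape-xml e))
     (do
       (print (str "<" (name (:tag e))))
       ;; Attributes: escaped value wrapped in the chosen quote character.
       (doseq [[k v] (:attrs e)]
         (print (str " " (name k) "=" quote-char (escape-xml v) quote-char)))
       (if (:content e)
         (do
           (println ">")
           (doseq [c (:content e)]
             (emit-element-escaped c quote-char))
           (println (str "</" (name (:tag e)) ">")))
         (println "/>"))))))
```

With xmlelem from the example above, (emit-element-escaped xmlelem) prints <tag greet='Clojure&amp;co'/> and (emit-element-escaped xmlelem \") prints <tag greet="Clojure&amp;co"/>; both parse back to the original element.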

Should I file this in Jira? Should I try to make a patch and send it to
someone? This would be my first contribution, so I would be most grateful if
someone could acknowledge this in any way, so that I know I'm not entirely
off track.

[1] http://en.wikipedia.org/wiki/Xml

/Linus

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com