When I use the html library I find that I have to make a few simple
changes to html-spec.rkt. It seems that <ins> and <del> are not treated
like <b> and <i>. You can see in this example that while <b> remains in
the enclosing <p>, <ins> does not. I also find that I have to allow
pcdata as a child of <ol> and <ul>. I don't know whether pcdata is
"supposed to" appear there, but it often does in the wild.
Jon
#lang racket
(require (prefix-in h: html) (prefix-in x: xml))

;; Convert parsed content into nested lists of element names and
;; strings, so the resulting tree structure is easy to print.
(define (xml->list x)
  (cond
    [(x:pcdata? x) (x:pcdata-string x)]
    [(x:entity? x) (list)]
    [(x:element? x)
     (list (x:element-name x)
           (map xml->list (x:element-content x)))]
    [(list? x) (map xml->list x)]))

;; <b> stays inside the enclosing <p>
(printf "~s\n" (xml->list (h:read-html-as-xml (open-input-string
  "<p>Hello world <b>Testing</b>!</p>"))))
;; <ins> does not stay inside the enclosing <p>
(printf "~s\n" (xml->list (h:read-html-as-xml (open-input-string
  "<p>Hello world <ins>Testing</ins>!</p>"))))
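
The <ol>/<ul> case can be checked the same way as the <p> example. The following is only a sketch: the sample markup and the html->list helper are made up for illustration, and the comparison with use-html-spec (the html library's parameter for turning off the nesting rules) is just a guess at something useful to look at. No output is listed because it depends on the html-spec.rkt in use.

```racket
#lang racket
(require (prefix-in h: html) (prefix-in x: xml))

;; Same conversion as in the <p> example, repeated so this runs on its own.
(define (xml->list x)
  (cond
    [(x:pcdata? x) (x:pcdata-string x)]
    [(x:entity? x) (list)]
    [(x:element? x)
     (list (x:element-name x)
           (map xml->list (x:element-content x)))]
    [(list? x) (map xml->list x)]
    [else (list)])) ; anything else (comments etc.) is dropped

;; Hypothetical helper: parse an HTML string and convert the result.
(define (html->list str)
  (xml->list (h:read-html-as-xml (open-input-string str))))

;; Made-up input with text placed directly inside <ul>.
(define sample "<ul>stray text<li>one</li><li>two</li></ul>")

;; Default behaviour: the nesting rules from html-spec.rkt apply.
(printf "~s\n" (html->list sample))

;; For comparison: the same input with the spec check turned off.
(parameterize ([h:use-html-spec #f])
  (printf "~s\n" (html->list sample)))
```

If the stray text ends up outside the <ul> in the first line of output, that is the behaviour described above.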