You should double-check against the HTML 4.01 spec: https://www.w3.org/TR/html4/
Since you mention "in the wild", I think you probably don't want to use the html library but instead want to use http://docs.racket-lang.org/html-parsing/index.html

Jay

On Thu, Feb 25, 2016 at 1:13 PM, jon stenerson <jonstener...@comcast.net> wrote:
> I find that when I use the html library I have to make a few simple changes
> to html-spec.rkt. It seems that <ins> and <del> are not treated like <b> and
> <i>. You can see in this example that while <b> remains in the enclosing
> <p>, <ins> does not. I also find that I have to allow pcdata as a child of
> <ol> and <ul>. I don't know whether pcdata is "supposed to" appear there but
> it often does in the wild.
>
> Jon
>
>
>
> #lang racket
>
> (require (prefix-in h: html) (prefix-in x: xml))
>
> (define (xml->list x)
>   (cond
>     [(x:pcdata? x) (x:pcdata-string x)]
>     [(x:entity? x) (list)]
>     [(x:element? x)
>      (list (x:element-name x)
>            (map xml->list (x:element-content x)))]
>     [(list? x) (map xml->list x)]))
>
> (printf "~s\n" (xml->list (h:read-html-as-xml (open-input-string "<p>Hello world <b>Testing</b>!</p>"))))
> (printf "~s\n" (xml->list (h:read-html-as-xml (open-input-string "<p>Hello world <ins>Testing</ins>!</p>"))))
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Jay McCarthy
Associate Professor
PLT @ CS @ UMass Lowell
http://jeapostrophe.github.io

"Wherefore, be not weary in well-doing,
for ye are laying the foundation of a great work.
And out of small things proceedeth that which is great."
- D&C 64:33
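
P.S. For what it's worth, here is a minimal sketch of the html-parsing route, run on the same two inputs as the quoted example. It assumes the third-party html-parsing package is installed (it is not part of the base distribution). As far as I know, html->xexp accepts a string or an input port and returns an SXML-style x-expression rather than xml structs, so the xml->list helper isn't needed. I would expect <ins> to stay nested inside the enclosing <p>, but please check the printed output rather than take my word for it.

```racket
#lang racket
;; Assumption: the html-parsing package has been installed, e.g. with
;;   raco pkg install html-parsing
(require html-parsing)

;; Parse the same two fragments as in the quoted example.
;; html->xexp is given a string here; no xml structs are involved.
(define b-tree   (html->xexp "<p>Hello world <b>Testing</b>!</p>"))
(define ins-tree (html->xexp "<p>Hello world <ins>Testing</ins>!</p>"))

;; Print both trees so the nesting of <b> and <ins> can be compared by eye.
(printf "~s\n" b-tree)
(printf "~s\n" ins-tree)
```

html-parsing is a permissive parser meant for real-world markup, so it should also be a better fit for the stray text inside <ol> and <ul> than patching html-spec.rkt.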