Note that HTML4 is quite out of date (from 1999); the most recent HTML
standard from the W3C (https://www.w3.org/TR/html/) dates from 2014.
However, if you plan to reference the standard to build software, the
most useful spec is https://html.spec.whatwg.org/ which is what
browsers and other applications follow.

Sam

On Thu, Feb 25, 2016 at 1:21 PM, Jay McCarthy <jay.mccar...@gmail.com> wrote:
> You should double check against the HTML 4.01 spec
>
> https://www.w3.org/TR/html4/
>
> Since you mention "in the wild", I think you probably don't want to
> use the html library but instead want to use
>
> http://docs.racket-lang.org/html-parsing/index.html
>
> Jay
>
> On Thu, Feb 25, 2016 at 1:13 PM, jon stenerson <jonstener...@comcast.net> wrote:
>> I find that when I use the html library I have to make a few simple changes
>> to html-spec.rkt. It seems that <ins> and <del> are not treated like <b> and
>> <i> . You can see in this example that while <b> remains in the enclosing
>> <p>, <ins> does not. I also find that I have to allow pcdata as a child of
>> <ol> and <ul>. I don't know whether pcdata is "supposed to" appear there but
>> it often does in the wild.
>>
>> Jon
>>
>>
>>
>> #lang racket
>>
>> (require (prefix-in h: html)  (prefix-in x: xml))
>>
>> (define (xml->list x)
>>   (cond
>>         [(x:pcdata? x) (x:pcdata-string x)]
>>         [(x:entity? x) (list)]
>>         [(x:element? x)
>>          (list (x:element-name x)
>>                (map xml->list (x:element-content x)))]
>>         [(list? x) (map xml->list x)]))
>>
>> (printf "~s\n" (xml->list (h:read-html-as-xml
>>                            (open-input-string "<p>Hello world <b>Testing</b>!</p>"))))
>> (printf "~s\n" (xml->list (h:read-html-as-xml
>>                            (open-input-string "<p>Hello world <ins>Testing</ins>!</p>"))))
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to racket-users+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
>
>
> --
> Jay McCarthy
> Associate Professor
> PLT @ CS @ UMass Lowell
> http://jeapostrophe.github.io
>
>            "Wherefore, be not weary in well-doing,
>       for ye are laying the foundation of a great work.
> And out of small things proceedeth that which is great."
>                           - D&C 64:33
>
