Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

Jay McCarthy Thu, 07 Jan 2016 12:43:18 -0800

Can you send the code you used? I wouldn't expect the xml library to
work since your example is not XML (missing </br>). I also don't have
high hopes for the html library. If you are parsing html, I recommend
using the `html-parsing` package:
http://pkg-build.racket-lang.org/doc/html-parsing/index.html
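A minimal sketch of the difference. It assumes the third-party `html-parsing` package is installed (`raco pkg install html-parsing`) and uses its `html->xexp` function. The strict `string->xexpr` reader from the `xml` library is shown for contrast. `find-element` and `xml-parses?` are hypothetical helpers written only for this illustration, not part of either library.

```racket
#lang racket/base
;; Assumes the third-party html-parsing package is installed:
;;   raco pkg install html-parsing
(require html-parsing                    ; provides html->xexp
         (only-in xml string->xexpr))    ; strict XML reader, ships with Racket

;; The snippet from the original question (unclosed <br> and outer <div>).
(define snippet
  "<div class=\"messageInfo primaryContent\">
<div class=\"messageContent\">
<article>
<blockquote class=\"messageText SelectQuoteContainer ugc baseHtml\">
Message text here <br>
</blockquote>
</article>
</div>")

;; Hypothetical helper: #t if the strict XML reader accepts the string,
;; #f if it raises an error (e.g. on the unclosed <br>).
(define (xml-parses? str)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (string->xexpr str)
    #t))

;; Hypothetical helper: depth-first search of an xexp/SXML tree for the
;; first element whose tag symbol is `tag`; returns #f if none is found.
(define (find-element tree tag)
  (cond [(and (pair? tree) (eq? (car tree) tag)) tree]
        [(pair? tree)
         (for/or ([child (in-list (cdr tree))])
           (find-element child tag))]
        [else #f]))

;; html-parsing tolerates the unclosed tags and keeps the <article> element.
(define doc (html->xexp snippet))

(xml-parses? snippet)          ; the xml library rejects the snippet
(find-element doc 'article)    ; the article element, with its blockquote
```

The point of the contrast is that `xml` insists on well-formed XML, while `html-parsing` is designed as a permissive HTML parser, so it does not need to be told about newer tags such as `article`.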


Jay

On Thu, Jan 7, 2016 at 3:13 PM, David Storrs <david.sto...@gmail.com> wrote:
> Hi folks,
>
> I'm using the html and xml libraries to parse a page that includes the
> following HTML:
>
> <div class="messageInfo primaryContent">
> <div class="messageContent">
> <article>
> <blockquote class="messageText SelectQuoteContainer ugc baseHtml">
> Message text here <br>
> </blockquote>
> </article>
> </div>
>
> When I parse this, the 'article' tag simply isn't parsed -- it lists the
> contents of the messageContent div as just a series of PCDATA statements
> containing "\n".
>
> Is there a way to extend the library, or do I need to switch to a different
> parser?
>
> Dave
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



-- 
Jay McCarthy
Associate Professor
PLT @ CS @ UMass Lowell
http://jeapostrophe.github.io

           "Wherefore, be not weary in well-doing,
      for ye are laying the foundation of a great work.
And out of small things proceedeth that which is great."
                          - D&C 64:33
