Re: [racket-users] requirements for streaming html parser

Greg Hendershott Thu, 06 Jun 2019 18:52:29 -0700

Although I don't think I currently /need/ a streaming parser for speed
or space reasons, I can imagine using one.


I'd suggest making something where the user supplies an "on-element"
"callback", which is called with each element -- plus the "path" of
ancestor elements. The user's callback can do whatever it wants.

That could be its own, focused library. I won't say "simple" because
you're parsing HTML!! :)
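
To make the "on-element" callback idea concrete, here is a rough sketch. It is not Racket and not any existing library's API: it uses Python only because its standard library includes an incremental HTML parser (html.parser.HTMLParser). The names ElementStream and on_element, and the choice to fire the callback on each start tag with a tuple of ancestor tag names, are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the path.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class ElementStream(HTMLParser):
    """Hypothetical wrapper: calls on_element(tag, attrs, path) per start tag."""

    def __init__(self, on_element):
        super().__init__(convert_charrefs=True)
        self.on_element = on_element
        self._stack = []  # tag names of the currently open ancestors

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; path is a snapshot.
        self.on_element(tag, dict(attrs), tuple(self._stack))
        if tag not in VOID_ELEMENTS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate unclosed tags: pop back to the nearest matching open tag,
        # and ignore stray end tags that match nothing.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


seen = []
parser = ElementStream(lambda tag, attrs, path: seen.append((tag, attrs, path)))

# Feed the document in pieces, as a streaming consumer would.
chunks = [
    '<html><body><div class="a"><p>Hi<br>',
    'there <a href="/x">link</a></p></div></body></html>',
]
for chunk in chunks:
    parser.feed(chunk)
parser.close()

for tag, attrs, path in seen:
    print("/".join(path + (tag,)), attrs)
```

The callback sees each element as soon as its start tag arrives, so nothing has to be held in memory beyond the stack of open tags. A real library would also have to decide how to deliver text content and how closely to follow HTML's error-recovery rules, which this sketch ignores.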

I can imagine other libraries built on top of that. One I would want to
use (or write myself, share, and use) would offer something like CSS
selectors. Not their syntax. Just some simple function combinators to
express the equivalent. (Because xml/path is close, and maybe enough for
XML, but not quite enough for real-world HTML.) In fact I already do
this, on the full HTML. I can imagine doing this on top of a streaming
parser, instead.
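
As a sketch of what "function combinators" for selector-like matching might look like (again in Python purely for illustration, with every name here hypothetical): each combinator returns a predicate over (tag, attrs, path), where path is assumed to be a tuple of ancestor tag names, as a streaming parser's callback might supply.

```python
def tag(name):
    """Match elements with the given tag name."""
    return lambda t, attrs, path: t == name


def has_class(name):
    """Match elements whose class attribute contains the given class."""
    return lambda t, attrs, path: name in (attrs.get("class") or "").split()


def descendant_of(ancestor):
    """Match elements that have the given tag somewhere among their ancestors."""
    return lambda t, attrs, path: ancestor in path


def child_of(parent):
    """Match elements whose immediate parent has the given tag."""
    return lambda t, attrs, path: bool(path) and path[-1] == parent


def all_of(*preds):
    """Match when every predicate matches."""
    return lambda t, attrs, path: all(p(t, attrs, path) for p in preds)


def any_of(*preds):
    """Match when at least one predicate matches."""
    return lambda t, attrs, path: any(p(t, attrs, path) for p in preds)


# Roughly the CSS selector "div a.external", expressed as functions.
selector = all_of(tag("a"), has_class("external"), descendant_of("div"))

# Hand-written sample data standing in for what a parser callback would receive.
elements = [
    ("a", {"class": "external big", "href": "/x"}, ("html", "body", "div", "p")),
    ("a", {"class": "external", "href": "/y"}, ("html", "body")),
    ("span", {"class": "external"}, ("html", "body", "div")),
]

matches = [e for e in elements if selector(*e)]
print(matches)
```

Because the path here carries only tag names, ancestor tests cannot look at an ancestor's class or id. Passing each ancestor's attributes along with its name would remove that limit at the cost of a heavier path.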

So those are my quick thoughts. I hope they're helpful; other people
will probably have even better feedback for you.


p.s. One suggestion, which you might not like: I think it would be good
to host your Racket packages on GitHub, GitLab, or whichever similar
site you find least objectionable. I respect your rationale for not
doing that to date. On the other hand, people these days like to see
the full git commit history, issues, and pull requests. It helps them
evaluate a package and feel good about its future availability. When
they can't see those, it can be a speed bump to adoption. If you're
aware of all this but still don't want to do that, again, I 100%
respect that. Just my opinion and perspective.

