Matthew Butterick wrote on 01/07/2016 04:18 PM:
When we speak of "parsing HTML" we should distinguish between strict parsing (= explicit adherence to a given HTML spec) and permissive parsing (= converting an HTML-ish string into Racket data.) Both have their place.

Alas, I think the W3C had to give up on trying to make people do strict parsing. Not enough people ran the W3C Validator in the earlier days of the Web, and the (since-abandoned) XML-based XHTML standard was started after the strict ship had long since sailed. The W3C has gotten behind HTML5 for now.

The `html-parsing` parser was written 15 years ago for AI-ish software-agent scraping of info from real-world Web pages, so it was necessarily permissive. In some ways, HTML was even worse back then, because Mosaic/Navigator/MSIE tended to accept invalid HTML. It was as if the Racket compiler never raised an error or warning for a mistake and simply generated whatever code it wanted to, while programmers worked by mindlessly poking at their source code until the generated code seemed to do what they wanted. :) Syntactically, real-world HTML is somewhat better now, because the development tools and the browsers are better. But a permissive parser still makes sense for most purposes, including the massive HTML5 of 15 years later.

Neil V.
