Hey team

The page https://tika.apache.org/3.1.0/formats.html#HyperText_Markup_Language 
mentions:

> The output from the HtmlParser class is guaranteed to be well-formed and 
> valid XHTML, and various heuristics are used to prevent things like inline 
> scripts from cluttering the extracted text content.

But the HtmlParser link points to a class page that does not exist:
https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/HtmlParser.html
Should it point to
https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/JSoupParser.html
instead?



David Pilato
da...@pilato.fr
06 13 03 08 41
