Hey team,

The page https://tika.apache.org/3.1.0/formats.html#HyperText_Markup_Language mentions:
> The output from the HtmlParser class is guaranteed to be well-formed and
> valid XHTML, and various heuristics are used to prevent things like inline
> scripts from cluttering the extracted text content.

However, the HtmlParser link points to a class page that does not exist:
https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/HtmlParser.html

Should it point to https://tika.apache.org/3.1.0/api/org/apache/tika/parser/html/JSoupParser.html instead?

David Pilato
da...@pilato.fr
06 13 03 08 41