Accidentally dropped user@...

Try TeeContentHandler?
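
To make the idea concrete: a tee handler forwards every SAX event to several handlers, so a single parse can fill both a plain-text handler and an XML handler. The sketch below does not use Tika. It is a JDK-only illustration of the pattern, and TeeSketch, TextCollector, MarkupCollector, Tee and parseBoth are hypothetical names made up for the example. With Tika itself, I'd expect the equivalent to be passing new TeeContentHandler(bodyHandler, xmlHandler) to parser.parse() and then reading each handler's toString() afterwards, but treat that as an untested assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TeeSketch {

    // Keeps only character data, similar in spirit to a plain-text handler.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Re-emits tags and text; no escaping or attributes, so illustration only.
    static class MarkupCollector extends DefaultHandler {
        final StringBuilder xml = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            xml.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            xml.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            xml.append(ch, start, length);
        }
    }

    // Hypothetical minimal tee: forwards each SAX event to every delegate.
    static class Tee extends DefaultHandler {
        private final DefaultHandler[] delegates;

        Tee(DefaultHandler... delegates) {
            this.delegates = delegates;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            for (DefaultHandler d : delegates) {
                d.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            for (DefaultHandler d : delegates) {
                d.endElement(uri, localName, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (DefaultHandler d : delegates) {
                d.characters(ch, start, length);
            }
        }
    }

    // Parses once and returns { plain text, markup } from the two delegates.
    static String[] parseBoth(String xhtml) throws Exception {
        TextCollector text = new TextCollector();
        MarkupCollector markup = new MarkupCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new Tee(text, markup));
        return new String[] { text.text.toString(), markup.xml.toString() };
    }
}
```

The first element returned by parseBoth is the tag-free text you would hand to the language detector, and the second is the markup you wanted to keep. Both come from the same single pass over the input.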

On Wed, Jan 13, 2021 at 3:05 PM Peter Kronenberg <[email protected]>
wrote:

> Cool, didn’t know about that.  But that doesn’t seem to be able to return
> the text that it got.  Can I do both?
>
>
>
> *From:* Tim Allison <[email protected]>
> *Sent:* Wednesday, January 13, 2021 2:34 PM
> *To:* [email protected]
> *Subject:* Re: Getting language of parsed text
>
>
>
> Try the LanguageHandler()?
>
>
>
> On Wed, Jan 13, 2021 at 2:21 PM Peter Kronenberg <
> [email protected]> wrote:
>
> If I use the BodyContentHandler, it’s easy to send the text I get back to
> a language detector
>
>
>
> ContentHandler handler = new BodyContentHandler(-1);
>
> parser.parse(stream, handler, metadata, parseContext);
>
>
>
> String str = handler.toString();
>
>
>
> LanguageDetector detector = new OptimaizeLangDetector();
> detector.loadModels();
>
> log.info("Language: " + detector.detectAll(str));
>
> However, if I use ToXMLContentHandler(), it obviously has problems detecting 
> the language because of all the XML metadata.  Is there an easy way to get 
> the body of the XHTML output?
>
> I played around with javax.xml.xpath and related APIs, but I’m not sure that
> the document that comes back is a valid XML document.
>
>
>
>
>
>
