Accidentally dropped user@... Try TeeContentHandler?
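
In case it helps, here is a rough sketch of the tee idea. It is not Tika code: it uses only the JDK's SAX classes, and `TeeSketch`, `TextCollector`, `MarkupCollector` and `Tee` are made-up names for illustration. The `Tee` class forwards each SAX event to every wrapped handler, so a single parse yields both the plain text and the markup. With Tika, the equivalent should be to pass `new TeeContentHandler(bodyHandler, xmlHandler)` to `parser.parse(...)`, where `bodyHandler` is a `BodyContentHandler(-1)` and `xmlHandler` is a `ToXMLContentHandler()`, and then call `toString()` on each inner handler. Treat that wiring as an untested assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// JDK-only stand-in for the tee idea; every class and method name here is hypothetical.
final class TeeSketch {

    // Collects character data only, roughly what a body-text handler would yield.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Re-serializes events as markup (attributes and escaping omitted for brevity).
    static final class MarkupCollector extends DefaultHandler {
        final StringBuilder xml = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            xml.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            xml.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            xml.append(ch, start, length);
        }
    }

    // Forwards element and character events to each wrapped handler.
    // Other SAX events are not forwarded in this sketch.
    static final class Tee extends DefaultHandler {
        private final DefaultHandler[] targets;

        Tee(DefaultHandler... targets) {
            this.targets = targets;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            for (DefaultHandler t : targets) t.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            for (DefaultHandler t : targets) t.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (DefaultHandler t : targets) t.characters(ch, start, length);
        }
    }

    // Parses once and returns { plain text, markup }.
    static String[] parseBoth(String xhtml) throws Exception {
        TextCollector text = new TextCollector();
        MarkupCollector markup = new MarkupCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new Tee(text, markup));
        return new String[] { text.text.toString(), markup.xml.toString() };
    }

    public static void main(String[] args) throws Exception {
        String[] out = parseBoth("<html><body><p>Hello world</p></body></html>");
        System.out.println("text: " + out[0]);
        System.out.println("xml:  " + out[1]);
    }
}
```

The text side is what you would hand to the language detector, and the markup side stays available for whatever needs the XHTML.
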
On Wed, Jan 13, 2021 at 3:05 PM Peter Kronenberg <[email protected]> wrote:

> Cool, didn’t know about that. But that doesn’t seem to be able to return
> the text that it got. Can I do both?
>
> *From:* Tim Allison <[email protected]>
> *Sent:* Wednesday, January 13, 2021 2:34 PM
> *To:* [email protected]
> *Subject:* Re: Getting language of parsed text
>
> Try the LanguageHandler()?
>
> On Wed, Jan 13, 2021 at 2:21 PM Peter Kronenberg <[email protected]> wrote:
>
> > If I use the BodyContentHandler, it’s easy to send the text I get back to
> > a language detector:
> >
> > ContentHandler handler = new BodyContentHandler(-1);
> > parser.parse(stream, handler, metadata, parseContext);
> >
> > String str = handler.toString();
> >
> > LanguageDetector detector = new OptimaizeLangDetector();
> > detector.loadModels();
> > log.info("Language: " + detector.detectAll(str));
> >
> > However, if I use ToXMLContentHandler(), it obviously has problems detecting
> > the language because of all the XML metadata. Is there an easy way to get
> > the body of the XHTML output?
> >
> > I played around with javax.xml.xpath, et al., but I’m not sure that the
> > document that comes back is a valid XML document.
