Excellent! I'll look into it!

Thanks!

> On Tue, Dec 20, 2011 at 9:48 AM, Markus Jelsma <[email protected]> wrote:
> > Hi,
> > 
> > How can I parse documents with the Boilerpipe content handler and still
> > be able to read all hyperlinks? Right now we parse twice: once to get
> > the text without boilerplate and once to get all hyperlinks.
> 
> Use the TeeContentHandler and give it your BoilerpipeContentHandler and your
> LinkContentHandler. Then pass that into the parser.
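
For reference, the suggestion above might look roughly like the sketch below. It is only an illustration, not code from this thread: it assumes Tika 1.x with tika-parsers (and its Boilerpipe dependency) on the classpath, where BoilerpipeContentHandler lives in org.apache.tika.parser.html (later releases moved it to org.apache.tika.sax.boilerpipe). The class name TeeParseExample, the Result holder and the sample HTML are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler; // Tika 1.x package (assumption)
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.Link;
import org.apache.tika.sax.LinkContentHandler;
import org.apache.tika.sax.TeeContentHandler;

// Hypothetical example class; not part of Tika.
public class TeeParseExample {

    // Hypothetical holder for the two outputs of a single parse.
    public static class Result {
        public final String text;
        public final List<Link> links;

        Result(String text, List<Link> links) {
            this.text = text;
            this.links = links;
        }
    }

    public static Result parse(String html) throws Exception {
        // Collects the text that Boilerpipe decides to keep.
        BodyContentHandler text = new BodyContentHandler();
        BoilerpipeContentHandler boilerpipe = new BoilerpipeContentHandler(text);

        // Collects every hyperlink seen in the SAX event stream.
        LinkContentHandler links = new LinkContentHandler();

        // Forwards each SAX event to both handlers, so one parse is enough.
        TeeContentHandler tee = new TeeContentHandler(boilerpipe, links);

        try (InputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, tee, new Metadata(), new ParseContext());
        }
        return new Result(text.toString(), links.getLinks());
    }

    public static void main(String[] args) throws Exception {
        Result r = parse("<html><body><p>Some article text with a "
                + "<a href=\"http://example.com/\">link</a>.</p></body></html>");
        System.out.println(r.text);
        for (Link link : r.links) {
            System.out.println(link.getUri());
        }
    }
}
```

Since the tee hands the same events to both handlers, the link handler still sees anchors that Boilerpipe drops from the text.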
