Excellent! I'll look into it! Thanks!
> On Tue, Dec 20, 2011 at 9:48 AM, Markus Jelsma <[email protected]> wrote:
> > Hi,
> >
> > How can I parse documents with the Boilerpipe content handler and still
> > be able to read all hyperlinks? Right now we parse twice, once to get
> > the text without boilerplate text and once to get all hyperlinks.
>
> Use the TeeContentHandler and give it your BoilerpipeContentHandler and your
> LinkContentHandler. Then pass that into the parser.
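
For anyone finding this thread later, here is a rough sketch of the idea behind the tee approach: one handler forwards every SAX event to several delegates, so a single parse feeds both the text extraction and the link extraction. It uses only the JDK's SAX classes. FanOutHandler, TextCollector, LinkCollector and parseOnce are hypothetical names made up for illustration; they are not Tika's classes and not how Tika implements this. With Tika, the three handlers named in the quoted reply would presumably take these roles.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TeeSketch {
    /** Hypothetical tee: forwards each SAX event to every delegate in turn. */
    static class FanOutHandler extends DefaultHandler {
        private final ContentHandler[] delegates;

        FanOutHandler(ContentHandler... delegates) {
            this.delegates = delegates;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            for (ContentHandler h : delegates) h.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            for (ContentHandler h : delegates) h.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler h : delegates) h.characters(ch, start, length);
        }
    }

    /** Stand-in for the text-extracting handler: collects character data. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Stand-in for the link-extracting handler: collects href values of <a>. */
    static class LinkCollector extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            String href = atts.getValue("href");
            // The default JDK parser is not namespace-aware, so match on qName.
            if ("a".equals(qName) && href != null) links.add(href);
        }
    }

    /** Parses the (well-formed) markup once, feeding all handlers in one pass. */
    static void parseOnce(String xml, ContentHandler... handlers) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new FanOutHandler(handlers));
    }
}
```

To use it, create one TextCollector and one LinkCollector, pass both to parseOnce along with the markup, and read their `text` and `links` fields afterwards. The document is parsed only once.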
