Re: Boilerpipe and getting all URL's

Markus Jelsma Tue, 20 Dec 2011 11:26:59 -0800

Excellent! I'll look into it!

Thanks!


> On Tue, Dec 20, 2011 at 9:48 AM, Markus Jelsma <[email protected]> wrote:
> > Hi,
> > 
> > How can I parse documents with the Boilerpipe content handler and still
> > be able to read all hyperlinks? Right now we parse twice, once to get
> > the text without boilerplate and once to get all hyperlinks.
> 
> Use the TeeContentHandler and give it your BoilerpipeContentHandler and your
> LinkContentHandler. Then pass that to the parser.
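
Tika's TeeContentHandler forwards every SAX event it receives to each handler it wraps. That is why a single parse can feed both the Boilerpipe handler and the link handler. In Tika the wiring would be roughly `new TeeContentHandler(boilerpipeHandler, linkHandler)` passed to the parser in place of a single handler, followed by reading `linkHandler.getLinks()` once parsing finishes. To keep the sketch runnable with only the JDK, the block below reimplements the tee idea with plain SAX instead of Tika. `Tee`, `TextCollector`, `LinkCollector` and the sample markup are hypothetical stand-ins, and `TextCollector` does no boilerplate removal at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TeeDemo {

    // Hypothetical stand-in for Tika's TeeContentHandler: forwards the SAX
    // events this sketch needs to every wrapped handler, in order.
    static class Tee extends DefaultHandler {
        private final ContentHandler[] handlers;

        Tee(ContentHandler... handlers) { this.handlers = handlers; }

        @Override
        public void startDocument() throws SAXException {
            for (ContentHandler h : handlers) h.startDocument();
        }

        @Override
        public void endDocument() throws SAXException {
            for (ContentHandler h : handlers) h.endDocument();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            for (ContentHandler h : handlers) h.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            for (ContentHandler h : handlers) h.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler h : handlers) h.characters(ch, start, length);
        }
    }

    // Hypothetical stand-in for BoilerpipeContentHandler: collects all text,
    // without any boilerplate detection.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Hypothetical stand-in for LinkContentHandler: records the href of every <a>.
    static class LinkCollector extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String href = atts.getValue("href");
            if ("a".equalsIgnoreCase(qName) && href != null) links.add(href);
        }
    }

    // Hypothetical sample input; must be well-formed XML for the JDK SAX parser.
    static final String SAMPLE =
            "<html><body><p>Main article text.</p>"
            + "<a href=\"https://example.org/a\">A</a>"
            + "<a href=\"/b\">B</a></body></html>";

    // Parses the markup ONCE; the tee delivers each event to both collectors.
    static List<String> parseOnce(String xhtml, StringBuilder textOut) {
        TextCollector text = new TextCollector();
        LinkCollector links = new LinkCollector();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xhtml)), new Tee(text, links));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        textOut.append(text.text);
        return links.links;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        List<String> links = parseOnce(SAMPLE, text);
        System.out.println("text:  " + text);
        System.out.println("links: " + links);
    }
}
```

The design point is that neither collector knows about the other: the tee only fans events out, so the text handler and the link handler can be swapped for the real Tika classes without changing how the document is parsed.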
