Hi, how can I parse documents with the BoilerpipeContentHandler and still read all hyperlinks? Right now we parse each document twice: once to get the text without boilerplate, and once to get all hyperlinks.
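Conceptually, what we want is a single parse whose SAX events reach both handlers. The sketch below shows that idea using only the JDK. TextCollector and LinkCollector are hypothetical stand-ins for BoilerpipeContentHandler and a link-collecting handler. TeeHandler is also hypothetical and plays the role Tika's TeeContentHandler might play. It forwards startDocument/endDocument too, on the assumption that a Boilerpipe-style handler buffers text until the end of the document. We have not verified that Boilerpipe behaves correctly when teed like this.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TeeSketch {

    /** Hypothetical tee: forwards every SAX event to each delegate, so one parse feeds them all. */
    static class TeeHandler extends DefaultHandler {
        private final DefaultHandler[] delegates;

        TeeHandler(DefaultHandler... delegates) {
            this.delegates = delegates;
        }

        @Override
        public void startDocument() throws SAXException {
            for (DefaultHandler d : delegates) d.startDocument();
        }

        @Override
        public void endDocument() throws SAXException {
            // Handlers that buffer text (as we assume Boilerpipe does) emit their result here.
            for (DefaultHandler d : delegates) d.endDocument();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
            for (DefaultHandler d : delegates) d.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            for (DefaultHandler d : delegates) d.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (DefaultHandler d : delegates) d.characters(ch, start, length);
        }
    }

    /** Stand-in for the text-extracting handler (BoilerpipeContentHandler in our setup). */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Stand-in for a link handler: records the href of every <a> element. */
    static class LinkCollector extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) links.add(href);
            }
        }
    }

    /** Result of a single pass: extracted text plus collected links. */
    static class Result {
        final String text;
        final List<String> links;

        Result(String text, List<String> links) {
            this.text = text;
            this.links = links;
        }
    }

    /** Parses the (well-formed) XHTML once, feeding both collectors through the tee. */
    static Result parseOnce(String xhtml) throws Exception {
        TextCollector text = new TextCollector();
        LinkCollector links = new LinkCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new TeeHandler(text, links));
        return new Result(text.text.toString(), links.links);
    }

    public static void main(String[] args) throws Exception {
        Result r = parseOnce("<html><body><p>Hello <a href=\"http://example.com/\">world</a></p></body></html>");
        System.out.println(r.text);  // Hello world
        System.out.println(r.links); // [http://example.com/]
    }
}
```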
Any advice?

Thanks,
--
Markus Jelsma - CTO - Openindex
