Hi, I'm trying to index only the main content (main article) of various websites. For this, I'd like to use Boilerpipe with Nutch.
Markus has been developing a patch (NUTCH-961) that does exactly that. Although the patch installs without problems, I am not sure how to configure it. Is there anyone who can shed some light on this?

As I understand it, two properties have to be set:

    tika.boilerpipe = true
    tika.boilerpipe.extractor = "ArticleExtractor"

I have tried setting them in a file conf/tika.config.file (is this still being used?) and in conf/nutch-default.xml, as valid XML within a property field. Neither attempt activated Boilerpipe.

FYI: I am using Nutch 1.5. What should I do to get this thing going?

Kind regards,
René

