RE: Excluding individual pages?

2019-10-10 Thread Markus Jelsma
Hello Dave, if you have just one specific page you do not want Nutch to index, or Solr to show, you can either create a custom IndexingFilter that returns null (rejecting the document) for the specified URL, or add an additional filter query to Solr, fq=-id:, which filters the specific URL out of the results.
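The filter-query option can be sketched roughly as follows. This is only an illustration and not from the original message: it assumes the Solr index stores the page URL in the id field (as the fq=-id: hint suggests), the helper names exclusion_filter and search_params are made up, and http://example.com/seeds.html is a placeholder for the seed page URL.

```python
from urllib.parse import urlencode


def exclusion_filter(url: str) -> str:
    """Build a negative Solr filter query that excludes one document by id.

    The URL is wrapped in double quotes so that characters such as ':' and '/'
    are not interpreted as query syntax; only backslashes and double quotes
    need escaping inside the quoted value.
    """
    escaped = url.replace("\\", "\\\\").replace('"', '\\"')
    return f'-id:"{escaped}"'


def search_params(user_query: str, excluded_url: str) -> str:
    """Return a URL-encoded query string combining q with the exclusion fq."""
    return urlencode({"q": user_query, "fq": exclusion_filter(excluded_url)})


if __name__ == "__main__":
    # Hypothetical seed page URL, used purely for illustration.
    print(search_params("nutch", "http://example.com/seeds.html"))
```

The resulting string would be appended to the select handler URL of whichever Solr core or collection Nutch indexes into. The page stays in the index and is only hidden at query time, so the IndexingFilter route is the one to take if the page should never reach Solr at all.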

Excluding individual pages?

2019-10-10 Thread Dave Beckstrom
Hi Everyone, I searched and didn't find an answer. Nutch is indexing the content of the page that has the seed URLs in it, and that page then shows up in the Solr search results. We don't want that to happen. Is there a way to have Nutch crawl the seed URL page but not push that page into

Re: Nutch excludeNodes Patch

2019-10-10 Thread Dave Beckstrom
Markus, Thank you so much for the reply! I made the change to parse-plugins.xml and the plug-in is being called now. That plug-in didn't work, so I switched to the blacklist-whitelist plug-in, and I've got it working thanks to your help! Dave On Wed, Oct 9, 2019 at 4:00 PM Markus Jelsma