Hello Dave,
If there is just one specific page that you do not want Nutch to index, or Solr
to show, you can either create a custom IndexingFilter that returns null
(rejecting the document) for that URL, or add a filter query to the Solr
request, fq=-id:, that excludes that URL from the results.
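For the Solr-side option, here is a minimal sketch of how such a request could be put together. It assumes the Solr id field holds the page URL, as the fq=-id: filter above implies. The class name, method names, core address and example URLs are all made up for illustration and would need to be adapted to your setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExcludeSeedPage {

    // Builds a negative filter query on the id field, e.g. -id:"http://example.com/seeds.html".
    // The URL is wrapped in double quotes so characters like ':' and '/' are taken literally;
    // backslashes and double quotes inside the value are escaped.
    public static String excludeByIdFilter(String url) {
        String escaped = url.replace("\\", "\\\\").replace("\"", "\\\"");
        return "-id:\"" + escaped + "\"";
    }

    // Assembles a select URL with the user query (q) and the exclusion filter (fq).
    // solrCore is a hypothetical base address such as http://localhost:8983/solr/nutch.
    public static String selectUrl(String solrCore, String userQuery, String excludedUrl) {
        return solrCore + "/select"
                + "?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(excludeByIdFilter(excludedUrl), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(selectUrl(
                "http://localhost:8983/solr/nutch",
                "content:crawler",
                "http://example.com/seeds.html"));
    }
}
```

Note that with this approach the page is still in the index and is only hidden at query time, whereas the IndexingFilter approach keeps it out of the index altogether.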
Hi Everyone,
I searched and didn't find an answer.
Nutch is indexing the content of the page that contains the seed URLs, and
that page then shows up in the Solr search results. We don't want that to
happen.
Is there a way to have Nutch crawl the seed URL page but not push that page
into
Markus,
Thank you so much for the reply!
I made the change to parse-plugins.xml and the plug-in is being called
now. That plug-in didn't work so I changed to the blacklist-whitelist
plug-in and I've got it working thanks to your help!
Dave
On Wed, Oct 9, 2019 at 4:00 PM Markus Jelsma