Admittedly this is a cross-post from Stack Overflow, but I don't know whether
there are many Nutch folks over there.

My question is about crawling HTML navigation menus, but not indexing the
text for those links in Solr.

I have seen some discussions from several years ago about making this an
option in later development, but my searches have not turned up anything
that gives a good indication of how one might exclude site navigation menu
content from the content that Nutch indexes to Solr during a crawl.

That is, the navigation menu text appears in every document that gets
indexed, and this damages search because every document then contains the
same text. Obviously I want to keep using the site navigation for crawling,
but I don't want it indexed. Is there a best practice for accomplishing this
with Nutch? Is there a way to wrap the navigation in some kind of tag, for
example?

I am new to Nutch (obviously), so I don't know where this would best be
accomplished.

Thanks very much.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-how-to-crawl-but-not-index-the-site-navigation-w-Solr-tp4078702.html
Sent from the Nutch - User mailing list archive at Nabble.com.