Re: Prevent website parts of being indexed

sebio Wed, 25 Jul 2012 06:29:39 -0700

Ok,

I found out that I had to change parse-default.xml and point the default
parsers to my own modified parser.


Using "bin/nutch parsechecker -dumpText
http://www.mytestsite.com/index.html" yields the correct parsed text without
the "nutch_noindex" sections.

However, the Solr index still contains the full parsed text.
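
For reference, the kind of filtering the modified parser is meant to do can be
sketched roughly as below. This is only an illustration in Python, not the
actual Nutch plugin code (which is Java). The comment-style markers
"<!-- nutch_noindex -->" and "<!-- /nutch_noindex -->" and the helper name
strip_noindex are assumptions made for the example.

```python
import re

# Assumed marker syntax: HTML comments wrapping the section to skip.
# Only the name "nutch_noindex" comes from this thread; the exact form is a guess.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*nutch_noindex\s*-->.*?<!--\s*/nutch_noindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html: str) -> str:
    """Hypothetical helper: drop everything between the assumed markers."""
    return NOINDEX_BLOCK.sub("", html)


page = (
    "<p>Indexed text</p>"
    "<!-- nutch_noindex --><div>Navigation</div><!-- /nutch_noindex -->"
    "<p>More indexed text</p>"
)
print(strip_noindex(page))
```

The text handed to the indexer should look like the output of such a filter,
which is what parsechecker shows but not what ends up in Solr.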




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prevent-website-parts-of-being-indexed-tp3997213p3997236.html
Sent from the Nutch - User mailing list archive at Nabble.com.
