Hi all,

Another open source search engine, HtDig, allows web page authors to
mark up a page such that some sections are not indexed.  The syntax
looks like the following:

<!--htdig_noindex-->
... material inside is not indexed ...
<!--/htdig_noindex-->

Does a similar feature exist in Nutch? If the answer is "write a
plugin", does anyone have tips on where to start? Also, how hard would
something like this be for a Nutch newbie who doesn't know anything
about HTML parsing? I have a bunch of documents already marked up with
the htdig syntax, so in the interest of interoperability I'm tempted to
follow that syntax exactly.
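
To make the behaviour I'm after concrete, here is a rough sketch in
Python of stripping the marked sections from the raw HTML before any
text is extracted. It is not Nutch code and uses no Nutch API; the
function name strip_noindex is made up for illustration, and it assumes
the markers appear exactly as written above, are properly paired, and
are never nested.

```python
import re

# Hypothetical helper for illustration only; not part of Nutch or HtDig.
# Assumes the markers are written exactly as shown and are not nested.
_NOINDEX = re.compile(
    r"<!--htdig_noindex-->.*?<!--/htdig_noindex-->",
    re.DOTALL,  # let "." span line breaks inside a marked section
)


def strip_noindex(html: str) -> str:
    """Return html with every htdig_noindex section removed."""
    return _NOINDEX.sub("", html)


if __name__ == "__main__":
    page = (
        "<p>keep</p>"
        "<!--htdig_noindex--><p>skip</p><!--/htdig_noindex-->"
        "<p>also keep</p>"
    )
    print(strip_noindex(page))
```

An opening marker with no matching close is left untouched by this
sketch, so the text after it would still be indexed.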

-Jeff
