Hi all,

Another open source search engine, HtDig, allows web page authors to mark up a page so that some sections are not indexed. The syntax looks like this:

    <!--htdig_noindex-->
    ... material inside is not indexed ...
    <!--/htdig_noindex-->

Does a similar feature exist in Nutch? If the answer is "write a plugin", does anyone have tips on where to start? Also, how hard would something like this be for a Nutch newbie who doesn't know anything about HTML parsing?

I have a bunch of documents already marked up with the htdig syntax, and in the interests of interoperability I'm tempted to follow that syntax exactly.

-Jeff
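
P.S. To make the question concrete, here is a rough sketch of the behaviour I'm after: drop everything between the two htdig comment markers before the text is indexed. This is plain Java and does not use any Nutch API. The class name NoIndexStripper and the strip method are made up for illustration, and running a regex over raw HTML is a simplification. Where something like this would hook into Nutch is exactly what I'm asking about.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: removes htdig_noindex regions from raw HTML.
public class NoIndexStripper {

    // Matches one marked region. The reluctant ".*?" stops at the first
    // closing marker, and DOTALL lets the region span multiple lines.
    private static final Pattern NOINDEX = Pattern.compile(
            "<!--htdig_noindex-->.*?<!--/htdig_noindex-->",
            Pattern.DOTALL);

    // Returns the HTML with every complete marked region removed.
    // An opening marker with no closing marker is left untouched.
    public static String strip(String html) {
        return NOINDEX.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>keep</p>"
                + "<!--htdig_noindex--><p>skip</p><!--/htdig_noindex-->"
                + "<p>also keep</p>";
        System.out.println(strip(page)); // prints <p>keep</p><p>also keep</p>
    }
}
```

A real solution would presumably apply the same idea wherever the page content is handed to the HTML parser, so that the skipped sections never reach the index.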
