Re: how can I index only a portion of html content?

Jayant Kumar Gandhi Mon, 03 Jul 2006 03:23:28 -0700

This is possible in several ways. One way that avoids writing an HTML
parser plugin is to cloak for your bot: have the web server detect the
crawler and serve it only the part of each page you want indexed.
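
A rough sketch of the cloaking idea in Python, to be run on the server
side, not inside Nutch. Everything named here is an assumption made for
illustration: the marker comments, the bot token, and the helper names
are hypothetical and do not come from the original post. The bot token
would have to match whatever agent name your crawler actually sends (in
Nutch that is configured through the http.agent.name property).

```python
import re

# Hypothetical marker comments; the original post does not name them.
START_MARKER = "<!-- index:start -->"
END_MARKER = "<!-- index:end -->"

# Hypothetical token, assumed to appear in the crawler's User-Agent header.
BOT_TOKEN = "my-intranet-bot"

# Non-greedy match so several marked sections on one page stay separate.
_SECTION = re.compile(
    re.escape(START_MARKER) + r"(.*?)" + re.escape(END_MARKER),
    re.DOTALL,
)


def is_bot(user_agent):
    """Return True if the request appears to come from our own crawler."""
    return BOT_TOKEN.lower() in (user_agent or "").lower()


def page_for(user_agent, full_html):
    """Return the full page for browsers, only the marked sections for the bot."""
    if not is_bot(user_agent):
        return full_html
    sections = _SECTION.findall(full_html)
    if not sections:
        # No markers found: fall back to the full page rather than an empty one.
        return full_html
    body = "\n".join(section.strip() for section in sections)
    return "<html><body>\n" + body + "\n</body></html>"
```

You would call page_for() from whatever renders the template, passing in
the request's User-Agent header. Regular visitors still get the complete
page, so only the crawler sees the reduced version.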


On 7/3/06, Brent Verner <[EMAIL PROTECTED]> wrote:

Hi,

  I'd like to use nutch to index intranet/site content.  The content is all
template-based, and I'd like to index only a portion of the html page.
Specifically, I'd like to only index content/words between a set of comments
in the html page (but I could just as easily surround the content with
another document node that could be more easily matched).  Is this possible
without writing a new html parser plugin?  If so, how?

Thanks!
  Brent



--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007,
IIT Delhi
