Re: [Nutch-general] how can I index only a portion of html content?

Jayant Kumar Gandhi Mon, 03 Jul 2006 03:23:45 -0700

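It is possible in several ways. One way to do it without writing a new
HTML parser plugin is to cloak the pages for your bot: have the web
server recognize the crawler (for example by its User-Agent header) and
serve it only the portion of each page you want indexed.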
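A minimal sketch of the cloaking idea, assuming the templates wrap the indexable region in marker comments. The marker strings (`<!-- index-start -->`, `<!-- index-end -->`), the agent substring `MyNutchBot`, and the function names are all hypothetical. The crawler's agent string is whatever you configured for your Nutch install, so match on that. The server hands the crawler a stripped page and everyone else the full page:

```python
import re

# Hypothetical marker comments placed around the indexable region of each template.
START_MARKER = "<!-- index-start -->"
END_MARKER = "<!-- index-end -->"

# Hypothetical substring of the crawler's User-Agent header; replace it with
# the agent name your own Nutch crawler actually sends.
CRAWLER_AGENT = "MyNutchBot"

# Non-greedy match so that several marked regions on one page are kept separately.
_REGION = re.compile(
    re.escape(START_MARKER) + r"(.*?)" + re.escape(END_MARKER), re.DOTALL
)


def extract_indexable(html: str) -> str:
    """Return the content of every marked region, joined by newlines."""
    return "\n".join(m.group(1).strip() for m in _REGION.finditer(html))


def page_for_user_agent(html: str, user_agent: str) -> str:
    """Serve only the marked content to the crawler, the full page to others."""
    if CRAWLER_AGENT.lower() in user_agent.lower():
        body = extract_indexable(html)
        return f"<html><head></head><body>{body}</body></html>"
    return html


if __name__ == "__main__":
    page = ("<html><body><nav>Menu</nav>" + START_MARKER +
            "<p>Real content</p>" + END_MARKER +
            "<footer>Footer</footer></body></html>")
    print(page_for_user_agent(page, "MyNutchBot/1.0"))
    # prints <html><head></head><body><p>Real content</p></body></html>
```

Because the filtering happens on the server, the navigation and footer from the template never reach Nutch, and its stock HTML parser indexes only what is left. The cost is that every page-serving path has to apply the filter.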

On 7/3/06, Brent Verner <[EMAIL PROTECTED]> wrote:
> Hi,
>
>   I'd like to use nutch to index intranet/site content.  The content is all
> template-based, and I'd like to index only a portion of the html page.
> Specifically, I'd like to only index content/words between a set of comments
> in the html page (but I could just as easily surround the content with
> another document node that could be more easily matched).  Is this possible
> without writing a new html parser plugin?  If so, how?
>
> Thanks!
>   Brent
>
>

-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007,
IIT Delhi

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general