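It is possible in several ways. One way to do it without writing a new HTML parser plugin is to cloak for your crawler: have your web server recognize the User-Agent that Nutch sends and serve it only the portion of each page you want indexed, while browsers still get the full template.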
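A minimal sketch of the server-side half of this idea, assuming the content to index is wrapped in two marker comments and that Nutch's agent name (set via the `http.agent.name` property) contains a known string. The marker texts `<!-- index-start -->` / `<!-- index-end -->`, the agent string `mynutchbot`, and the class and method names are all hypothetical placeholders; adapt them to your own templates and crawler configuration.

```java
import java.util.Locale;

class CloakingSketch {
    // Hypothetical marker comments placed around the content to index.
    static final String START_MARKER = "<!-- index-start -->";
    static final String END_MARKER = "<!-- index-end -->";
    // Hypothetical crawler name; must match what your Nutch agent sends.
    static final String CRAWLER_AGENT = "mynutchbot";

    /** True if the User-Agent header looks like our own crawler. */
    static boolean isOurCrawler(String userAgent) {
        return userAgent != null
            && userAgent.toLowerCase(Locale.ROOT).contains(CRAWLER_AGENT);
    }

    /**
     * Returns the page to serve: the full page for ordinary clients, or only
     * the text between the markers (wrapped in a minimal HTML document) for
     * the crawler. Falls back to the full page if either marker is missing.
     */
    static String pageFor(String userAgent, String fullHtml) {
        if (!isOurCrawler(userAgent)) {
            return fullHtml;
        }
        int start = fullHtml.indexOf(START_MARKER);
        if (start < 0) {
            return fullHtml;
        }
        int end = fullHtml.indexOf(END_MARKER, start);
        if (end < 0) {
            return fullHtml;
        }
        String body = fullHtml.substring(start + START_MARKER.length(), end);
        return "<html><body>" + body + "</body></html>";
    }

    public static void main(String[] args) {
        String page = "<html><body><div>nav menu</div>"
            + START_MARKER + "<p>Real content</p>" + END_MARKER
            + "<div>footer</div></body></html>";
        // A browser gets the whole template.
        System.out.println(pageFor("Mozilla/5.0", page));
        // The crawler gets only the marked section.
        System.out.println(pageFor("MyNutchBot/0.8", page));
    }
}
```

Falling back to the full page when the markers are absent means a template without markers is still indexed rather than silently dropped. How `pageFor` gets called (a servlet filter, a template hook, etc.) depends on your server setup.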
On 7/3/06, Brent Verner <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'd like to use nutch to index intranet/site content. The content is all
> template-based, and I'd like to index only a portion of the html page.
> Specifically, I'd like to only index content/words between a set of comments
> in the html page (but I could just as easily surround the content with
> another document node that could be more easily matched). Is this possible
> without writing a new html parser plugin? If so, how?
>
> Thanks!
> Brent

-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007, IIT Delhi

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
