Hi, but how would I get the HTML <div> tag? Is there any Nutch method to get the <div> tag from the content? Thanks
> Subject: Re: indexing just certain content
> From: e...@lakemeadonline.com
> Date: Mon, 5 Oct 2009 13:09:17 -0700
> To: nutch-user@lucene.apache.org
>
> Adam,
>
> You could turn off all the indexing plugins and write your own plugin
> that indexes only certain meta content from your intranet, giving you
> complete control over the fields indexed.
>
> Eric
>
> On Oct 5, 2009, at 1:06 PM, BELLINI ADAM wrote:
>
> > Hi,
> >
> > Does anybody know if it's possible to index only certain content? I
> > mean, I need to avoid indexing some garbage and repetitive data on my
> > intranet.
> >
> > In other words, is it possible to tell the indexer not to index the
> > content between certain <div> tags, like:
> >
> > <div id="bla bla">
> >
> > plz dont index this bla bla bla
> >
> > </div>
> >
> > Thanks to all
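As far as I know, in Nutch 1.x an HtmlParseFilter plugin is handed the parsed page as a DOM `DocumentFragment`, so the <div> elements are reachable with the standard `org.w3c.dom` API (please verify the exact filter interface against your Nutch version). Below is a minimal sketch, assuming you only need the page text with one `<div id="...">` subtree dropped. The class `DivSkippingTextExtractor` and its methods are hypothetical illustrations, not part of Nutch; it uses only the JDK DOM classes, and the `main` method parses a small well-formed sample with the JDK XML parser just to demonstrate it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects visible text from a DOM tree while
// skipping any <div> whose id attribute matches skipId.
public class DivSkippingTextExtractor {

    /** True if the node is a <div> element whose id equals skipId. */
    static boolean isSkippedDiv(Node node, String skipId) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        Element el = (Element) node;
        return el.getTagName().equalsIgnoreCase("div")
                && skipId.equals(el.getAttribute("id"));
    }

    /** Recursively appends trimmed text nodes, space-separated, to sb. */
    static void collectText(Node node, String skipId, StringBuilder sb) {
        if (isSkippedDiv(node, skipId)) {
            return; // drop this <div> and everything inside it
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), skipId, sb);
        }
    }

    /** Works on any DOM node, e.g. a Document or a DocumentFragment. */
    public static String extract(Node root, String skipId) {
        StringBuilder sb = new StringBuilder();
        collectText(root, skipId, sb);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>keep this</p>"
                + "<div id=\"bla bla\"><p>plz dont index this bla bla bla</p></div>"
                + "<p>and this</p></body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println(extract(doc, "bla bla")); // prints "keep this and this"
    }
}
```

Inside your own parse-filter plugin you would call `extract` on the fragment Nutch passes in and use the result in place of the page text before indexing; how you store it back depends on your Nutch version's parse classes, so treat that wiring as an assumption to check.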