Hi,

But how do I get at the HTML <div> tag?
Is there a Nutch method for extracting the <div> tag from the content?
Thanks
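
To make the question concrete, here is a minimal sketch of the filtering I have in mind. It uses only the JDK's org.w3c.dom classes and no Nutch API. As far as I understand, a Nutch HtmlParseFilter plugin is handed the parsed page as a DOM DocumentFragment, so a similar tree walk might be usable inside a custom plugin, but that is an assumption on my part. The class and method names (DivSkippingTextExtractor, collectText, extractText) are made up for illustration, and the sample input must be well-formed XHTML because it is parsed with the JDK XML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: collects page text while
// skipping every <div> whose id attribute matches a given value.
class DivSkippingTextExtractor {

    // Appends the text found under `node` to `out`, ignoring the whole
    // subtree of any <div> whose id equals `skipId`.
    static void collectText(Node node, String skipId, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if ("div".equalsIgnoreCase(el.getTagName())
                    && skipId.equals(el.getAttribute("id"))) {
                return; // do not descend into the excluded block
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collectText(child, skipId, out);
        }
    }

    // Parses a well-formed XHTML string and returns its text without the
    // excluded <div>. Assumption: the input is valid XML.
    static String extractText(String xhtml, String skipId) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            collectText(root, skipId, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>keep this</p>"
                + "<div id=\"bla bla\">plz dont index this</div>"
                + "<p>and this</p></body></html>";
        System.out.println(extractText(page, "bla bla"));
    }
}
```

Real intranet pages are rarely well-formed XML, so in practice the DOM would have to come from whatever HTML parser Nutch already uses rather than from the JDK XML parser shown here.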




> Subject: Re: indexing just certain content
> From: e...@lakemeadonline.com
> Date: Mon, 5 Oct 2009 13:09:17 -0700
> To: nutch-user@lucene.apache.org
> 
> Adam,
> 
> You could turn off all the indexing plugins and write your own plugin  
> that only indexes certain meta content from your intranet - giving you  
> complete control of the fields indexed.
> 
> Eric
> 
> On Oct 5, 2009, at 1:06 PM, BELLINI ADAM wrote:
> 
> >
> > hi
> >
> > Does anybody know if it's possible to index just certain content? I
> > mean, I need to avoid indexing some garbage and repetitive data on my
> > intranet.
> >
> > In other words, is it possible to tell the indexer not to index the
> > content between certain <div> tags,
> > like:
> >
> > <div id="bla bla">
> >
> >
> > plz dont index this  bla  bla bla
> >
> > </div>
> >
> > thx to all
> 