On Wednesday 26 October 2011 16:24:15 Bai Shen wrote:
> On Tue, Oct 25, 2011 at 1:25 PM, Markus Jelsma <[email protected]> wrote:
> > > Is there a reason to keep a segment around after it's been indexed? 
> > > When following the tutorial, I ended up sending the same segment to
> > > the solr server multiple times because I was using segments/* as my
> > > argument.
> > 
> > Only send the segment(s) that have not been indexed yet unless you have
> > to reindex everything each time.
> 
> -nods-  I didn't mean to send the same segment multiple times.  I just
> didn't quite realize what the index command was doing.
> 
> > > Once I've sent it to the solr server, is there any reason not to delete
> > > that segment?
> > 
> > You can delete a segment if:
> > - you don't do any reindexing;
> > - it's older than the fetch interval (default 30 days) and you are sure
> >   all URLs in that segment have already been fetched in newer segment(s);
> > - you don't need the stored content in the segment.
> 
> What do you mean by reindexing?  Doesn't nutch handle this with its
> refetching of content after it expires?
> 
> Once I parse the content and update the db, wouldn't the segment be
> irrelevant in regards to whether they get fetched or not?
> 
> The content already gets stored in the solr server to facilitate
> highlighting. So I can't see why we would need to store it in nutch.

Reindexing is useful in development environments, or even in production, when
Solr's index-time analysis changes. If you change how tokens get indexed, or
enable or disable norms, you must reindex from scratch, and Nutch can only do
that from segments that are still around.
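
As a rough illustration of the age criterion quoted above, here is a minimal
Python sketch. It assumes segment directories carry Nutch's usual
yyyyMMddHHmmss timestamp names; the function name segments_older_than and the
constant FETCH_INTERVAL_DAYS are made up for this example and are not part of
Nutch.

```python
from datetime import datetime, timedelta

# Assumption: the default fetch interval of 30 days mentioned in the thread.
FETCH_INTERVAL_DAYS = 30


def segments_older_than(names, now, days=FETCH_INTERVAL_DAYS):
    """Return the segment names whose timestamp is older than `days`.

    `names` are segment directory names, assumed to look like
    20111026162415 (yyyyMMddHHmmss). Other names are ignored.
    """
    cutoff = now - timedelta(days=days)
    old = []
    for name in names:
        try:
            stamp = datetime.strptime(name, "%Y%m%d%H%M%S")
        except ValueError:
            continue  # not a timestamp-named segment, skip it
        if stamp < cutoff:
            old.append(name)
    return sorted(old)
```

Age alone is not enough: a segment returned here is only a candidate for
deletion. You still have to be sure its URLs were refetched in newer segments
and that you will not need it for reindexing.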



-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
