On Tue, Oct 25, 2011 at 1:25 PM, Markus Jelsma <[email protected]> wrote:
> > Is there a reason to keep a segment around after it's been indexed? When
> > following the tutorial, I ended up sending the same segment to the Solr
> > server multiple times because I was using segments/* as my argument.
>
> Only send the segment(s) that have not been indexed yet unless you have to
> reindex everything each time.

-nods- I didn't mean to send the same segment multiple times. I just didn't
quite realize what the index command was doing.

> > Once I've sent it to the Solr server, is there any reason not to delete
> > that segment?
>
> You can delete a segment if:
> - you don't do any reindexing;
> - it's older than the fetch interval (default 30 days) and you are sure all
>   URLs in that segment have already been fetched in newer segment(s);
> - you don't need the stored content in the segment.

What do you mean by reindexing? Doesn't Nutch handle this with its
refetching of content after it expires? Once I parse the content and update
the db, wouldn't the segment be irrelevant with regard to whether its URLs
get fetched or not? The content already gets stored in the Solr server to
facilitate highlighting, so I can't see why we would need to store it in
Nutch as well.
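
For reference, the age condition in the list above could be checked with a small script along these lines. This is only a rough sketch: it assumes segment directories are named with a yyyyMMddHHmmss timestamp (for example crawl/segments/20111025132500), and the function name and arguments are made up for illustration. It covers only the "older than the fetch interval" criterion; it does not check whether the URLs were refetched in newer segments, and it does not delete anything.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: segment directory names are yyyyMMddHHmmss timestamps.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def segments_older_than(segments_dir, max_age_days=30, now=None):
    """List segment directories whose name timestamp is older than max_age_days.

    Hypothetical helper: it only reports candidates and never removes them.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for path in sorted(Path(segments_dir).iterdir()):
        name = path.name
        # Skip anything that is not a 14-digit timestamp directory.
        if not path.is_dir() or len(name) != 14 or not name.isdigit():
            continue
        try:
            created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        except ValueError:
            continue  # digits that do not form a valid date
        if created < cutoff:
            old.append(path)
    return old


if __name__ == "__main__":
    # "crawl/segments" is an assumed location; adjust to the actual crawl dir.
    segments_root = Path("crawl/segments")
    if segments_root.is_dir():
        for segment in segments_older_than(segments_root, max_age_days=30):
            print(segment)
```

The 30-day default mirrors the fetch interval mentioned above; if the interval has been changed in the Nutch configuration, max_age_days would need to match it.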

