An optimize takes lots of CPU and I/O since it has to rewrite your indexes, so only do it when necessary.
You can just use curl to send an optimize message to Solr when you are ready.

See: http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL

Tom

-----Original Message-----
From: Claudio Devecchi [mailto:[email protected]]
Sent: Friday, November 12, 2010 12:13 PM
To: [email protected]
Subject: Re: Doubt about index size

Hi Tom, thanks for your explanation,

Do you recommend that the index continue this way? Or can I configure it to optimize automatically?

tks

On Fri, Nov 12, 2010 at 2:39 PM, Burton-West, Tom <[email protected]> wrote:

> Hi Claudio,
>
> What's happening when you re-index the documents is that Solr/Lucene
> implements an update as a delete plus a new index. Because of the nature of
> inverted indexes, deleting documents requires a rewrite of the entire index.
> In order to avoid rewriting the entire index each time one document is
> deleted, deletes are implemented as a list of deleted internal Lucene ids.
> Documents aren't actually removed from the indexes until the index segment
> is merged or an optimize occurs.
>
> maxDoc is the total number of documents indexed without taking into
> consideration that some of them are marked as deleted
> numDocs is the actual number of undeleted documents
>
> If you run an optimize the index will be rewritten, the index size will go
> down and numDocs will equal maxDoc
>
> Tom Burton-West
>
> -----Original Message-----
> From: Claudio Devecchi [mailto:[email protected]]
> Sent: Friday, November 12, 2010 10:50 AM
> To: Lista Solr
> Subject: Doubt about index size
>
> Hi everybody,
>
> I'm doing some indexing testing on Solr 1.4.1 and there is one thing I
> don't understand, let me try to explain.
>
> I have 1.2 million XML files and I'm indexing them. When I do it for the
> first time my index size is around 3 GB and in my statistics on
> http://localhost:8983/solr/admin/stats.jsp I have these two entries:
>
> numDocs : 1120171
> maxDoc : 1120171
>
> Up to here everything is all right, but if I do an index update, reindexing
> all the same 1120171 documents, I get the stats below:
>
> numDocs : 1120171
> maxDoc : 2240342
>
> ... and my index size goes to around 6 GB.
>
> Why does this happen? Why does the index size change if I have the same
> number of searchable docs?
>
> Does somebody know?
>
> Tks
>

--
Claudio Devecchi
flickr.com/cdevecchi
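
The advice in this thread (optimize only when necessary, then send an optimize message over HTTP) could be sketched roughly as follows, in Python rather than curl. This is a minimal sketch and not something taken from the thread: the /solr/update path, the 0.3 threshold, and all function names are assumptions for illustration; only the host and port come from the stats URL in the original question. It uses the share of deleted documents, (maxDoc - numDocs) / maxDoc, as the signal for whether an optimize is worthwhile.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: update handler at /solr/update on the host and port that
# appear in the stats URL quoted in the thread.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Fraction of indexed documents that are only marked as deleted."""
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def build_optimize_url(update_url: str = SOLR_UPDATE_URL) -> str:
    """Return the update URL with optimize passed as a URL parameter."""
    return update_url + "?" + urlencode({"optimize": "true"})


def optimize_if_needed(num_docs, max_doc, threshold=0.3,
                       update_url=SOLR_UPDATE_URL):
    """Send an optimize only when enough deleted documents have piled up.

    The threshold is a hypothetical value, not a Solr recommendation.
    Returns the HTTP status code, or None when no request was sent.
    """
    if deleted_fraction(num_docs, max_doc) < threshold:
        return None
    with urlopen(Request(build_optimize_url(update_url))) as response:
        return response.status
```

With the numbers from the original question, deleted_fraction(1120171, 2240342) is 0.5, so half of the index is deleted documents and this check would send the optimize request. With the stats from the first indexing run (numDocs equal to maxDoc) nothing is sent.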
