Re: [Solr] Reindexing leaving behind 0 live doc segments

Shawn Heisey Thu, 31 Aug 2023 16:19:03 -0700

On 8/31/23 14:45, Rahul Goswami wrote:

I am trying to execute a program to read documents segment-by-segment and
reindex to the same index. I am reading using Lucene APIs and indexing
using the Solr API (in a core that is currently loaded) to preserve the field
analysis and automatically take care of deletions. The segments being
processed *do not* participate in merge.

In order to make this even possible, you must have changed the locking to 'none'. Lucene normally prevents opening the same index more than once, and it does so for good reason. Operation is undefined in that situation. Your index could easily become corrupted.

What you should do is make a copy of the index directory and index from there to Solr. If you make one pass with rsync and then repeat the rsync, the second pass will complete VERY quickly and should produce a good index. I would use "rsync -avH --delete /path/to/source/ /path/to/target/" for that.
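
As a rough sketch of that two-pass copy, something like the following shell function could work. The function name and its two arguments are placeholders for illustration, not anything Solr or Lucene provides; only the rsync options come from the command above.

```shell
#!/bin/sh
# Sketch only: two_pass_copy is a hypothetical helper name, and the paths
# passed to it are placeholders for the real index directory and copy target.
two_pass_copy() {
    src="$1"
    dst="$2"
    # First pass: bulk copy of the index files. On a large index this is
    # the slow part, and files may change underneath it while it runs.
    rsync -avH --delete "$src/" "$dst/" || return 1
    # Second pass: only transfers whatever changed during the first pass,
    # so it should finish quickly.
    rsync -avH --delete "$src/" "$dst/"
}
```

It would be called as `two_pass_copy /path/to/source /path/to/target`. The trailing slashes added inside the function make rsync copy the contents of the source directory rather than the directory itself.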


<snip>

Would opening an IndexReader this way interfere with how Solr manages
IndexReader and file refCounts, thereby preventing file deletion? What am I
missing?

As I mentioned above, Lucene can make no guarantees about how things work if the index is opened more than once. The Lucene program would likely not interfere with Solr's reference counts, but it is still a REALLY bad idea to have both Solr and your program open the index.

You could try reloading the core rather than restarting Solr. That will happen very quickly and might cause Lucene to delete empty segments. If the index is not large, you could ask Solr to optimize the index with maxSegments=1. You would not want to do that on a really large index.
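
For illustration, both operations can be issued over HTTP with curl. This is a sketch under assumptions: the base URL and the core name "mycore" are placeholders, and the function names are made up for the example. The reload goes through the CoreAdmin RELOAD action and the optimize through the core's update handler.

```shell
#!/bin/sh
# Assumed values: adjust the base URL and core name for your installation.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-mycore}"

# Build the CoreAdmin URL that reloads the core.
reload_url() {
    printf '%s/admin/cores?action=RELOAD&core=%s\n' "$SOLR_URL" "$CORE"
}

# Build the update-handler URL that optimizes down to a single segment.
optimize_url() {
    printf '%s/%s/update?optimize=true&maxSegments=1\n' "$SOLR_URL" "$CORE"
}

# Send the reload request; this is normally quick.
reload_core() {
    curl -sS "$(reload_url)"
}

# Send the optimize request; this rewrites the whole index, so it is
# only sensible when the index is not large.
optimize_core() {
    curl -sS "$(optimize_url)"
}
```

Running `reload_core` first and checking the segment list afterwards is the cheaper experiment; `optimize_core` is the heavier fallback.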


Thanks,
Shawn
