On 8/31/23 14:45, Rahul Goswami wrote:
I am trying to execute a program to read documents segment-by-segment and
reindex to the same index. I am reading using Lucene APIs and indexing
using the Solr API (in a core that is currently loaded) to preserve the field
analysis and automatically take care of deletions. The segments being
processed *do not* participate in merge.

In order to make this even possible, you must have changed the locking to 'none'. Lucene normally prevents opening the same index more than once, and it does so for good reason. Operation is undefined in that situation. Your index could easily become corrupted.

What you should do is make a copy of the index directory and index from there to Solr. If you make one pass with rsync and then repeat the rsync, the second pass will complete VERY quickly and should produce a good index. I would use "rsync -avH --delete /path/to/source/ /path/to/target/" for that.
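
As a rough sketch of that two-pass copy, the rsync command above can be wrapped in a small shell function. The function name copy_index and both paths are placeholders for illustration, not anything from your setup.

```shell
# Two-pass copy of an index directory using the rsync flags given above.
# copy_index is a hypothetical helper name; the paths are placeholders.
copy_index() {
    # $1 = source index directory, $2 = target directory for the copy.
    # The trailing slashes make rsync copy the directory contents.
    # Pass 1 moves the bulk of the data.
    rsync -avH --delete "${1%/}/" "${2%/}/" || return 1
    # Pass 2 only transfers what changed while pass 1 was running.
    rsync -avH --delete "${1%/}/" "${2%/}/"
}

# Example (not run here):
#   copy_index /path/to/source /path/to/target
```

The Lucene program would then read from the target directory, so the live index is only ever opened by Solr.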

<snip>

Would opening an IndexReader this way interfere with how Solr manages
IndexReader and file refCounts, thereby preventing file deletion? What am I
missing?

As I mentioned above, Lucene can make no guarantees about how things work if the index is opened more than once. The Lucene program would likely not interfere with Solr's reference counts, but it is still a REALLY bad idea to have both Solr and your program open the index.

You could try reloading the core rather than restarting Solr. That will happen very quickly and might cause Lucene to delete empty segments. If the index is not large, you could ask Solr to optimize the index with maxSegments=1. You would not want to do that on a really large index.
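
Both of those can be done over HTTP. The sketch below assumes Solr is listening at http://localhost:8983/solr and uses a made-up core name in the example, so adjust both for your installation. The function names are only for illustration.

```shell
# Base URL is an assumption; override SOLR_URL for a different host or port.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

reload_core() {
    # $1 = core name. CoreAdmin RELOAD reopens the core without restarting Solr.
    curl -fsS "$SOLR_URL/admin/cores?action=RELOAD&core=$1"
}

optimize_core() {
    # $1 = core name. Forces a merge down to a single segment,
    # which is expensive on a large index.
    curl -fsS "$SOLR_URL/$1/update?optimize=true&maxSegments=1"
}

# Example (not run here), with a hypothetical core called "mycore":
#   reload_core mycore
#   optimize_core mycore
```

I would try the reload first and only reach for the optimize if the index is small enough.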

Thanks,
Shawn
