Shawn,

Thanks for your response. To your question about locking: I am not doing anything explicit here. If you are alluding to deleting the write.lock file and opening a new IndexWriter, I am not doing that. I only open an IndexReader.
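For context, the reader side is roughly along these lines (a simplified sketch only; the class name, the index path, and the per-segment handling are illustrative, not my exact code):

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.store.FSDirectory;

public class SegmentWalker {
  public static void main(String[] args) throws Exception {
    // Open a read-only reader on the core's index directory; no IndexWriter
    // is created and write.lock is never deleted or replaced.
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(Paths.get("/path/to/solr/core/data/index")))) {
      for (LeafReaderContext leaf : reader.leaves()) {
        SegmentReader segReader = (SegmentReader) leaf.reader();
        String name = segReader.getSegmentInfo().info.name;
        int major = segReader.getSegmentInfo().info.getVersion().major;
        System.out.println(name + " was written by Lucene " + major + ".x");
        if (major < 8) {
          // ...read the live docs of this 7.x segment and re-submit them to
          // Solr through the normal update API (not shown here).
        }
      }
    }
  }
}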
Are you suggesting that opening an IndexReader from within Solr could interfere
with Solr's working, and in turn with file deletions? I think an answer to this
question would really help me understand what is going wrong.

I am running Solr 8.11.1 in *standalone mode* and the index is a mix of 7.x and
8.x segments. I am only reindexing the 7.x segments, excluding them from
participating in merges through a custom merge policy (rough sketch at the end
of this mail). Also please note that the reindexing process finishes
successfully, and if I check the /admin/segments Solr endpoint, all the
segments have version 8.11 at that point. The process finishes fine, with all
data integrity checks passing and searches working, except that these
0-live-doc 7.x segments get left behind and cause index bloat.

Thanks,
Rahul

On Thu, Aug 31, 2023 at 10:22 PM Shawn Heisey <elyog...@elyograg.org> wrote:

> On 8/31/23 14:45, Rahul Goswami wrote:
> > I am trying to execute a program to read documents segment-by-segment
> > and reindex to the same index. I am reading using Lucene APIs and
> > indexing using the Solr API (in a core that is currently loaded) to
> > preserve the field analysis and automatically take care of deletions.
> > The segments being processed *do not* participate in merges.
>
> In order to make this even possible, you must have changed the locking to
> 'none'. Lucene normally prevents opening the same index more than once,
> and it does so for good reason. Operation is undefined in that situation.
> Your index could easily become corrupted.
>
> What you should do is make a copy of the index directory and index from
> there to Solr. If you make one pass with rsync and then repeat the rsync,
> the second pass will complete VERY quickly and should produce a good
> index. I would use "rsync -avH --delete /path/to/source/ /path/to/target/"
> for that.
>
> <snip>
>
> > Would opening an IndexReader this way interfere with how Solr manages
> > IndexReader and file refCounts, thereby preventing file deletion? What
> > am I missing?
>
> As I mentioned above, Lucene can make no guarantees about how things work
> if the index is opened more than once. The Lucene program would likely
> not interfere with Solr's reference counts, but it is still a REALLY bad
> idea to have both Solr and your program open the index.
>
> You could try reloading the core rather than restarting Solr. That will
> happen very quickly and might cause Lucene to delete empty segments. If
> the index is not large, you could ask Solr to optimize the index with
> maxSegments=1. You would not want to do that on a really large index.
>
> Thanks,
> Shawn
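PS: The custom merge policy mentioned above is conceptually along these lines (an illustrative sketch only, assuming a wrapper over Lucene 8.x's FilterMergePolicy; the class name and details are simplified, and a real implementation would apply the same filtering to the forced-merge variants as well):

import java.io.IOException;

import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/**
 * Illustrative only: hides pre-8.0 segments from natural merge selection so
 * the wrapped policy never rewrites them. findForcedMerges and
 * findForcedDeletesMerges would need the same filtering in practice.
 */
public class SkipOldSegmentsMergePolicy extends FilterMergePolicy {

  public SkipOldSegmentsMergePolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findMerges(MergeTrigger mergeTrigger,
                                       SegmentInfos segmentInfos,
                                       MergeContext mergeContext) throws IOException {
    // Build a view of the index containing only 8.x segments and let the
    // wrapped policy pick merges from that view; 7.x segments never qualify.
    SegmentInfos eligible = new SegmentInfos(segmentInfos.getIndexCreatedVersionMajor());
    for (SegmentCommitInfo sci : segmentInfos) {
      if (sci.info.getVersion().major >= 8) {
        eligible.add(sci);
      }
    }
    return super.findMerges(mergeTrigger, eligible, mergeContext);
  }
}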