Shawn,

Thanks for your response. To your question about locking: I am not doing anything explicit here. If you are alluding to deleting the write.lock file and opening a new IndexWriter, I am not doing that. I only open an IndexReader.
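For context, the reader side is roughly along these lines (a simplified sketch only; the class name, the index path, and the per-segment handling are illustrative, not my exact code):

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.store.FSDirectory;

public class SegmentWalker {
  public static void main(String[] args) throws Exception {
    // Open a read-only reader on the core's index directory; no IndexWriter
    // is created and write.lock is never deleted or replaced.
    try (DirectoryReader reader = DirectoryReader.open(
        FSDirectory.open(Paths.get("/path/to/solr/core/data/index")))) {
      for (LeafReaderContext leaf : reader.leaves()) {
        SegmentReader segReader = (SegmentReader) leaf.reader();
        String name = segReader.getSegmentInfo().info.name;
        int major = segReader.getSegmentInfo().info.getVersion().major;
        System.out.println(name + " was written by Lucene " + major + ".x");
        if (major < 8) {
          // ...read the live docs of this 7.x segment and re-submit them to
          // Solr through the normal update API (not shown here).
        }
      }
    }
  }
}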
Are you suggesting that opening an IndexReader from within Solr could interfere
with Solr's working, and in turn with file deletions? I think an answer to this
question would really help me understand what is going wrong.

I am running Solr 8.11.1 in *standalone mode* and the index is a mix of 7.x and
8.x segments. I am only reindexing the 7.x segments, excluding them from
participating in merges through a custom merge policy (rough sketch at the end
of this mail). Also please note that the reindexing process finishes
successfully, and if I check the /admin/segments Solr endpoint, all the
segments have version 8.11 at that point. The process finishes fine, with all
data integrity checks passing and searches working, except that these
0-live-doc 7.x segments get left behind and cause index bloat.

Thanks,
Rahul

On Thu, Aug 31, 2023 at 10:22 PM Shawn Heisey <elyog...@elyograg.org> wrote:

> On 8/31/23 14:45, Rahul Goswami wrote:
> > I am trying to execute a program to read documents segment-by-segment
> > and reindex to the same index. I am reading using Lucene APIs and
> > indexing using the Solr API (in a core that is currently loaded) to
> > preserve the field analysis and automatically take care of deletions.
> > The segments being processed *do not* participate in merges.
>
> In order to make this even possible, you must have changed the locking to
> 'none'. Lucene normally prevents opening the same index more than once,
> and it does so for good reason. Operation is undefined in that situation.
> Your index could easily become corrupted.
>
> What you should do is make a copy of the index directory and index from
> there to Solr. If you make one pass with rsync and then repeat the rsync,
> the second pass will complete VERY quickly and should produce a good
> index. I would use "rsync -avH --delete /path/to/source/ /path/to/target/"
> for that.
>
> <snip>
>
> > Would opening an IndexReader this way interfere with how Solr manages
> > IndexReader and file refCounts, thereby preventing file deletion? What
> > am I missing?
>
> As I mentioned above, Lucene can make no guarantees about how things work
> if the index is opened more than once. The Lucene program would likely
> not interfere with Solr's reference counts, but it is still a REALLY bad
> idea to have both Solr and your program open the index.
>
> You could try reloading the core rather than restarting Solr. That will
> happen very quickly and might cause Lucene to delete empty segments. If
> the index is not large, you could ask Solr to optimize the index with
> maxSegments=1. You would not want to do that on a really large index.
>
> Thanks,
> Shawn
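PS: The custom merge policy mentioned above is conceptually along these lines (an illustrative sketch only, assuming a wrapper over Lucene 8.x's FilterMergePolicy; the class name and details are simplified, and a real implementation would apply the same filtering to the forced-merge variants as well):

import java.io.IOException;

import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/**
 * Illustrative only: hides pre-8.0 segments from natural merge selection so
 * the wrapped policy never rewrites them. findForcedMerges and
 * findForcedDeletesMerges would need the same filtering in practice.
 */
public class SkipOldSegmentsMergePolicy extends FilterMergePolicy {

  public SkipOldSegmentsMergePolicy(MergePolicy in) {
    super(in);
  }

  @Override
  public MergeSpecification findMerges(MergeTrigger mergeTrigger,
                                       SegmentInfos segmentInfos,
                                       MergeContext mergeContext) throws IOException {
    // Build a view of the index containing only 8.x segments and let the
    // wrapped policy pick merges from that view; 7.x segments never qualify.
    SegmentInfos eligible = new SegmentInfos(segmentInfos.getIndexCreatedVersionMajor());
    for (SegmentCommitInfo sci : segmentInfos) {
      if (sci.info.getVersion().major >= 8) {
        eligible.add(sci);
      }
    }
    return super.findMerges(mergeTrigger, eligible, mergeContext);
  }
}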