On 6/22/2021 11:29 PM, Stephen Lewis Bianamara wrote:
> This was a ton of great info I needed, and more than I initially knew to ask :) Your first response seems to imply an answer to my original question, but I wanted to follow up to be as sure as I can. In the recovery scenario, are there situations where a complete index will copy over next to the original index, thus requiring 2x the disk space? Or is that now outdated? I could imagine for example the replacement is now done on each segment at a smaller scale or something along those lines and so recovery requirements would expect to be on par with merge requirements, or perhaps there is some "bad enough" scenario where a full side-by-side copy is made during recovery. Can you comment on that?
SolrCloud recovery uses the replication handler, configuring it on the fly. This works almost exactly like rsync, and I am pretty sure it mimics rsync's -W (--whole-file) option: if a file with the same name already exists but has a different size, the whole file is copied rather than just the parts that changed.
So if the index it's copying is substantially similar to the one you've already got (at the file level), the recovery will be fast and not take much extra space. But if it's very different (again, at the file level, not at the Lucene level), it very well might copy the entire index over and only then delete the files that make up the existing index. In that case both copies sit on disk at the same time, so yes, you could temporarily need about 2x the disk space.
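
To make the file-level comparison concrete, here is a rough Python sketch of the decision described above. It is not Solr's actual code: the function name `plan_recovery`, the name-plus-size comparison, and the example file names and sizes are all assumptions made for illustration, and the real replication handler may check more than this.

```python
def plan_recovery(local, remote):
    """Illustrative only (hypothetical helper, not a Solr API).

    local, remote: dicts mapping file name -> size in bytes.
    Assumes a file is reused only when name and size both match.
    """
    # Files to fetch whole: missing locally, or same name with a different size.
    to_copy = {name: size for name, size in remote.items()
               if local.get(name) != size}
    # Local files that are no longer part of the index after recovery.
    stale = sorted(name for name, size in local.items()
                   if remote.get(name) != size)
    bytes_to_copy = sum(to_copy.values())
    # Assumes stale files are deleted only after the copy has finished,
    # so the old index and the new files coexist at the peak.
    peak_bytes = sum(local.values()) + bytes_to_copy
    return sorted(to_copy), stale, bytes_to_copy, peak_bytes


local = {"_0.cfs": 100, "_1.cfs": 50, "segments_2": 1}

# Similar at the file level: only the new segment and segments file differ.
similar = {"_0.cfs": 100, "_1.cfs": 50, "_2.cfs": 10, "segments_3": 1}
print(plan_recovery(local, similar))

# Very different at the file level, e.g. after a big merge on the leader.
different = {"_9.cfs": 150, "segments_9": 1}
print(plan_recovery(local, different))
```

In the first case only 11 bytes are fetched and the peak is barely above the current index size. In the second case nothing matches by name, so all 151 bytes are fetched while the old 151 bytes are still on disk, which is the 2x situation.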
Thanks, Shawn
