Hello,

We have a setup where we periodically build a Solr index "offline" and then copy the data directory to a storage location. When we deploy our Solr instances to production, the containers download that data directory into the right place in the file system before the Solr server is started. Once an instance is running, its index is never updated; we simply tear it down and replace it on the next cycle.

This works OK, but I was wondering whether there are any tweaks one could apply to make the indexing go faster, given that we know there will be no searches while we are indexing. The corpus is around 40 million documents, and most of the indexing time is spent waiting for commits. We currently commit every 5 million documents. Does that sound reasonable? Should we commit more often, or just commit once at the end?
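To make the question concrete, here is the kind of tweak I had in mind: since no searcher is open during the offline build, I assume we could disable autoCommit and autoSoftCommit entirely in solrconfig.xml and issue a single hard commit when indexing finishes. This is only a sketch, not our actual config; the element names follow the standard Solr updateHandler configuration, but the values are illustrative:

```xml
<!-- Sketch only: illustrative values, not our production solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- No searches happen during the offline build, so automatic
       commits only pay flush cost without benefiting any searcher.
       -1 disables the corresponding trigger. -->
  <autoCommit>
    <maxDocs>-1</maxDocs>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commits exist to make documents visible to searchers,
       which is pointless here, so disable them too. -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With something like this in place, I imagine the indexing job would send all updates without intermediate commits and then trigger one explicit hard commit at the end (e.g. a single update request with commit=true) before the data directory is copied off. Is that the right direction, or does a very large uncommitted buffer cause its own problems?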
I am aware that there is a lot of context I have not provided here. I am just looking for any advice I can get for this kind of setup. Kind regards, /Noah