Hello,

We have a setup where we periodically build a Solr index “offline” and then copy 
the data folder to a storage location. When we deploy our Solr instances to 
production, the containers download that data folder to the right place in the 
file system before the Solr server is started. Once a Solr instance is running, 
it is never updated; we just tear it down and replace it on the next cycle.
This works OK, but I was wondering whether there are any tweaks we could apply 
to make indexing faster, given that we know there will be no searches while we 
are indexing. The corpus is around 40 million documents, and most of the time 
is spent waiting for commits. We commit every 5 million documents. Does that 
sound reasonable? Should we commit more often, or should we just commit once 
at the end?
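
To make the question concrete, here is a minimal sketch of the two commit 
strategies being compared. It is an illustration only, not our actual indexing 
code: send_batch and commit are hypothetical callables standing in for whatever 
client talks to Solr, and batch_size and commit_every are made-up parameter 
names. Passing commit_every=None means "commit once at the end".

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical: posts docs to Solr
    commit: Callable[[], None],                # hypothetical: issues a commit
    batch_size: int = 1000,
    commit_every: Optional[int] = None,        # None = single commit at the end
) -> int:
    """Send all docs in batches and return the number of commits issued."""
    commits = 0
    since_commit = 0
    for batch in batched(docs, batch_size):
        send_batch(batch)
        since_commit += len(batch)
        # Periodic commit once enough documents have accumulated.
        if commit_every is not None and since_commit >= commit_every:
            commit()
            commits += 1
            since_commit = 0
    # Final commit for whatever has not been committed yet.
    if since_commit > 0:
        commit()
        commits += 1
    return commits
```

With 40 million documents and commit_every set to 5 million, this pattern 
issues 8 commits; with commit_every=None it issues exactly one.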

I am aware that there is a lot of context I have not provided here. I am just 
looking for any advice I can get for this kind of setup.

Kind regards,
/Noah
