We have about 4M documents with a 512-dimensional vector field.  For us the
key is making sure the SOLR node instances leave more memory free for Linux
than the SOLR collection size, so the OS can use it for caching.  I can see
with iotop that there are 0 bytes of disk reads while our SOLR nodes are
working away.  Memory caching makes all the difference - without it the
performance isn't good enough for us to deliver web pages.
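A rough sanity check along these lines: compare the on-disk size of the index to what the kernel reports as available memory (which includes reclaimable page cache).  A sketch, assuming bash and a Linux host; the /var/solr/data path is just an example and should be your collection's data directory:

```shell
# fits_in_cache INDEX_BYTES AVAIL_BYTES
# Prints "cached" if the index could fit entirely in page cache,
# "disk-bound" if it cannot.
fits_in_cache() {
  if [ "$2" -gt "$1" ]; then
    echo "cached"
  else
    echo "disk-bound"
  fi
}

# Example with made-up numbers: a 40 GiB index vs 64 GiB available.
fits_in_cache $((40 * 1024 * 1024 * 1024)) $((64 * 1024 * 1024 * 1024))

# On a real host you would feed it live values, e.g.:
#   fits_in_cache "$(du -sb /var/solr/data | cut -f1)" \
#     "$(( $(awk '/MemAvailable/ {print $2}' /proc/meminfo) * 1024 ))"
```

This is only a first approximation - MemAvailable is an estimate and other processes compete for the cache - but if the index is bigger than available memory you can expect the disk reads that iotop would show.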

We started using dense vector searches back in SOLR 8 and, at the time, I
experimented with mounting RAM disks, but I later realized that Linux is
really good at automatically caching data, so messing around with RAM disks
wasn't necessary.  That leaves you with the choice of how much memory to
give the JVM and how much to leave for Linux - and that Linux memory is
really important for performance.  In any case it should be more than the
collection(s) size (I'm not sure how much more is required; I just know
that what we have now works, arrived at through trial and error).
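In practice that split is made in Solr's environment file (bin/solr.in.sh on Linux installs) by capping the JVM heap and leaving the rest of RAM untouched for the page cache.  A sketch, with the 8g figure purely illustrative - the right value depends on your workload:

```shell
# Excerpt from solr.in.sh (assumed values, not a recommendation).
# On a 64 GB host with a ~40 GB collection: give the JVM a modest
# fixed heap and leave everything else to the Linux page cache.
SOLR_HEAP="8g"

# Deliberately do NOT size the heap near total RAM - memory inside
# the JVM heap is invisible to the kernel's file cache, and it is
# that cache that keeps index reads off the disk.
```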

Derek
