: SQL DB 4M documents with up to 5000 metadata fields each document [2xXeon 2.1Ghz, 32GB RAM]
: Actual Solr: 1 core, version 4.6, 3.8M documents, schema has 300 metadata fields to import, size 3.6GB [2xXeon 2.4Ghz, 32GB RAM]
: (atm we need 35h to build the index and about 24h for a mass update which affects the production)
The first question I have is why you are using a version of Solr that's almost 5 years old.

The second question you should consider is what your indexing process looks like: whether it's multithreaded or not, and whether the bottleneck is your network/DB.

The third question to consider is your Solr configuration / schema, i.e. how complex the Solr-side indexing process is -- are these 300 fields all TextFields with complex analyzers?

FWIW: I used the script below to build myself 3.8 million documents, with 300 "text fields" consisting of anywhere from 1-10 "words" (integers between 1 and 200). The resulting CSV file was 24GB, and using a simple curl command to index it with a single client thread (and a single Solr thread) against Solr 7.4 running with the sample techproducts configs took less than 2 hours on my laptop (less CPU & half as much RAM compared to your server) while I was doing other stuff.

(I would bet your current indexing speed has very little to do with Solr and is largely a factor of your source DB and how you are sending the data to Solr.)

-Hoss
http://www.lucidworks.com/
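[The script Hoss refers to is not included in this excerpt. Purely as a rough illustration, a minimal Python sketch that produces a CSV of the shape he describes (3.8 million rows, 300 text fields per row, each field holding 1-10 random integer "words" between 1 and 200) could look something like the following. The field names ending in `_t` are an assumption, chosen because the sample techproducts configs map `*_t` to a text field via a dynamic field rule; everything else (file name, column layout) is likewise illustrative, not the original script.]

```python
# Hypothetical sketch -- not the script referenced above.
# Writes a CSV with an "id" column plus 300 text fields ("field_0_t" .. "field_299_t"),
# each holding 1-10 space-separated integers between 1 and 200, for 3.8M rows.
import csv
import random
import sys

NUM_DOCS = 3_800_000
NUM_FIELDS = 300

def random_words() -> str:
    """Return 1-10 random 'words', each an integer between 1 and 200."""
    return " ".join(str(random.randint(1, 200)) for _ in range(random.randint(1, 10)))

def main(path: str = "data.csv") -> None:
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        # Header row: unique key plus the 300 text fields.
        writer.writerow(["id"] + [f"field_{i}_t" for i in range(NUM_FIELDS)])
        for doc_id in range(NUM_DOCS):
            writer.writerow([doc_id] + [random_words() for _ in range(NUM_FIELDS)])

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "data.csv")
```

[The resulting file could then be streamed to Solr's CSV update handler with a single curl invocation, along the lines of `curl 'http://localhost:8983/solr/techproducts/update?commit=true' -H 'Content-Type: text/csv' --data-binary @data.csv`; the URL and core name are assumptions based on the default techproducts example.]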