: SQL DB: 4M documents with up to 5000 metadata fields per document [2x Xeon
: 2.1GHz, 32GB RAM]
: Current Solr: 1 core, version 4.6, 3.8M documents, schema has 300 metadata
: fields to import, size 3.6GB [2x Xeon 2.4GHz, 32GB RAM]
: (At the moment we need 35h to build the index and about 24h for a mass
: update, which affects production)

The first question I have is why you are using a version of Solr that's 
almost 5 years old.

The second question you should consider is what your indexing process 
looks like: whether it's multithreaded or not, and whether the bottleneck 
is actually your network or your source DB rather than Solr.

The third question to consider is your Solr configuration / schema: how 
complex the Solr-side indexing process is -- i.e., are these 300 fields all 
TextFields with complex analyzers?

FWIW: I used the script below to build myself 3.8 million documents, with 
300 "text fields" each consisting of anywhere from 1-10 "words" (integers 
between 1 and 200).
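The original script did not survive in this copy of the message, but a rough
Python reconstruction of that kind of generator looks like this (the output
file name and the "*_t" field-name suffix, which the stock schema maps to a
text field via a dynamic-field rule, are my own choices):

import csv
import random

NUM_DOCS = 3800000       # ~3.8 million documents
NUM_FIELDS = 300         # 300 "text" fields per document

with open("fake_docs.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["id"] + ["field_%d_t" % i for i in range(NUM_FIELDS)])
    for doc_id in range(NUM_DOCS):
        row = [str(doc_id)]
        for _ in range(NUM_FIELDS):
            # each field holds 1-10 "words", each an integer between 1 and 200
            n_words = random.randint(1, 10)
            row.append(" ".join(str(random.randint(1, 200))
                                for _ in range(n_words)))
        writer.writerow(row)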

The resulting CSV file was 24GB, and using a simple curl command to index 
it with a single client thread (and a single Solr thread) against Solr 7.4 
running with the sample techproducts configs took less than 2 hours on 
my laptop (less CPU & half as much RAM compared to your server) while I 
was doing other stuff.
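The exact command isn't quoted above either; a minimal Python equivalent of
that single-threaded CSV upload (host, port, and core name taken from the
stock techproducts example) would be:

# Roughly equivalent to something like:
#   curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
#        -H 'Content-Type: text/csv' --data-binary @fake_docs.csv
import requests

with open("fake_docs.csv", "rb") as csv_file:
    resp = requests.post("http://localhost:8983/solr/techproducts/update",
                         params={"commit": "true"},
                         headers={"Content-Type": "text/csv"},
                         # passing a file object streams the upload, so the
                         # 24GB file is never read into RAM at once
                         data=csv_file)
resp.raise_for_status()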

(I would bet your current indexing speed has very little to do with Solr 
and is largely a factor of your source DB and how you are sending the data 
to Solr.)


-Hoss
http://www.lucidworks.com/
