We have around 5 million items in our index, and each item has a description
stored in a separate physical database. These descriptions vary in size, and
most are quite large. Currently we index only the items, not their
descriptions, and a full import takes around 4 hours. Ideally we want to index
both the items and their descriptions, but after some quick profiling I
determined that a full import would take more than 24 hours.

- How would I profile the indexing process to determine whether the bottleneck
is Solr or our database?
- In either case, how would one speed up this process? Is there a way to run
parallel import processes and then merge the results at the end? Possibly
using some sort of distributed computing?
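
To make both questions concrete, here is a rough sketch of what I have in
mind. It is not our actual import code. The names profile_stages,
split_id_range, fetch and index are all made up for illustration: fetch stands
in for whatever reads descriptions from the database, and index for whatever
sends documents to Solr. The first function times the two stages separately;
the second splits an assumed numeric ID range into chunks that separate import
processes could work through.

```python
import time
from collections import defaultdict


def profile_stages(batches, fetch, index):
    """Accumulate wall-clock seconds spent in each stage separately.

    `fetch` and `index` are hypothetical callables supplied by the caller:
    `fetch(batch)` returns documents, `index(docs)` sends them onward.
    """
    totals = defaultdict(float)
    for batch in batches:
        start = time.perf_counter()
        docs = fetch(batch)  # database side
        totals["fetch"] += time.perf_counter() - start

        start = time.perf_counter()
        index(docs)  # Solr side
        totals["index"] += time.perf_counter() - start
    return dict(totals)


def split_id_range(first_id, last_id, parts):
    """Split the inclusive range [first_id, last_id] into contiguous chunks.

    Assumes items have numeric IDs; returns at most `parts` (start, end) pairs.
    """
    total = last_id - first_id + 1
    size, extra = divmod(total, parts)
    ranges = []
    start = first_id
    for i in range(parts):
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue  # more parts than IDs: skip empty chunks
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

If the "fetch" total dominates, the database is the likely bottleneck;
if "index" dominates, it is Solr. Each (start, end) pair from
split_id_range could then be handed to its own import process.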

Any ideas? Thanks.
