As a data point, I routinely see clients index 5M items on normal hardware in approx. 1 hour (give or take 30 minutes).
I also wanted to add that our main entity (item) consists of 5 sub-entities (i.e., joins). 2 of those 5 are fairly small, so I am using CachedSqlEntityProcessor for them, but the other 3 (which include item_description) are normal, uncached entities. All the entities except item_description connect to datasource1. They currently point to one physical machine, although we do have a pool of 3 DBs that could be used if it helps. The other entity, item_description, uses datasource2, which has a pool of 2 DBs that could potentially be used. I'm not sure whether that would help or not. I might as well add that the item description field will have indexed, stored, and term vectors set to true.
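For reference, here is a rough sketch of what the setup described above could look like in a DataImportHandler data-config.xml. Only the names item, item_description, datasource1, datasource2 and CachedSqlEntityProcessor come from my description; the JDBC driver, URLs, table names, column names, and the item_category sub-entity are placeholders made up for illustration, and only one of each kind of sub-entity is shown.

```xml
<dataConfig>
  <!-- Hypothetical connection details; substitute the real driver, URL and credentials. -->
  <dataSource name="datasource1" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://db1.example.com/items" user="solr" password="secret"/>
  <dataSource name="datasource2" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://db2.example.com/descriptions" user="solr" password="secret"/>

  <document>
    <!-- Main entity; one Solr document per row. Column names are assumed. -->
    <entity name="item" dataSource="datasource1"
            query="SELECT id, title FROM item">

      <!-- Small lookup table (hypothetical name): loaded once and joined from the cache. -->
      <entity name="item_category" dataSource="datasource1"
              processor="CachedSqlEntityProcessor"
              query="SELECT item_id, category FROM item_category"
              where="item_id=item.id"/>

      <!-- Large table on the second data source: one query per parent row. -->
      <entity name="item_description" dataSource="datasource2"
              query="SELECT description FROM item_description WHERE item_id='${item.id}'"/>
    </entity>
  </document>
</dataConfig>
```

The matching field in schema.xml would then look something like the line below, where the field name and the "text" field type are again assumptions:

```xml
<field name="description" type="text" indexed="true" stored="true" termVectors="true"/>
```

The uncached sub-entities issue one SQL query per item row, so with 5M items those are the places where spreading the load over the DB pools is most likely to matter.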