Thank you for taking the time to do the test. I have been running similar tests using the post tool (SimplePostTool) with the real data and was able to get to about 10K documents/second.
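(For reference, a post-tool run for this kind of test would look roughly like the line below; the collection name and file path are just placeholders, and bin/post is the shell wrapper around SimplePostTool:)

    bin/post -c clients /data/incoming/client1.csv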
I am considering using multiple files (one per client), FTP'd to a Solr node, and then using a scheduled job to run the post tool and post them to Solr. The only issue I have run into so far is that if there is an error in the data (e.g. a missing required field), the post tool stops processing the rest of the file.

On Wed, Aug 19, 2015 at 3:58 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Toke Eskildsen <t...@statsbiblioteket.dk> wrote
> > Use more than one cloud. Make them fully independent.
> > As I suggested when you asked 4 days ago. That would
> > also make it easy to scale: Just measure how much a
> > single setup can take and do the math.
>
> The goal is 250K documents/second.
>
> I tried modifying the books.csv example that comes with Solr to use lines
> with 400 characters and inflated it to 4 * 1 million entries. I then
> started a Solr with the techproducts example and ingested the 4*1M entries
> using curl from 4 prompts at the same time. The longest running took 138
> seconds. 4M/138 seconds = 29K documents/second.
>
> My machine is a 4 core (8 with HyperThreading) i7 laptop, using SSD. On a
> modern server and with custom schema & config, the speed should of course
> be better. On the other hand, the rate might slow down as the shards grow.
>
> Give or take, something like 10 machines could conceivably be enough to
> handle the Solr load if the analysis chain is near the books example in
> complexity. Of course real data tests are needed and the CSV data must be
> constructed somehow.
>
> - Toke Eskildsen
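(For completeness, the per-terminal CSV ingest in the quoted test would presumably have looked something like the line below, run from four shells at once with one quarter of the inflated file in each; the split file name and the commit parameter are assumptions on my part:)

    curl 'http://localhost:8983/solr/techproducts/update?commit=true' -H 'Content-Type: text/csv' --data-binary @books_part1.csv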