2012/2/1 prasenjit mukherjee <prasen....@gmail.com>:
> I have the following requirements :
>
> 1. Adds : 20 docs/sec
> 2. Searches : 100 searches/sec
> 3. Deletes : (20*3600*24*7 ~ 12 mill ) docs/week ( basically a cron
> job which deletes all documents more than 7 days old )
>
> I am thinking of having 6 shards ( with each having 2 million docs )
> with 1 master and 2 slaves with SolrReplication. Have following
> questions :
>
> 1. With 50 searches/sec per shard with 2 million doc, what would be
> the tentative response-time ? I am thinking of keeping it under <100
> ms

That is quite a lot of searches per second, considering that every query
has to search all 6 shards (coordination and network latency affect the
results). The components you use, the complexity of the query, and the
number of segments in each shard also affect the response time. 100 ms is
probably a tight threshold for your requirements; you will probably need
to add more replicas.

> 2. What would be a reasonable latency ( pollInterval ) on slave for
> SolrReplication ( all slaves connected with a single backplane ). Is 1
> minute pollInterval reasonable ?

Yes, but it is not reasonable to expect new updates on every poll. That
is, you shouldn't commit more often than once every 10 minutes. Otherwise
we would be talking about near-real-time indexing, which is still in
development in trunk:
http://wiki.apache.org/solr/NearRealtimeSearch

> 3. Is NRT a better/viable option compared to SolrReplication ?

That is still in development. AFAIK it works with shards (NRT concerns
indexing, and sharding doesn't change anything about how indexing works),
but with replication something different will be needed. I think SolrCloud
covers these NRT aspects thanks to its different architecture (not
master-slave replication, but all peers replicating).

>
> -Thanks,
> Prasenjit
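
As a rough check on the numbers in the question, here is a minimal Python sketch of the sizing arithmetic. It is only back-of-the-envelope: the function and variable names are made up for illustration, and it assumes that queries are served only by the 2 slaves of each shard and are balanced evenly between them. It does not predict response time, which depends on the query complexity, components, and segment count mentioned above.

```python
# Back-of-the-envelope sizing from the figures quoted in this thread.
# All names here are illustrative; nothing below is a Solr API.

SECONDS_PER_DAY = 3600 * 24

ADDS_PER_SEC = 20        # from the question
SEARCHES_PER_SEC = 100   # from the question
RETENTION_DAYS = 7       # cron job deletes documents older than 7 days
NUM_SHARDS = 6           # from the question
SLAVES_PER_SHARD = 2     # assumption: only slaves answer queries


def docs_retained(adds_per_sec, retention_days):
    """Documents held in the index once the 7-day window is full."""
    return adds_per_sec * SECONDS_PER_DAY * retention_days


def docs_per_shard(total_docs, num_shards):
    """Documents per shard, assuming an even distribution."""
    return total_docs / num_shards


def queries_per_slave(searches_per_sec, slaves_per_shard):
    """Query rate seen by one slave.

    A distributed search is sent to every shard, so each shard sees the
    full query rate; that rate is then split across the shard's slaves
    (assumed to be evenly load balanced).
    """
    return searches_per_sec / slaves_per_shard


total_docs = docs_retained(ADDS_PER_SEC, RETENTION_DAYS)
per_shard = docs_per_shard(total_docs, NUM_SHARDS)
slave_qps = queries_per_slave(SEARCHES_PER_SEC, SLAVES_PER_SHARD)

print(f"total docs in index : {total_docs:,}")
print(f"docs per shard      : {per_shard:,.0f}")
print(f"queries/sec per slave: {slave_qps:.0f}")
```

Under those assumptions this gives about 12.1 million documents in total, about 2 million per shard, and 50 queries/sec on each slave, which matches the figures in the question. Adding a third slave per shard would bring the per-slave rate down to roughly 33 queries/sec.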