In addition to what Emmanuel mentioned, why not consider 7 shards? If you used one shard per day, your delete problem becomes really easy: just nuke the oldest shard.
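Here is a minimal Python sketch of the one-shard-per-day rotation. Several pieces are hypothetical: the core naming scheme (`docs_YYYYMMDD`), the host string, and the helper names. The only Solr behaviour it assumes is that a distributed query lists its shards in the comma-separated `shards` request parameter, with entries of the form `host:port/solr/core`. Expiring a day then means removing one whole core instead of running a delete-by-query over millions of documents.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # keep one shard (core) per day for the last 7 days


def core_name(day: date) -> str:
    # Hypothetical naming scheme: one Solr core per calendar day, e.g. "docs_20120201"
    return "docs_" + day.strftime("%Y%m%d")


def live_cores(today: date, retention: int = RETENTION_DAYS) -> list[str]:
    """Cores to search: today's core plus the previous retention-1 days."""
    return [core_name(today - timedelta(days=i)) for i in range(retention)]


def core_to_drop(today: date, retention: int = RETENTION_DAYS) -> str:
    """The core that has just aged out and can be removed wholesale."""
    return core_name(today - timedelta(days=retention))


def shards_param(host: str, today: date) -> str:
    # Value for Solr's distributed-search "shards" parameter (comma-separated list)
    return ",".join(f"{host}/solr/{c}" for c in live_cores(today))


if __name__ == "__main__":
    today = date(2012, 2, 1)
    print("index into:", core_name(today))
    print("search with shards =", shards_param("localhost:8983", today))
    print("drop:", core_to_drop(today))
```

New documents always go to today's core, so the weekly delete job disappears. The trade-off is that shard sizes become uneven, and today's core starts out nearly empty every day.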
Beware, though, that this approach may skew your TF/IDF calculations on the new shard (i.e. the one you use for *today's* data) until it has accumulated enough documents.

Best
Erick

On Wed, Feb 1, 2012 at 2:05 PM, Emmanuel Espina <espinaemman...@gmail.com> wrote:
> 2012/2/1 prasenjit mukherjee <prasen....@gmail.com>:
>> I have the following requirements:
>>
>> 1. Adds: 20 docs/sec
>> 2. Searches: 100 searches/sec
>> 3. Deletes: (20*3600*24*7 ~ 12 million) docs/week (basically a cron
>> job which deletes all documents more than 7 days old)
>>
>> I am thinking of having 6 shards (each with 2 million docs)
>> with 1 master and 2 slaves, using SolrReplication. I have the following
>> questions:
>>
>> 1. With 50 searches/sec per shard with 2 million docs, what would be
>> the tentative response time? I am thinking of keeping it under 100
>> ms.
>
> That is quite a lot of searches per second, considering that you will
> have to search 6 shards (the coordination and network latency
> affect the results). Also, the components you use and the complexity
> of the query (as well as the number of segments in each shard) affect
> the time. 100 ms is probably a low threshold for your requirements;
> you will probably need to add more replicas.
>
>> 2. What would be a reasonable latency (pollInterval) on the slaves for
>> SolrReplication (all slaves connected with a single backplane)? Is a 1
>> minute pollInterval reasonable?
>
> Yes, but it is not reasonable to get new updates every time you poll.
> That is, you shouldn't perform commits more than once every 10
> minutes. Otherwise we would be talking about near-real-time indexing,
> something that is in development in trunk:
> http://wiki.apache.org/solr/NearRealtimeSearch
>
>> 3. Is NRT a better/viable option compared to SolrReplication?
>
> That is something in development. AFAIK it works with shards (because
> NRT concerns indexing, and sharding doesn't change anything about
> how indexing is done), but with replication something different will be
> needed: SolrCloud, I think, covers these NRT aspects thanks to its
> different architecture (not master-slave replication, but all
> peers replicating).
>
>>
>> -Thanks,
>> Prasenjit
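The following small Python sketch makes the TF/IDF caveat about a freshly created daily shard concrete. It assumes Lucene's classic TF-IDF similarity, whose idf is 1 + ln(numDocs / (docFreq + 1)). It also assumes that each shard scores using only its own local document statistics, since Solr distributed search of this era had no global IDF. The per-shard counts are hypothetical. 1,728,000 is simply one full day at 20 docs/sec.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Lucene classic (DefaultSimilarity-style) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same term on two different daily shards.
FULL_SHARD_DOCS = 20 * 3600 * 24  # one full day at 20 docs/sec = 1,728,000 docs
NEW_SHARD_DOCS = 200              # today's shard, shortly after midnight

idf_full = classic_idf(doc_freq=10, num_docs=FULL_SHARD_DOCS)  # rare term on a full shard
idf_new = classic_idf(doc_freq=5, num_docs=NEW_SHARD_DOCS)     # same term, a few hits on a tiny shard

# The merged result list treats per-shard scores as comparable,
# yet the same term carries a very different weight on each shard.
print(f"idf on full shard: {idf_full:.2f}")
print(f"idf on new shard:  {idf_new:.2f}")
```

With these numbers the term's idf is roughly 13 on the full shard and only about 4.5 on the new one. Matches from today's shard would therefore be ranked noticeably differently from identical matches on older shards until today's shard fills up.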