Appreciate your reply. Have some more follow up questions inline.

On Thu, Feb 2, 2012 at 12:35 AM, Emmanuel Espina <espinaemman...@gmail.com> wrote:
>> 1. Adds : 20 docs/sec
>> 2. Searches : 100 searches/sec
>> 3. Deletes : (20*3600*24*7 ~ 12 mill ) docs/week ( basically a cron
>> job which deletes all documents more than 7 days old )
>>
>> I am thinking of having 6 shards ( with each having 2 million docs )
>> with 1 master and 2 slaves with SolrReplication. Have following
>> questions :
>>
>> 1. With 50 searches/sec per shard with 2 million doc, what would be
>> the tentative response-time ? I am thinking of keeping it under <100
>> ms
>
> That are quite a lot of searches per second considering that you will
> have to search in 6 shards (the coordination and network latency
> affects the results). Also the components you use and the complexity
> of the query (as well as the number of segments in each shard) affects
> the time. 100 ms is probably a low threshold for your requirements,
> you will probably need to add more replicas.

Adding slaves (using SolrReplication) is fine as long as it scales linearly. I do understand that shards may not scale linearly, mostly because of merging/network overhead, but I think sharding will help in reducing response time (please correct me if I am wrong). I am more worried about response time (even on a lightly loaded slave). The main intention of sharding was to reduce the response time. Would it be better to have a 2 shards x 6 slaves configuration compared to 6 shards x 2 slaves? Considering my total number of docs is 12 million, will Solr be OK with 6 million docs/shard?

>> 2. What would be a reasonable latency ( pollInterval ) on slave for
>> SolrReplication ( all slaves connected with a single backplane ). Is 1
>> minute pollInterval reasonable ?
>
> Yes, but it is not reasonable that each time you poll you get updates.
> That is, you shouldn't perform commits more than once every 10
> minutes. Otherwise we would be talking of near real time indexing,
> something that is in development in trunk
> http://wiki.apache.org/solr/NearRealtimeSearch

Hmm. 10 minutes of latency is definitely too high for me (especially as this is a streaming use case, i.e. show latest stuff first). In that case I can probably get rid of master-slave and update all the replicated shards. But then I will have to do a lot of legwork (what if one of the slaves is down, etc.), which I was trying to avoid. Just curious to know how stable NRT is?

>> 3. Is NRT a better/viable option compared to SolrReplication ?
>
> That is something in development. AFAIK it works with shards (because
> nrt refers to indexing and with shards there isn't anything particular
> with the indexing) but with replication something different will be
> needed: SolrCloud I think covers these nrt aspects due to its
> different architecture (not master-slave that in replicas but all
> peers replicating)

So it seems SolrReplication is out (if my pollInterval < 5 minutes), right? Let me look into SolrCloud.
Any suggestions on which one is more stable, SolrCloud or NRT?

-Thanks,
Prasenjit
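
P.S. To make the two layouts concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers from this thread (20 adds/sec, 100 searches/sec, 7 days of retention). The function names are made up for illustration, and it assumes every distributed search hits exactly one slave of every shard with load spread evenly across a shard's slaves. It ignores merging/coordination overhead, so treat it as a sizing estimate, not a response-time prediction.

```python
# Back-of-the-envelope sizing; all figures come from the numbers in this thread.
ADDS_PER_SEC = 20
SEARCHES_PER_SEC = 100
RETENTION_DAYS = 7


def total_docs(adds_per_sec=ADDS_PER_SEC, days=RETENTION_DAYS):
    """Documents retained when everything older than `days` is deleted."""
    return adds_per_sec * 3600 * 24 * days


def layout(shards, slaves_per_shard, searches_per_sec=SEARCHES_PER_SEC):
    """Return (docs per shard, searches/sec per slave, total slave count).

    Assumption: each search fans out to one slave of every shard, and a
    shard's slaves share its query load evenly.
    """
    docs_per_shard = total_docs() / shards
    per_slave_qps = searches_per_sec / slaves_per_shard
    return docs_per_shard, per_slave_qps, shards * slaves_per_shard


if __name__ == "__main__":
    for shards, slaves in [(6, 2), (2, 6)]:
        docs, qps, boxes = layout(shards, slaves)
        print(f"{shards} shards x {slaves} slaves: "
              f"{docs / 1e6:.1f}M docs/shard, "
              f"{qps:.1f} searches/sec per slave, {boxes} slaves total")
```

Both layouts need the same 12 slaves. Under these assumptions, 6x2 keeps each index small (about 2 million docs) at 50 searches/sec per slave, while 2x6 triples the index size per shard but cuts the per-slave query rate to roughly 17 searches/sec.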