2012/2/1 prasenjit mukherjee <prasen....@gmail.com>:
> I have the following requirements :
>
> 1. Adds : 20 docs/sec
> 2. Searches : 100 searches/sec
> 3. Deletes : (20*3600*24*7 ~ 12 mill ) docs/week ( basically a cron
> job which deletes all documents more than 7 days old )
>
> I am thinking of having 6 shards ( with each having 2 million docs )
> with 1 master and 2 slaves with SolrReplication. Have following
> questions :
>
> 1. With  50 searches/sec per shard with 2 million doc, what would be
> the tentative response-time  ?  I am thinking of keeping it under <100
> ms

That is quite a lot of searches per second, considering that every
query has to be sent to all 6 shards (the coordination and network
latency affect the results). The components you use, the complexity
of the query and the number of segments in each shard also affect
the response time. 100 ms is probably a tight target for your
requirements; you will probably need to add more replicas.
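To make the load concrete, here is a back-of-the-envelope sketch in
Python. The input numbers come from your mail; the latency model
(the coordinator waits for the slowest shard, then pays a fixed merge
cost) and the function names are my own hypothetical simplification,
not measured Solr behaviour.

```python
# Back-of-the-envelope sizing for the setup in the question.
# Inputs come from the question; the latency model is a hypothetical
# simplification (slowest shard + fixed merge cost), not measured Solr data.

SECONDS_PER_WEEK = 7 * 24 * 3600


def weekly_docs(adds_per_sec: int) -> int:
    """Documents accumulated over the 7-day retention window."""
    return adds_per_sec * SECONDS_PER_WEEK


def docs_per_shard(total_docs: int, shards: int) -> float:
    """Documents per shard, assuming an even split."""
    return total_docs / shards


def qps_per_replica(total_qps: float, replicas_per_shard: int) -> float:
    """Query rate each replica of a shard must serve.

    A distributed query fans out to every shard, so each shard sees the
    full query rate; only the replicas of that shard split the load.
    """
    return total_qps / replicas_per_shard


def distributed_latency_ms(shard_latencies_ms: list[float], merge_ms: float) -> float:
    """Hypothetical model: wait for the slowest shard, then merge results."""
    return max(shard_latencies_ms) + merge_ms


if __name__ == "__main__":
    total = weekly_docs(20)
    print(total)                                      # 12096000
    print(docs_per_shard(total, 6))                   # 2016000.0
    print(qps_per_replica(100, 2))                    # 50.0
    print(distributed_latency_ms([40, 55, 90], 10))   # 100
```

The point of the last line: under this model a single slow shard can
use up the whole 100 ms budget, and adding replicas is what lowers the
per-replica query rate.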


> 2. What would be a reasonable latency ( pollInterval ) on slave for
> SolrReplication ( all slaves connected with a single backplane ). Is 1
> minute pollInterval reasonable ?

Yes, but it is not reasonable to expect new updates every time you
poll. That is, you shouldn't commit more often than once every 10
minutes. Otherwise we would be talking about near real-time indexing,
which is still in development in trunk:
http://wiki.apache.org/solr/NearRealtimeSearch
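A small Python sketch of how the poll interval relates to the commit
rate. The helper names are hypothetical; the HH:mm:ss format is, as
far as I know, what the slave's pollInterval setting expects, and the
600 s commit interval is just the 10-minute figure above.

```python
# Hypothetical helpers relating the slave pollInterval to the master's commit rate.

def format_poll_interval(seconds: int) -> str:
    """Format seconds as HH:mm:ss (the pollInterval format, as far as I know)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def polls_per_commit(poll_interval_s: int, commit_interval_s: int) -> float:
    """Polls between two commits; with regular commits, roughly one finds a new index."""
    return commit_interval_s / poll_interval_s


def docs_per_commit(adds_per_sec: int, commit_interval_s: int) -> int:
    """Documents batched into each commit at a steady add rate."""
    return adds_per_sec * commit_interval_s


if __name__ == "__main__":
    print(format_poll_interval(60))    # 00:01:00
    print(polls_per_commit(60, 600))   # 10.0
    print(docs_per_commit(20, 600))    # 12000
```

So with a 1-minute poll and a commit every 10 minutes, about one poll
in ten actually pulls a new index, each carrying roughly 12,000 docs.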


> 3. Is NRT a better/viable option compared to SolrReplication ?

That is still in development. AFAIK it works with shards (because NRT
concerns indexing, and there is nothing special about indexing when
you use shards), but with replication something different will be
needed. I think SolrCloud covers these NRT aspects thanks to its
different architecture: instead of a master-slave setup, all replicas
are peers that replicate to each other.

>
> -Thanks,
> Prasenjit
