One of our customers needs to index 15 billion documents in a single
collection. As this volume is unusual for me, I need some advice about
SolrCloud sizing (how many servers, nodes, shards, and replicas, how much
memory, ...).

Some inputs:

   - Collection size: 15 billion documents
   - Collection updates: 8 million new documents / day + 8 million
   deleted documents / day
   - Updates occur during the night, with no queries
   - Queries occur during the day, with no updates
   - Document size is roughly 300 bytes
   - Document fields are mainly strings, including one date field
   - The same term will occur several times for a given field (from 10 to
   100,000 occurrences)
   - Queries will use a date period and a filter query on one or more
   fields (see the query sketch after this list)
   - 10,000 queries / minute (about 167 queries / second)
   - Expected response time < 500 ms
   - 1 billion documents indexed = 5 GB of index size
   - No SSD drives
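
To make the query pattern concrete, here is a rough SolrJ sketch of what a
typical daytime query could look like. The ZooKeeper addresses, the
collection name, and the field names (event_date, status, country) are
placeholders, not the real schema:

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                Optional.empty()).build()) {

            SolrQuery q = new SolrQuery("*:*");
            // Date period, e.g. one month
            q.addFilterQuery(
                "event_date:[2019-01-01T00:00:00Z TO 2019-02-01T00:00:00Z]");
            // Filter on one or more string fields
            q.addFilterQuery("status:ACTIVE");
            q.addFilterQuery("country:FR");
            q.setRows(10);

            QueryResponse rsp = client.query("mycollection", q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }
}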

So, what is your advice about:

# of shards: 15 billion documents -> 16 shards? (roughly 940 million
documents and, using the 5 GB per billion figure above, about 4.7 GB of
index per shard)
# of replicas?
# of nodes = # of shards?
heap memory per node?
direct memory per node?
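
For reference, assuming the 16-shard / 2-replica layout I am asking about
above (which is exactly what I would like you to confirm or correct), the
collection creation would look roughly like this with SolrJ; the ZooKeeper
addresses, collection name, and configset name are placeholders:

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
                Optional.empty()).build()) {
            // 16 shards with 2 replicas each -> 32 cores spread over
            // the cluster
            CollectionAdminRequest
                .createCollection("mycollection", "myconfig", 16, 2)
                .process(client);
        }
    }
}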

Thanks in advance for your advice.

Dominique
