One of our customers needs to index 15 billion documents in a single collection. Since this volume is unusual for me, I need some advice about SolrCloud sizing (how many servers, nodes, shards, and replicas, and how much memory, ...).
Some inputs:
- Collection size: 15 billion documents
- Collection updates: 8 million new documents/day + 8 million deleted documents/day
- Updates occur during the night, without queries
- Queries occur during the day, without updates
- Document size is roughly 300 bytes
- Document fields are mainly strings, plus one date field
- The same term will occur several times for a given field (from 10 to 100,000 occurrences)
- Queries will use a date range plus a filter query on one or more fields
- 10,000 queries/minute
- Expected response time < 500 ms
- 1 billion documents indexed ≈ 5 GB of index
- No SSD drives

So, what is your advice about:
- Number of shards: 15 billion documents -> 16 shards?
- Number of replicas?
- Number of nodes = number of shards?
- Heap memory per node?
- Direct memory per node?

Thanks for your advice.

Dominique
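For what it's worth, here is the back-of-envelope arithmetic behind my numbers, as a small Python sketch. All inputs are the figures above; the 16-shard count is only my proposal, not a recommendation:

```python
# Rough SolrCloud sizing arithmetic from the figures in the post.
# The shard count below is an assumption (the proposed 16), not a sizing rule.

total_docs = 15_000_000_000           # collection size: 15 billion documents
index_gb_per_billion = 5              # measured: 1 billion docs ≈ 5 GB of index
total_index_gb = total_docs / 1_000_000_000 * index_gb_per_billion

shards = 16                           # proposed shard count (assumption)
gb_per_shard = total_index_gb / shards

queries_per_minute = 10_000
qps = queries_per_minute / 60         # sustained query rate during the day

print(f"total index size : {total_index_gb:.0f} GB")
print(f"index per shard  : {gb_per_shard:.2f} GB")
print(f"query load       : {qps:.0f} QPS")
```

With these inputs the whole index is about 75 GB, so each of 16 shards holds under 5 GB, and the daytime load is roughly 167 queries per second spread across the replicas.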