All in all, the index is about 250GB, sharded across two dedicated VMs with 24GB of memory, and it's performing OK so far (queries take about 7 seconds, the worst cases about 10). At some point in the past we needed to transition to SolrCloud because a single Solr core, of course, wouldn't scale.
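For context, the query under discussion is a standard field facet restricted by q and fq. Below is a minimal sketch of how such request parameters might be assembled; the field and query values (`gene_id`, `experiment_type:baseline`, `species:homo_sapiens`) are hypothetical placeholders, since the actual schema isn't shown in this thread.

```python
from urllib.parse import urlencode

def build_facet_params(q, fq, facet_field, limit=100):
    """Build query parameters for a Solr field-facet request over one field."""
    return {
        "q": q,
        "fq": fq,
        "rows": 0,               # we only want facet counts, not documents
        "facet": "true",
        "facet.field": facet_field,
        "facet.limit": limit,
        "facet.mincount": 1,     # skip zero-count buckets
    }

# Hypothetical example values; the real collection's fields differ.
params = build_facet_params(
    q="experiment_type:baseline",
    fq="species:homo_sapiens",
    facet_field="gene_id",
)
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to the collection's `/select` endpoint. The point Shawn makes below is that the cost of such a facet is driven mainly by the field's 1.2M unique values, not by how many of them survive the q/fq filtering.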
> On 22 Feb 2018, at 01:43, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 2/21/2018 12:08 PM, Alfonso Muñoz-Pomer Fuentes wrote:
>> Some more details about my collection:
>> - Approximately 200M documents
>> - 1.2M different values in the field I'm faceting over
>>
>> The query I'm doing is over a single bucket; after applying q and fq,
>> the 1.2M values are reduced to at most 60K (often half that).
>> From your replies I assume I'm not going to hit a bottleneck any time soon.
>> Thanks a lot.
>
> Two hundred million documents is going to be a pretty big index even if
> the documents are small. The server is going to need a lot of spare
> memory (not assigned to programs) for good general performance.
>
> As I understand it, facet performance is going to be heavily determined
> by the 1.2 million unique values in the field you're using. Facet
> performance is probably going to be very similar whether your query
> matches 60K or 1 million.
>
> Thanks,
> Shawn

--
Alfonso Muñoz-Pomer Fuentes
Senior Lead Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel: +44 (0) 1223 49 2633
Skype: amunozpomer