All in all, the index is about 250 GB, sharded across two dedicated VMs with 
24 GB of memory, and it's performing OK so far (queries take about 7 seconds, 
the worst cases about 10). At some point in the past we needed to transition to 
SolrCloud because a single Solr core, of course, wouldn't scale.
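For context, the kind of query discussed below (q and fq narrowing the match set, then faceting on the high-cardinality field) can be sketched as follows. This is only an illustration: the host, collection name, and field names are placeholders, not our actual schema.

```python
from urllib.parse import urlencode

# Hypothetical parameters -- collection and field names are placeholders,
# not the real Expression Atlas schema.
params = {
    "q": "*:*",                   # main query
    "fq": "study_id:SOME-STUDY",  # filter query narrowing the match set
    "rows": 0,                    # only facet counts are needed, no documents
    "facet": "true",
    "facet.field": "gene_id",     # high-cardinality field (~1.2M unique values)
    "facet.limit": 100,
    "facet.mincount": 1,
}
url = "http://solr-host:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```

Setting rows=0 keeps the response small when only the facet counts matter.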

> On 22 Feb 2018, at 01:43, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 2/21/2018 12:08 PM, Alfonso Muñoz-Pomer Fuentes wrote:
>> Some more details about my collection:
>> - Approximately 200M documents
>> - 1.2M different values in the field I’m faceting over
>> 
>> The query I’m doing is over a single bucket; after applying q and fq, the 
>> 1.2M values are reduced to at most 60K (often half that value). From your 
>> replies I assume I’m not going to hit a bottleneck any time soon. Thanks a 
>> lot.
> 
> Two hundred million documents is going to be a pretty big index even if
> the documents are small.  The server is going to need a lot of spare
> memory (not assigned to programs) for good general performance.
> 
> As I understand it, facet performance is going to be heavily determined
> by the 1.2 million unique values in the field you're using.  Facet
> performance is probably going to be very similar whether your query
> matches 60K or 1 million.
> 
> Thanks,
> Shawn
> 

--
Alfonso Muñoz-Pomer Fuentes
Senior Lead Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel: +44 (0) 1223 49 2633
Skype: amunozpomer