Re: Filtering large amount of values

Mikhail Khludnev Thu, 14 May 2020 04:59:04 -0700

Hi, Artur.

Please don't tell me that you obtain docValues for every doc. That is deadly
slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related
problem.
Make sure you obtain them once per segment, when the leaf reader is injected.
Recently some new method(s) were added for {!terms}; I'm wondering whether any
of them might solve the problem.
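
To illustrate the "once per segment" point, here is a minimal sketch in plain
Java. It does not use the Lucene/Solr classes: Segment, SegmentValues,
FilteringCollector and setNextSegment are hypothetical stand-ins that only
imitate the shape of the per-segment API. In a real Solr post filter the
analogous hook would be DelegatingCollector.doSetNextReader(LeafReaderContext),
which is where the doc-values lookup would move.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in model only: these types are NOT the Lucene/Solr classes.
class PerSegmentSketch {

    /** Hypothetical stand-in for a per-segment numeric doc-values iterator. */
    static final class SegmentValues {
        private final long[] values;     // value per segment-local doc id
        private final boolean[] present; // whether that doc has a value
        private int current = -1;

        SegmentValues(long[] values, boolean[] present) {
            this.values = values;
            this.present = present;
        }

        boolean advanceExact(int doc) { current = doc; return present[doc]; }

        long longValue() { return values[current]; }
    }

    /** Hypothetical stand-in for a segment (leaf) reader. */
    static final class Segment {
        final long[] values;
        final boolean[] present;
        int lookups = 0; // how many times doc values were obtained

        Segment(long[] values, boolean[] present) {
            this.values = values;
            this.present = present;
        }

        SegmentValues getNumericDocValues() {
            lookups++;
            return new SegmentValues(values, present);
        }
    }

    /** Collector that looks up doc values once per segment, not once per doc. */
    static final class FilteringCollector {
        private final Set<Long> allowed;
        private final List<Integer> collected = new ArrayList<>();
        private SegmentValues docValues; // cached for the current segment
        private int docBase;

        FilteringCollector(Set<Long> allowed) { this.allowed = allowed; }

        // Called once when the collector moves to a new segment.
        void setNextSegment(Segment segment, int docBase) {
            this.docValues = segment.getNumericDocValues();
            this.docBase = docBase;
        }

        // Called for every matching doc; only advances the cached iterator.
        void collect(int doc) {
            if (docValues.advanceExact(doc) && allowed.contains(docValues.longValue())) {
                collected.add(docBase + doc);
            }
        }
    }

    /** Feeds every doc of every segment through the collector, in doc order. */
    static List<Integer> run(List<Segment> segments, Set<Long> allowed) {
        FilteringCollector collector = new FilteringCollector(allowed);
        int docBase = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(segment, docBase);
            for (int doc = 0; doc < segment.values.length; doc++) {
                collector.collect(doc);
            }
            docBase += segment.values.length;
        }
        return collector.collected;
    }

    public static void main(String[] args) {
        Segment a = new Segment(new long[] {10, 20, 30}, new boolean[] {true, false, true});
        Segment b = new Segment(new long[] {20, 40}, new boolean[] {true, true});
        Set<Long> allowed = new HashSet<>(Arrays.asList(20L, 30L));
        System.out.println(run(Arrays.asList(a, b), allowed)); // prints [2, 3]
        System.out.println(a.lookups + " " + b.lookups);       // prints 1 1
    }
}
```

The only point of the sketch is the split of responsibilities: the lookup
happens in the per-segment hook, and collect() does nothing but advance the
cached iterator and test set membership.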


On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur <artur.rude...@verint.com>
wrote:

> Hi,
> We have a requirement of implementing a boolean filter with up to 500k
> values.
>
> We took the post filter approach.
>
> Our environment has 7 servers, each with 128 GB of RAM and 64 CPUs. We have
> 20-40m very large documents. Each Solr instance has 64 shards with 2
> replicas, and the JVM memory settings Xms and Xmx are both set to 31 GB.
>
> We are seeing that a single post filter with 1000 values on 20m documents
> takes about 4.5 seconds.
>
> Logic in our collect method:
>                     numericDocValues = reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
>
>                     if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
>                         longVal = numericDocValues.longValue();
>                     } else {
>                         return;
>                     }
>                 }
>
>                 if (numericValuesSet.contains(longVal)) {
>                     super.collect(docNumber);
>                 }
>
>
> Is it the best we can get?
>
>
> Thanks,
> Artur Rudenko


-- 
Sincerely yours
Mikhail Khludnev
