Hi Artur,

Please don't tell me you obtain docValues for every doc! That's deadly slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related problem. Make sure you obtain them once per segment, when the leaf reader is injected. Recently there have been some new methods added to {!terms}; I'm wondering if any of them might solve the problem.
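A minimal sketch of the per-segment pattern, written as a toy model in plain Java so it runs without Solr on the classpath. `Segment`, `Cursor` and `FilterCollector` are hypothetical stand-ins, not Lucene/Solr classes: `Cursor` mimics the forward-only `advanceExact`/`longValue` contract of Lucene's `NumericDocValues`, and `setSegment` plays the role of `DelegatingCollector.doSetNextReader(LeafReaderContext)`. In a real post filter the equivalent would presumably be to call `DocValues.getNumeric(context.reader(), field)` inside `doSetNextReader` and keep the result in a field, instead of fetching doc values inside `collect`.

```java
import java.util.List;
import java.util.Set;

/** Toy model of a post filter; all classes here are hypothetical stand-ins. */
public class PerSegmentLookupDemo {

    /** One segment's numeric doc-values column. */
    static final class Segment {
        final int[] docs;    // ascending segment-local doc ids that have a value
        final long[] values; // values[i] belongs to docs[i]

        Segment(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }
    }

    /** Forward-only cursor mimicking the NumericDocValues.advanceExact contract. */
    static final class Cursor {
        private final Segment seg;
        private int pos = 0;

        Cursor(Segment seg) { this.seg = seg; }

        /** Moves forward to target; targets must be non-decreasing, as with Lucene iterators. */
        boolean advanceExact(int target) {
            while (pos < seg.docs.length && seg.docs[pos] < target) {
                pos++;
            }
            return pos < seg.docs.length && seg.docs[pos] == target;
        }

        long longValue() { return seg.values[pos]; }
    }

    /** Collector that opens the cursor once per segment and reuses it for every doc. */
    static final class FilterCollector {
        private final Set<Long> allowed;
        private Cursor cursor;
        int cursorsOpened = 0;
        int collected = 0;

        FilterCollector(Set<Long> allowed) { this.allowed = allowed; }

        /** Analogue of doSetNextReader: runs once when a new segment starts. */
        void setSegment(Segment seg) {
            cursor = new Cursor(seg);
            cursorsOpened++;
        }

        /** Docs arrive in increasing order within a segment, so the cursor only moves forward. */
        void collect(int doc) {
            if (cursor.advanceExact(doc) && allowed.contains(cursor.longValue())) {
                collected++; // a real DelegatingCollector would call super.collect(doc) here
            }
        }
    }

    /** Feeds every doc id 0..maxDoc-1 of each segment to the collector. */
    static FilterCollector run(List<Segment> segments, int maxDoc, Set<Long> allowed) {
        FilterCollector c = new FilterCollector(allowed);
        for (Segment seg : segments) {
            c.setSegment(seg);
            for (int doc = 0; doc < maxDoc; doc++) {
                c.collect(doc);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment(new int[] {0, 2, 3}, new long[] {10, 20, 30}),
            new Segment(new int[] {1, 4}, new long[] {20, 40}));
        FilterCollector c = run(segments, 5, Set.of(20L, 40L));
        System.out.println(c.collected + " matches, " + c.cursorsOpened + " cursors opened");
    }
}
```

Running `main` prints `3 matches, 2 cursors opened`: one doc-values lookup per segment, no matter how many docs are collected. For simplicity `run` feeds every doc id, whereas a real post filter only sees the docs that matched the main query, still in increasing order within each segment.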
On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur <artur.rude...@verint.com> wrote:

> Hi,
>
> We have a requirement to implement a boolean filter with up to 500k
> values.
>
> We took the post filter approach.
>
> Our environment has 7 servers, each with 128GB of RAM and 64 CPUs. We have
> 20-40m very large documents. Each Solr instance has 64 shards with 2
> replicas, and the JVM -Xms and -Xmx are both set to 31GB.
>
> We are seeing that a single post filter with 1000 values on 20m documents
> takes about 4.5 seconds.
>
> Logic in our collect method:
>
>     numericDocValues = reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
>     if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
>         longVal = numericDocValues.longValue();
>     } else {
>         return;
>     }
>     if (numericValuesSet.contains(longVal)) {
>         super.collect(docNumber);
>     }
>
> Is this the best we can get?
>
> Thanks,
> Artur Rudenko
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.

-- 
Sincerely yours
Mikhail Khludnev