On Sun, May 17, 2020 at 4:57 PM Rudenko, Artur <artur.rude...@verint.com> wrote:
> Hi Mikhail,
>
> Thank you for the help. With your suggestion we actually managed to improve
> the results.
>
> We now get and store the docValues in this method instead of inside the
> collect() method:
>
> @Override
> protected void doSetNextReader(LeafReaderContext context) throws
> IOException {
>     super.doSetNextReader(context);
>     sortedDocValues = DocValues.getSorted(context.reader(),
>         FileFilterPostQuery.this.metaField);
> }
>
> We see a big improvement. Is this the most efficient way?
>
Who knows...

> Since it's a post filter, we have to return "false" in the getCache method. Is
> there a way to implement it with cache?
>
If getCache() returns true, this query will be used as a standalone query,
ignoring the filterCollector. In that case the retrieved docs will be cached.

> Thanks,
> Artur Rudenko
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Thursday, May 14, 2020 2:57 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Filtering large amount of values
>
> Hi, Artur.
>
> Please don't tell me that you obtain docValues for every doc? It's deadly
> slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related
> problem.
> Make sure you obtain them once per segment, when the leaf reader is injected.
> Recently there are some new method(s) for {!terms}; I'm wondering if any of
> them might solve the problem.
>
> On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur <artur.rude...@verint.com>
> wrote:
>
> > Hi,
> > We have a requirement of implementing a boolean filter with up to 500k
> > values.
> >
> > We took the approach of a post filter.
> >
> > Our environment has 7 servers, each with 128 GB RAM and 64 CPUs. We
> > have 20-40m very large documents. Each Solr instance has 64 shards
> > with 2 replicas and JVM memory Xms and Xmx set to 31 GB.
> >
> > We are seeing that using a single post filter with 1000 values on 20m
> > documents takes about 4.5 seconds.
> >
> > Logic in our collect method:
> >
> > numericDocValues =
> >     reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
> >
> > if (numericDocValues != null &&
> >         numericDocValues.advanceExact(docNumber)) {
> >     longVal = numericDocValues.longValue();
> > } else {
> >     return;
> > }
> >
> > if (numericValuesSet.contains(longVal)) {
> >     super.collect(docNumber);
> > }
> >
> >
> > Is it the best we can get?
> >
> >
> > Thanks,
> > Artur Rudenko
> >
> >
> > This electronic message may contain proprietary and confidential
> > information of Verint Systems Inc., its affiliates and/or
> > subsidiaries. The information is intended to be for the use of the
> > individual(s) or
> > entity(ies) named above. If you are not the intended recipient (or
> > authorized to receive this e-mail for the intended recipient), you may
> > not use, copy, disclose or distribute to anyone this message or any
> > information contained in this message. If you have received this
> > electronic message in error, please notify us by replying to this e-mail.
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
--
Sincerely yours
Mikhail Khludnev
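
For readers skimming the thread, the advice "obtain them once per segment" can be modeled without any Lucene or Solr dependency. The sketch below is only an illustration: `Segment`, `FilteringCollector`, `setNextSegment` and `run` are hypothetical names standing in for a leaf reader, a collector, and doSetNextReader; they are not Lucene or Solr APIs. It counts how often the value handle is requested, which is once per segment rather than once per document.

```java
import java.util.Set;

/** Plain-Java model of the per-segment pattern; none of these types are Lucene or Solr classes. */
final class PerSegmentLookupSketch {

    /** Hypothetical stand-in for one index segment: one long value per local doc id. */
    static final class Segment {
        final long[] values;
        int lookups = 0; // how many times a value handle was requested from this segment

        Segment(long[] values) {
            this.values = values;
        }

        long[] openValues() {
            lookups++;
            return values;
        }
    }

    /** Hypothetical collector: fetches the values once per segment, then only reads them per doc. */
    static final class FilteringCollector {
        private final Set<Long> allowed;
        private long[] current; // assigned once per segment, reused for every doc in it
        int collected = 0;

        FilteringCollector(Set<Long> allowed) {
            this.allowed = allowed;
        }

        // Plays the role of doSetNextReader in the thread: called once per segment.
        void setNextSegment(Segment segment) {
            current = segment.openValues();
        }

        // Called once per document: only a set membership check, no handle lookup.
        void collect(int doc) {
            if (allowed.contains(current[doc])) {
                collected++;
            }
        }
    }

    /** Runs the collector over all segments and returns {collected docs, total handle lookups}. */
    static int[] run(Segment[] segments, Set<Long> allowed) {
        FilteringCollector collector = new FilteringCollector(allowed);
        int lookups = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(segment);
            for (int doc = 0; doc < segment.values.length; doc++) {
                collector.collect(doc);
            }
            lookups += segment.lookups;
        }
        return new int[] {collector.collected, lookups};
    }
}
```

With two segments, `run` reports two handle lookups no matter how many documents each segment holds, which is the difference between the first and second versions of the collector discussed in the thread.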