Re: Filtering large amount of values

Mikhail Khludnev Sun, 17 May 2020 08:13:24 -0700

On Sun, May 17, 2020 at 4:57 PM Rudenko, Artur <artur.rude...@verint.com>
wrote:


> Hi Mikhail,
>
> Thank you for the help. With your suggestion we actually managed to improve
> the results.
>
> We now fetch and store the docValues in this method instead of inside the
> collect() method:
>
> @Override
> protected void doSetNextReader(LeafReaderContext context) throws IOException {
>     super.doSetNextReader(context);
>     // Runs once per segment, so the doc values are fetched once per leaf reader.
>     sortedDocValues = DocValues.getSorted(context.reader(),
>         FileFilterPostQuery.this.metaField);
> }
>
> We see a big improvement. Is this the most efficient way?
>
Who knows...
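
As a rough illustration of why moving the lookup out of collect() helps, here is a small stand-in model in plain Java. It does not use Lucene classes: Segment, openColumn() and the lookup counter are hypothetical names invented for this sketch, and openColumn() only plays the role of the per-field doc values lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PerSegmentLookupSketch {

    /** Hypothetical stand-in for an index segment: one long value per document. */
    public static final class Segment {
        private final long[] values;
        public int lookups = 0; // number of times the column was fetched

        public Segment(long... values) {
            this.values = values;
        }

        /** Plays the role of the costly per-field doc values lookup. */
        public long[] openColumn() {
            lookups++;
            return values;
        }

        public int maxDoc() {
            return values.length;
        }
    }

    /** Slow pattern: fetch the column again inside the per-document loop. */
    public static int matchFetchingPerDoc(List<Segment> segments, Set<Long> allowed) {
        int matches = 0;
        for (Segment segment : segments) {
            for (int doc = 0; doc < segment.maxDoc(); doc++) {
                long[] column = segment.openColumn(); // repeated for every document
                if (allowed.contains(column[doc])) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** Faster pattern: fetch once when the segment changes, reuse it per document. */
    public static int matchFetchingPerSegment(List<Segment> segments, Set<Long> allowed) {
        int matches = 0;
        for (Segment segment : segments) {
            long[] column = segment.openColumn(); // once per segment
            for (int doc = 0; doc < segment.maxDoc(); doc++) {
                if (allowed.contains(column[doc])) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** Sums the lookup counters over all segments. */
    public static int totalLookups(List<Segment> segments) {
        int total = 0;
        for (Segment segment : segments) {
            total += segment.lookups;
        }
        return total;
    }

    public static void main(String[] args) {
        Set<Long> allowed = new HashSet<>(Arrays.asList(2L, 5L, 9L));
        List<Segment> a = Arrays.asList(new Segment(1, 2, 3), new Segment(4, 5));
        List<Segment> b = Arrays.asList(new Segment(1, 2, 3), new Segment(4, 5));
        System.out.println(matchFetchingPerDoc(a, allowed) + " matches, "
                + totalLookups(a) + " lookups");
        System.out.println(matchFetchingPerSegment(b, allowed) + " matches, "
                + totalLookups(b) + " lookups");
    }
}
```

Both variants find the same matches; only the number of lookups differs (one per document versus one per segment). Whether this is the last word on efficiency for a real index is a separate question that only profiling can answer.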

> Since it's a post filter, we have to return "false" in the getCache method. Is
> there a way to implement it with a cache?
>
If getCache() returns true, this query is used as a standalone query and the
filterCollector is ignored. In that case the retrieved docs are cached.
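
A rough model of that decision, in plain Java rather than Solr code. The Route enum and the route() method are hypothetical names made up for this sketch, and the cost >= 100 threshold is an assumption based on how Solr documents the cache and cost local params, so verify it against your Solr version.

```java
public class FilterRoutingSketch {

    /** Hypothetical labels for the three ways a filter query may be executed. */
    public enum Route { CACHED_STANDALONE_QUERY, UNCACHED_QUERY, POST_FILTER }

    /**
     * Models the choice described above.
     *
     * @param cache                what getCache() returns
     * @param cost                 what getCost() returns
     * @param implementsPostFilter whether the query implements PostFilter
     */
    public static Route route(boolean cache, int cost, boolean implementsPostFilter) {
        if (cache) {
            // Run as a normal query; the resulting doc set can be cached,
            // and the filter collector is never asked for.
            return Route.CACHED_STANDALONE_QUERY;
        }
        if (implementsPostFilter && cost >= 100) { // assumed threshold
            // Run after the main query, through the filter collector.
            return Route.POST_FILTER;
        }
        // Not cached, but not eligible to run as a post filter either.
        return Route.UNCACHED_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(route(true, 200, true));
        System.out.println(route(false, 200, true));
        System.out.println(route(false, 50, true));
    }
}
```

In this model caching and post filtering are mutually exclusive: a query that returns true from getCache() never reaches the collector-based path.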


> Thanks,
> Artur Rudenko
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Thursday, May 14, 2020 2:57 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Filtering large amount of values
>
> Hi, Artur.
>
> Please don't tell me that you obtain docValues for every doc. That is deadly
> slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related
> problem.
> Make sure you obtain them once per segment, when the leaf reader is injected.
> Some new method(s) were recently added for {!terms}; I'm wondering if any of
> them might solve the problem.
>
> On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur <artur.rude...@verint.com>
> wrote:
>
> > Hi,
> > We have a requirement of implementing a boolean filter with up to 500k
> > values.
> >
> > We took the approach of post filter.
> >
> > Our environment has 7 servers, each with 128 GB of RAM and 64 CPUs. We
> > have 20-40m very large documents. Each Solr instance has 64 shards
> > with 2 replicas, and JVM memory (Xms and Xmx) is set to 31 GB.
> >
> > We are seeing that a single post filter with 1000 values on 20m documents
> > takes about 4.5 seconds.
> >
> > Logic in our collect method:
> > numericDocValues =
> > reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
> >
> >                     if (numericDocValues != null &&
> > numericDocValues.advanceExact(docNumber)) {
> >                         longVal = numericDocValues.longValue();
> >                     } else {
> >                         return;
> >                     }
> >                 }
> >
> >                 if (numericValuesSet.contains(longVal)) {
> >                     super.collect(docNumber);
> >                 }
> >
> >
> > Is it the best we can get?
> >
> >
> > Thanks,
> > Artur Rudenko
> >
> >
> > This electronic message may contain proprietary and confidential
> > information of Verint Systems Inc., its affiliates and/or
> > subsidiaries. The information is intended to be for the use of the
> > individual(s) or
> > entity(ies) named above. If you are not the intended recipient (or
> > authorized to receive this e-mail for the intended recipient), you may
> > not use, copy, disclose or distribute to anyone this message or any
> > information contained in this message. If you have received this
> > electronic message in error, please notify us by replying to this e-mail.
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
>


-- 
Sincerely yours
Mikhail Khludnev
