Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

Ignacio Vera Sat, 23 Jan 2021 01:08:49 -0800

Hi!

This slowdown is expected; see LUCENE-9447
<https://issues.apache.org/jira/browse/LUCENE-9447> & LUCENE-9486
<https://issues.apache.org/jira/browse/LUCENE-9486>. The trade-off here is
index size vs fetch time: we have introduced a more aggressive compression
strategy for stored fields at the cost of a small increase in fetch
times. In your example, you can see that the index size has been reduced
by around 20%.
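
To make the size vs fetch-time trade-off concrete, here is a small sketch using only the JDK. It is not Lucene's implementation: it uses java.util.zip.Deflater as a stand-in compressor, and the class name, method names and synthetic documents are all made up for illustration. It packs documents into blocks of different sizes and reports the total compressed size next to the number of bytes that would have to be decompressed to read a single document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: block-compressed "stored fields" with a JDK compressor.
class BlockCompressionTradeoff {

    // Builds small, repetitive synthetic documents (hypothetical data).
    static String[] makeDocs(int count) {
        String[] docs = new String[count];
        for (int i = 0; i < count; i++) {
            docs[i] = "id=" + i + ";title=Document number " + i
                    + ";body=the quick brown fox jumps over the lazy dog";
        }
        return docs;
    }

    // Concatenates the UTF-8 bytes of docs[start, end) into one block.
    static byte[] blockBytes(String[] docs, int start, int end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = start; i < end; i++) {
            byte[] b = docs[i].getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Compresses one block with Deflate at the default level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Total on-disk size if documents are compressed docsPerBlock at a time.
    static long totalCompressedSize(String[] docs, int docsPerBlock) {
        long total = 0;
        for (int start = 0; start < docs.length; start += docsPerBlock) {
            int end = Math.min(start + docsPerBlock, docs.length);
            total += compress(blockBytes(docs, start, end)).length;
        }
        return total;
    }

    // Uncompressed size of the block that must be inflated to read one doc.
    static int bytesToInflateForOneDoc(String[] docs, int docsPerBlock, int docId) {
        int start = (docId / docsPerBlock) * docsPerBlock;
        int end = Math.min(start + docsPerBlock, docs.length);
        return blockBytes(docs, start, end).length;
    }

    public static void main(String[] args) {
        String[] docs = makeDocs(1024);
        for (int docsPerBlock : new int[] {1, 16, 128}) {
            System.out.println("docsPerBlock=" + docsPerBlock
                    + " totalCompressed=" + totalCompressedSize(docs, docsPerBlock)
                    + " bytesToInflateForOneDoc="
                    + bytesToInflateForOneDoc(docs, docsPerBlock, 500));
        }
    }
}
```

Larger blocks give the compressor more shared context, so the total shrinks, but every single-document fetch has to inflate more data. The exact numbers depend entirely on the documents and the compressor.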


If your workflow depends on those fetch times, you can always override the
stored field format through a filter codec and supply your own
compression parameters.
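
A minimal sketch of such a filter codec, assuming Lucene 8.7 on the classpath. The class name FastFetchCodec, the format name and the numeric parameters (16 KB chunks, at most 128 docs per chunk, block shift 10) are illustrative assumptions rather than recommended settings, so benchmark before adopting them.

```java
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;

// Hypothetical codec: delegates everything to the 8.7 default codec
// except stored fields, which use smaller LZ4-compressed chunks.
public final class FastFetchCodec extends FilterCodec {

    // Assumed parameters: format name, compression mode, chunk size in bytes,
    // max docs per chunk, block shift. Tune these for your own data.
    private final StoredFieldsFormat storedFields =
            new CompressingStoredFieldsFormat(
                    "FastFetchStoredFields", CompressionMode.FAST, 1 << 14, 128, 10);

    // Public no-arg constructor so the codec can be loaded by name via SPI.
    public FastFetchCodec() {
        super("FastFetchCodec", new Lucene87Codec());
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        return storedFields;
    }
}
```

You would pass an instance to IndexWriterConfig.setCodec(...) when writing, and list the class in META-INF/services/org.apache.lucene.codecs.Codec so that readers can resolve the codec by name. Only newly written segments use the new format, so an existing index has to be reindexed (or fully rewritten) before fetch times change.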

Cheers,

Ignacio




On Sat, Jan 23, 2021 at 8:36 AM Rob Audenaerde <rob.audenae...@gmail.com>
wrote:

> I did some testing for you :)
>
> I modified your code to run in a JMH benchmark; and changed the number of
> retrieved docs to 1000 out of 1M in the index. This is what I got:
>
> Lucene 7.5
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  37.147 ± 6.218  ops/s
>
> Lucene 8.7
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  18.680 ± 5.755  ops/s
>
> This is much in line with your observations (Lucene 8.7 seems almost twice
> as slow), so something is going on when running out-of-the-box.
>
> The code can be found here (not really beautiful, but it gets the job done;
> if you want to switch Lucene versions, edit the pom and make sure to set the
> proper index version):
> https://gist.github.com/d2a-raudenaerde/93a490e5b0d17b2fa88862473429aeb3
>
> JMH details:
> # JMH version: 1.21
> # VM version: JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1-Ubuntu-0ubuntu1.20.04
> # VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
> # VM options: -Xms2G -Xmx2G
> # Warmup: 2 iterations, 10 s each
> # Measurement: 4 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: org.audenaerde.lucene.DocRetrievalBenchmark.retrieveDocuments
>
>
> On Fri, Jan 22, 2021 at 4:22 PM Martynas L <martynas....@gmail.com> wrote:
>
> > I just played with my reading sample. My goal is not to show exact
> > numbers, but it is a fact that document retrieval via
> > IndexSearcher.doc(int) is much slower.
> > All our performance tests showed degradation after changing to 8.7.0;
> > even without measurement we can "see/feel" that operations involving
> > document retrieval became slower.
> >
> >
> >
> > On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde <rob.audenae...@gmail.com>
> > wrote:
> >
> > > Hi Martynas
> > >
> > > How did you measure that?
> > >
> > > I ask because writing a good benchmark is not an easy task, since there
> > > are so many factors (class loading times, JIT effects, etc.). You should
> > > use the Java Microbenchmark Harness (JMH) or similar, and set up a random
> > > document retrieval task with warm-up, etc.
> > >
> > > (I'm not aware of any big slowdowns, but as you see them, the best way is
> > > to build a robust benchmark and then start comparing.)
> > >
> > > -Rob
> > >
> > >
> > > On Fri, Jan 22, 2021 at 3:43 PM Martynas L <martynas....@gmail.com>
> > > wrote:
> > >
> > > > Even when retrieving a single document, 8.7.0 is more than 2x slower.
> > > >
> > > > On Fri, Jan 22, 2021 at 2:28 PM Diego Ceccarelli (BLOOMBERG/ LONDON)
> > > > <dceccarel...@bloomberg.net> wrote:
> > > >
> > > > > > I think it will be a similar ratio when retrieving any number of
> > > > > > documents.
> > > > >
> > > > > I'm not sure this is true: if you retrieve a huge number of
> > > > > documents, you might cause trouble for the GC.
> > > > >
> > > > > From: java-user@lucene.apache.org At: 01/22/21 12:11:19
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0
> > > > >
> > > > > The emphasis should not be on the number of retrieved documents, but
> > > > > on the duration ratio: 8.7.0 is 3 times slower. I think it will be a
> > > > > similar ratio when retrieving any number of documents.
> > > > >
> > > > > On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde
> > > > > <rob.audenae...@gmail.com> wrote:
> > > > >
> > > > > > Hi Martynas,
> > > > > >
> > > > > > In your sample code you are retrieving all (1 million!) documents
> > > > > > from the index; that surely is not a good match for Lucene :)
> > > > > >
> > > > > > Is that a good reflection of your use-case?
> > > > > >
> > > > > > On Fri, Jan 22, 2021 at 9:52 AM Martynas L <martynas....@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Please see the sample at
> > > > > > > https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
> > > > > > >
> > > > > > > IndexGenerator - creates a dummy index.
> > > > > > > IndexReader - retrieves documents - duration time with 7.5.0
> > > > > > > version is ~2s, while ~6s with 8.7.0
> > > > > > >
> > > > > > > Regards,
> > > > > > > Martynas
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jan 21, 2021 at 8:21 PM Rob Audenaerde
> > > > > > > <rob.audenae...@gmail.com> wrote:
> > > > > > >
> > > > > > > > There is no attachment in the previous email that I can see?
> > > > > > > > Maybe you can post it online?
> > > > > > > >
> > > > > > > > On Thu, Jan 21, 2021 at 4:54 PM Martynas L
> > > > > > > > <martynas....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > Are there any comments on this issue?
> > > > > > > > > If there is no workaround, we will be forced to roll back to
> > > > > > > > > the 7.5.0 version.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Martynas
> > > > > > > > >
> > > > > > > > > On Tue, Jan 12, 2021 at 12:27 PM Martynas L
> > > > > > > > > <martynas....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Please see attached sample.
> > > > > > > > > > IndexGenerator - creates a dummy index.
> > > > > > > > > > IndexReader - retrieves documents - duration time with
> > > > > > > > > > 7.5.0 version is ~2s, while ~6s with 8.7.0
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Martynas
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 22, 2020 at 3:23 PM Vincenzo D'Amore
> > > > > > > > > > <v.dam...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > >> I think it would be useful to have an example of a
> > > > > > > > > > >> document and, if possible, an example of a query that
> > > > > > > > > > >> takes too long.
> > > > > > > > > >>
> > > > > > > > > > >> On Mon, Dec 21, 2020 at 1:47 PM Martynas L
> > > > > > > > > > >> <martynas....@gmail.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hello,
> > > > > > > > > >> >
> > > > > > > > > >> > I am sorry for the delay.
> > > > > > > > > >> >
> > > > > > > > > > >> > Not sure what you mean by "workload". We have
> > > > > > > > > > >> > performance tests, which started failing after
> > > > > > > > > > >> > upgrading to 8.7.0.
> > > > > > > > > > >> > So I just tried to query the index (built from the
> > > > > > > > > > >> > same source) to get all documents and compare the
> > > > > > > > > > >> > performance with 7.5.0.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Document "size" is the sum of all stored string lengths
> > > > > > > > > > >> > (3402519 documents):
> > > > > > > > > > >> >
> > > > > > > > > > >> > doc size 903 - 88s vs 22s
> > > > > > > > > > >> >
> > > > > > > > > > >> > doc size 36 (only one field loaded, used
> > > > > > > > > > >> > searcher.doc(docID,
> > > > > > > > > > >> > Collections.singleton("fieldName"))) - 78s vs 16s
> > > > > > > > > > >> >
> > > > > > > > > > >> > doc size 439 (some fields made not stored) - 46s vs
> > > > > > > > > > >> > 14.5s
> > > > > > > > > >> >
> > > > > > > > > >> > Best regards,
> > > > > > > > > >> > Martynas
> > > > > > > > > >> >
> > > > > > > > > > >> > On Fri, Dec 4, 2020 at 12:06 AM Adrien Grand
> > > > > > > > > > >> > <jpou...@gmail.com> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hello Martynas,
> > > > > > > > > >> > >
> > > > > > > > > > >> > > There have indeed been changes related to stored
> > > > > > > > > > >> > > fields in 8.7. What does your workload look like and
> > > > > > > > > > >> > > how large are your documents on average?
> > > > > > > > > >> > >
> > > > > > > > > > >> > > On Thu, Dec 3, 2020 at 3:04 PM Martynas L
> > > > > > > > > > >> > > <martynas....@gmail.com> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Hi,
> > > > > > > > > > >> > > > We've migrated from 7.5.0 to 8.7.0 and found that
> > > > > > > > > > >> > > > index "searching" is significantly (4-5 times)
> > > > > > > > > > >> > > > slower in the latest version.
> > > > > > > > > > >> > > > It seems that
> > > > > > > > > > >> > > > org.apache.lucene.search.IndexSearcher#doc(int)
> > > > > > > > > > >> > > > is slower.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Is it possible to have similar performance with
> > > > > > > > > > >> > > > 8.7.0?
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Best regards,
> > > > > > > > > >> > > > Martynas
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > --
> > > > > > > > > >> > > Adrien
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> Vincenzo D'Amore
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
