Hi! This slowdown is expected; see LUCENE-9447 <https://issues.apache.org/jira/browse/LUCENE-9447> and LUCENE-9486 <https://issues.apache.org/jira/browse/LUCENE-9486>. The trade-off here is index size vs. fetch time: we introduced a more aggressive compression strategy for stored fields at the cost of a small increase in fetch times. In your example, you can see that the index size has been reduced by around 20%.
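A toy model may make the size-vs-fetch-time trade-off easier to see. It rests on one simplifying assumption: stored fields are compressed in blocks of several documents, so fetching one document means decompressing (at least part of) the block that contains it. The sketch below uses `java.util.zip.Deflater` as a stand-in for Lucene's own compressors and does not reproduce Lucene's real format (no preset dictionaries, no on-disk layout). `BlockStore`, `docsPerBlock` and `sampleDocs` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model only, NOT Lucene's stored fields format: documents are grouped into
// blocks of `docsPerBlock` and each block is compressed as one unit.
final class BlockStore {
    private final int docsPerBlock;
    private final List<byte[]> compressedBlocks = new ArrayList<>();
    private final List<int[]> docStarts = new ArrayList<>();   // doc offsets inside each uncompressed block
    private final List<Integer> rawLengths = new ArrayList<>(); // uncompressed size of each block

    BlockStore(List<String> docs, int docsPerBlock) {
        if (docsPerBlock < 1) throw new IllegalArgumentException("docsPerBlock must be >= 1");
        this.docsPerBlock = docsPerBlock;
        for (int first = 0; first < docs.size(); first += docsPerBlock) {
            int end = Math.min(first + docsPerBlock, docs.size()); // exclusive
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            int[] starts = new int[end - first + 1];
            for (int i = first; i < end; i++) {
                starts[i - first] = raw.size();
                byte[] bytes = docs.get(i).getBytes(StandardCharsets.UTF_8);
                raw.write(bytes, 0, bytes.length);
            }
            starts[end - first] = raw.size(); // end offset of the block's last doc
            byte[] rawBytes = raw.toByteArray();
            compressedBlocks.add(deflate(rawBytes));
            docStarts.add(starts);
            rawLengths.add(rawBytes.length);
        }
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** "Index size" side of the trade-off: total compressed bytes. */
    long compressedSize() {
        long total = 0;
        for (byte[] block : compressedBlocks) total += block.length;
        return total;
    }

    /** "Fetch time" side of the trade-off: bytes inflated to read one doc. */
    int bytesInflatedToFetch(int docId) {
        return rawLengths.get(docId / docsPerBlock);
    }

    /** Fetching one document inflates its whole block, whatever the doc size. */
    String fetch(int docId) {
        int block = docId / docsPerBlock;
        byte[] raw = new byte[rawLengths.get(block)];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressedBlocks.get(block));
            int filled = 0;
            while (filled < raw.length) {
                int n = inflater.inflate(raw, filled, raw.length - filled);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    throw new IllegalStateException("truncated block " + block);
                }
                filled += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block " + block, e);
        } finally {
            inflater.end();
        }
        int[] starts = docStarts.get(block);
        int local = docId - block * docsPerBlock;
        return new String(raw, starts[local], starts[local + 1] - starts[local], StandardCharsets.UTF_8);
    }

    /** Hypothetical sample data: small, repetitive JSON-like documents. */
    static List<String> sampleDocs(int n) {
        List<String> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            docs.add("{\"id\":" + i + ",\"title\":\"Document " + i
                    + "\",\"body\":\"the quick brown fox jumps over the lazy dog\"}");
        }
        return docs;
    }
}
```

Building one store with `docsPerBlock = 1` and another with `docsPerBlock = 128` from the same `sampleDocs(1000)` should show the larger blocks producing a much smaller `compressedSize()` while inflating far more bytes per `fetch`. Block size is the kind of "compression parameter" a custom stored fields format could tune (in Lucene 8.x, `CompressingStoredFieldsFormat` takes a chunk size among its constructor parameters). Because the fetch cost here does not depend on which field is read, the model is also consistent with, though it does not prove, the observation that loading a single field via `searcher.doc(docID, Collections.singleton("fieldName"))` was barely faster than loading all fields.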
If your workflow depends on those fetch times, you can always override the stored fields format through a FilterCodec and configure your own compression parameters.

Cheers,
Ignacio

On Sat, Jan 23, 2021 at 8:36 AM Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> I did some testing for you :)
>
> I modified your code to run in a JMH benchmark and changed the number of retrieved docs to 1,000 out of 1M in the index. This is what I got:
>
> Lucene 7.5
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  37.147 ± 6.218  ops/s
>
> Lucene 8.7
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  18.680 ± 5.755  ops/s
>
> This is very much in line with your observations (Lucene 8.7 seems almost twice as slow), so something is going on when running out of the box.
>
> The code can be found here (not really beautiful, but it gets the job done; if you want to switch Lucene versions, edit the pom and make sure to set the proper index version):
> https://gist.github.com/d2a-raudenaerde/93a490e5b0d17b2fa88862473429aeb3
>
> JMH details:
> # JMH version: 1.21
> # VM version: JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1-Ubuntu-0ubuntu1.20.04
> # VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
> # VM options: -Xms2G -Xmx2G
> # Warmup: 2 iterations, 10 s each
> # Measurement: 4 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: org.audenaerde.lucene.DocRetrievalBenchmark.retrieveDocuments

On Fri, Jan 22, 2021 at 4:22 PM Martynas L <martynas....@gmail.com> wrote:

> I just played with my reading sample. My goal is not to show exact numbers, but it is a fact that document retrieval via IndexSearcher.doc(int) is much slower.
>
> All our performance tests showed a performance degradation after changing to 8.7.0; even without measuring, we can "see/feel" that operations involving document retrieval became slower.

On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> Hi Martynas,
>
> How did you measure that?
>
> I ask because writing a good benchmark is not an easy task, since there are so many factors (class loading times, JIT effects, etc.). You should use the Java Microbenchmark Harness (JMH) or similar, and set up a random document retrieval task, with warm-up, etc.
>
> (I'm not aware of any big slowdowns, but since you are seeing them, the best way forward is to build a robust benchmark and then start comparing.)
>
> -Rob

On Fri, Jan 22, 2021 at 3:43 PM Martynas L <martynas....@gmail.com> wrote:

> Even when retrieving a single document, 8.7.0 is more than 2x slower.

On Fri, Jan 22, 2021 at 2:28 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote:

> > I think it will be a similar ratio when retrieving any number of documents.
>
> I'm not sure this is true; if you retrieve a huge number of documents, you might cause trouble for the GC.
>
> From: java-user@lucene.apache.org At: 01/22/21 12:11:19 To: java-user@lucene.apache.org
> Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0
>
> The emphasis should not be on the number of retrieved documents, but on the duration ratio: 8.7.0 is 3 times slower. I think it will be a similar ratio when retrieving any number of documents.
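The shape of the benchmark Rob recommends (random document IDs, a warm-up phase that is thrown away, then a measured phase) can be sketched in plain Java. This is a crude stand-in, not a replacement for JMH: it has no forking, no proper blackhole and no error statistics. `RetrievalBench`, `run` and `oneOp` are hypothetical names, and the retrieval step is abstracted as an `IntFunction<String>`; in a real test it might wrap something like `searcher.doc(id).get("title")`, which is an assumption on my part (note that `IndexSearcher.doc(int)` throws `IOException`, so it would need wrapping).

```java
import java.util.Random;
import java.util.function.IntFunction;

// Crude stand-in for a JMH benchmark: random doc IDs, discarded warm-up,
// then a timed phase. Prefer JMH for numbers you intend to compare seriously.
final class RetrievalBench {
    static final class Result {
        final double opsPerSecond; // measured phase only
        final long checksum;       // consumed results, making dead-code elimination harder for the JIT

        Result(double opsPerSecond, long checksum) {
            this.opsPerSecond = opsPerSecond;
            this.checksum = checksum;
        }
    }

    /**
     * One "op" retrieves docsPerOp random documents out of maxDoc.
     * A fixed seed gives the same doc-ID sequence on every run, so two
     * library versions can be compared on identical access patterns.
     */
    static Result run(IntFunction<String> fetch, int maxDoc, int docsPerOp,
                      int warmupOps, int measuredOps, long seed) {
        Random random = new Random(seed);
        long checksum = 0;
        for (int i = 0; i < warmupOps; i++) {
            checksum += oneOp(fetch, random, maxDoc, docsPerOp); // warm-up: not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            checksum += oneOp(fetch, random, maxDoc, docsPerOp);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid division by zero
        return new Result(measuredOps / (elapsedNanos / 1e9), checksum);
    }

    private static long oneOp(IntFunction<String> fetch, Random random, int maxDoc, int docsPerOp) {
        long sum = 0;
        for (int i = 0; i < docsPerOp; i++) {
            sum += fetch.apply(random.nextInt(maxDoc)).length();
        }
        return sum;
    }
}
```

With `maxDoc = 1_000_000` and `docsPerOp = 1_000`, one op roughly mirrors the "1,000 docs out of 1M" setup from the JMH run in this thread; running the same seed against both Lucene versions keeps the access pattern identical.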
On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> Hi Martynas,
>
> In your sample code you are retrieving all (1 million!) documents from the index; that surely is not a good match for Lucene :)
>
> Is that a good reflection of your use case?

On Fri, Jan 22, 2021 at 9:52 AM Martynas L <martynas....@gmail.com> wrote:

> Please see the sample at
> https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
>
> IndexGenerator - creates a dummy index.
> IndexReader - retrieves documents; the duration is ~2s with version 7.5.0, but ~6s with 8.7.0.
>
> Regards,
> Martynas

On Thu, Jan 21, 2021 at 8:21 PM Rob Audenaerde <rob.audenae...@gmail.com> wrote:

> There is no attachment in the previous email that I can see. Maybe you can post it online?

On Thu, Jan 21, 2021 at 4:54 PM Martynas L <martynas....@gmail.com> wrote:

> Hello,
>
> Are there any comments on this issue?
> If there is no workaround, we will be forced to roll back to version 7.5.0.
>
> Best regards,
> Martynas

On Tue, Jan 12, 2021 at 12:27 PM Martynas L <martynas....@gmail.com> wrote:

> Hi,
>
> Please see the attached sample.
>
> IndexGenerator - creates a dummy index.
> IndexReader - retrieves documents; the duration is ~2s with version 7.5.0, but ~6s with 8.7.0.
>
> Regards,
> Martynas

On Tue, Dec 22, 2020 at 3:23 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> I think it would be useful to have an example of a document and, if possible, an example of a query that takes too long.
>
> --
> Vincenzo D'Amore

On Mon, Dec 21, 2020 at 1:47 PM Martynas L <martynas....@gmail.com> wrote:

> Hello,
>
> I am sorry for the delay.
>
> I'm not sure what you mean by "workload". We have performance tests, which started failing after upgrading to 8.7.0.
> So I just tried to query the index (built from the same source) to get all documents and compare the performance with 7.5.0.
>
> Document "size" is the sum of all stored string lengths (3,402,519 documents); timings are 8.7.0 vs 7.5.0:
>
> doc size 903: 88s vs 22s
> doc size 36 (only one field loaded, using searcher.doc(docID, Collections.singleton("fieldName"))): 78s vs 16s
> doc size 439 (some fields made not stored): 46s vs 14.5s
>
> Best regards,
> Martynas

On Fri, Dec 4, 2020 at 12:06 AM Adrien Grand <jpou...@gmail.com> wrote:

> Hello Martynas,
>
> There have indeed been changes related to stored fields in 8.7. What does your workload look like, and how large are your documents on average?
>
> --
> Adrien

On Thu, Dec 3, 2020 at 3:04 PM Martynas L <martynas....@gmail.com> wrote:

> Hi,
> We've migrated from 7.5.0 to 8.7.0 and found that index "searching" is significantly (4-5 times) slower in the latest version.
> It seems that org.apache.lucene.search.IndexSearcher#doc(int) is slower.
>
> Is it possible to have similar performance with 8.7.0?
>
> Best regards,
> Martynas
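The figures reported in this thread mix wall-clock durations and JMH throughputs, so it can help to put them on a common scale. The helper below is a hypothetical illustration (the class and method names are mine); it only does the arithmetic, using the numbers quoted above.

```java
// Hypothetical helper for comparing the numbers reported in this thread.
final class SlowdownMath {
    private SlowdownMath() {}

    /** Slowdown factor from wall-clock durations: new duration / old duration. */
    static double slowdownFromDurations(double newSeconds, double oldSeconds) {
        return newSeconds / oldSeconds;
    }

    /** Slowdown factor from JMH throughputs: old ops/s / new ops/s. */
    static double slowdownFromThroughput(double oldOpsPerSecond, double newOpsPerSecond) {
        return oldOpsPerSecond / newOpsPerSecond;
    }

    /** Average time per document in microseconds, given a total time and a doc count. */
    static double microsPerDoc(double totalSeconds, long docs) {
        return totalSeconds * 1_000_000.0 / docs;
    }

    /** Average time per document in microseconds when one benchmark op fetches docsPerOp docs. */
    static double microsPerDocFromThroughput(double opsPerSecond, int docsPerOp) {
        return microsPerDoc(1.0 / opsPerSecond, docsPerOp);
    }
}
```

By this arithmetic, the December 21 durations (88s vs 22s, 78s vs 16s, 46s vs 14.5s) are slowdowns of 4.0x, about 4.9x and about 3.2x, and 88s over 3,402,519 documents is about 25.9 µs per document in 8.7.0 vs about 6.5 µs in 7.5.0. The JMH throughputs (37.147 vs 18.680 ops/s, one op = 1,000 documents) give a slowdown of about 1.99x, i.e. roughly 26.9 µs vs 53.5 µs per retrieved document.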