Yes, the numbers should be similar. Thanks for digging in. If there is a big regression, it could be worthwhile running the same test on v8.0, 8.1, 8.2, etc. to pin down in which version the regression happened. Perhaps it could even be scripted :)
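As a rough illustration of how such a per-version comparison might be scripted, here is a minimal Python sketch that times real-time get (RTG) requests against one Solr instance per version. It assumes the standard `/get` handler is reachable at `/solr/<collection>/get`. The function names and parameters are made up for this example, and the `fetch` and `clock` parameters exist only so the timing loop can be exercised without a live server.

```python
import time
import urllib.parse
import urllib.request
from statistics import mean, median


def rtg_url(base_url, collection, doc_id):
    """Build a real-time get URL for a single document id."""
    query = urllib.parse.urlencode({"id": doc_id, "wt": "json"})
    return f"{base_url}/solr/{collection}/get?{query}"


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def time_rtg(base_url, collection, doc_ids, fetch=http_fetch, clock=time.perf_counter):
    """Issue one RTG request per id; return per-request latencies in seconds."""
    latencies = []
    for doc_id in doc_ids:
        url = rtg_url(base_url, collection, doc_id)
        start = clock()
        fetch(url)
        latencies.append(clock() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few comparable numbers."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {
        "count": len(ordered),
        "mean": mean(ordered),
        "median": median(ordered),
        "p99": p99,
    }


def compare_versions(instances, collection, doc_ids, fetch=http_fetch):
    """Run the same RTG workload against each instance (version -> base URL)."""
    return {
        version: summarize(time_rtg(base_url, collection, doc_ids, fetch=fetch))
        for version, base_url in instances.items()
    }
```

A run could then look like `compare_versions({"7.7.2": "http://localhost:8977", "8.11.1": "http://localhost:8983"}, "rtgtest", ids)`, where the ports, the collection name and the id list are placeholders for whatever the test setup actually uses. For the second phase described further down the thread (the lookup through the searcher), the ids would have to be committed and no longer present in the transaction log.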
Solr unfortunately does not yet have a comprehensive official nightly benchmark run that would catch such regressions. The community is working on it, though: there are some tests on http://mostly.cool maintained by Ishan, but I have no idea how to add new tests.

Jan

> On 31 May 2023, at 23:50, Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Sure, I can do that. Let me create an index with a few million docs, call
> RTG with a few million iterations on it and note the times between 7.x and
> 8.x. I assume this should be sufficient (?)
>
> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Would be nice to determine whether RTG is orders of magnitude slower in
>> 8.x than 7.x and is the main culprit. Then we could isolate the testing to
>> RTG only and not involve Atomic Update?
>>
>> Jan
>>
>>> On 31 May 2023, at 21:33, Rahul Goswami <rahul196...@gmail.com> wrote:
>>>
>>> I don't have any nested documents. And the results are consistent across
>>> multiple runs. I tried looking for similar issues in the mailing list, but
>>> couldn't find anything relevant. So if you do happen to find any JIRAs
>>> addressing it that would be really helpful (thanks!).
>>>
>>> To Jan's question about RTG taking more time in Solr 8.x, I can say with
>>> good certainty that this is the case. Although it does look into
>>> transaction logs first, thread dumps suggest that it is the next phase
>>> (when it doesn't find the doc in tlog) which seems to be time consuming.
>>> It tries to look up the document via the current searcher
>>> (searcher.getFirstMatch()). Proceeding further in the stack, it is this
>>> call where many threads are spending time:
>>>
>>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
>>>
>>> Although this call is the same in 7.7.2 and 8.11.1, quite likely
>>> something changed in Lucene's FST.java which is causing the slowness. I am
>>> trying to dig further and might also ask folks on the Lucene mailing list.
>>> Thanks.
>>>
>>> On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com> wrote:
>>>
>>>> I would love some profiling as well. I know 8.8 or 8.9 had some performance
>>>> problems with atomic update but this was later addressed. I can't find the
>>>> jira atm though. Also I am on 8.11.1 and atomic update is not an issue for
>>>> me.
>>>>
>>>> By the way, do you happen to have nested docs?
>>>>
>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> MMap is most important for searching. Indexing bypasses the cache by
>>>>> using direct IO.
>>>>>
>>>>> I have noticed slow real time get on Solr 8.x during atomic update myself.
>>>>> Would be interesting with a comparison with profiling. RTG gets the
>>>>> document from transaction log I believe? Could there be some RTG changes
>>>>> in 8.x that caused such slowdown?
>>>>>
>>>>> Jan Høydahl
>>>>>
>>>>>> On 31 May 2023, at 16:57, Rahul Goswami <rahul196...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for the response Shawn. We are using Windows server with pretty
>>>>>> huge indexes (multiple TB cores). With Mmap, I have observed that the
>>>>>> machine just completely freezes with high CPU and memory usage to a point
>>>>>> where it becomes impossible to even connect to it. SimpleFS works out
>>>>>> well for us in this case.
>>>>>>
>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes the
>>>>>> crawl in nearly 1/5th the time taken in Solr 8. Hence there should be
>>>>>> something OUTSIDE the directory factory in the code which is causing this.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>>>
>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
>>>>>>>> Environment details:
>>>>>>>> - Java 11 on Windows server
>>>>>>>> - Xms1536m Xmx3072m
>>>>>>>> - Indexing client code running 15 parallel threads indexing in batches
>>>>>>>>   of 1000
>>>>>>>> - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
>>>>>>>>   Windows for our index sizes which commonly run north of 1 TB)
>>>>>>>
>>>>>>> Don't change the directoryFactory. You *WANT* Solr to use MMAP for your
>>>>>>> indexes. Not using MMAP is likely to slow things down considerably.
>>>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java. Which of
>>>>>>> course requires 64-bit hardware.
>>>>>>>
>>>>>>> 32 bit systems and software cannot properly deal with data larger than
>>>>>>> about 2GB.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn