Sure, I can do that. I'll create an index with a few million docs, call RTG a few million times against it, and compare the timings between 7.x and 8.x. I assume this should be sufficient?
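A minimal sketch of the timing loop described above. It assumes the RTG call is wrapped in a `Function<String, Object>`; in a real run that function would issue a real-time get (for example via SolrJ's `SolrClient.getById`) against a 7.x node and then an 8.x node. The in-memory `HashMap` is a hypothetical stand-in so the harness runs on its own, and its numbers say nothing about actual Solr performance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

final class RtgBenchmark {

    /** Outcome of one timed run. */
    static final class Result {
        final long lookups;
        final long hits;
        final long elapsedNanos;

        Result(long lookups, long hits, long elapsedNanos) {
            this.lookups = lookups;
            this.hits = hits;
            this.elapsedNanos = elapsedNanos;
        }

        double nanosPerLookup() {
            return lookups == 0 ? 0.0 : (double) elapsedNanos / lookups;
        }
    }

    /**
     * Calls lookup `iterations` times with ids drawn from 0..numDocs-1.
     * A fixed seed yields the same id sequence on every run, so the 7.x and
     * 8.x measurements see an identical workload.
     */
    static Result run(Function<String, Object> lookup, int numDocs, long iterations, long seed) {
        Random random = new Random(seed);
        long hits = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            String id = Integer.toString(random.nextInt(numDocs));
            if (lookup.apply(id) != null) {
                hits++;
            }
        }
        return new Result(iterations, hits, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        // Hypothetical in-memory stand-in for the index (id -> stored document).
        Map<String, Object> fakeIndex = new HashMap<>();
        for (int i = 0; i < numDocs; i++) {
            fakeIndex.put(Integer.toString(i), "doc-" + i);
        }
        // Warm-up pass whose result is discarded, so JIT compilation
        // does not skew the measured pass.
        run(fakeIndex::get, numDocs, 100_000, 1L);
        Result r = run(fakeIndex::get, numDocs, 1_000_000, 42L);
        System.out.printf("lookups=%d hits=%d avg=%.1f ns/lookup%n",
                r.lookups, r.hits, r.nanosPerLookup());
    }
}
```

The loop is single-threaded, so it measures per-call latency; calling `run` from several threads at once would be closer to the 15-thread crawl.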
On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <jan....@cominvent.com> wrote:

> It would be nice to determine whether RTG is orders of magnitude slower in
> 8.x than in 7.x and is the main culprit. Then we could isolate the testing to
> RTG only and not involve Atomic Update?
>
> Jan
>
> On 31 May 2023 at 21:33, Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > I don’t have any nested documents, and the results are consistent across
> > multiple runs. I tried looking for similar issues in the mailing list but
> > couldn't find anything relevant. So if you do happen to find any JIRAs
> > addressing it, that would be really helpful (thanks!).
> >
> > To Jan’s question about RTG taking more time in Solr 8.x, I can say with
> > good certainty that this is the case. Although it does look into the
> > transaction logs first, thread dumps suggest that it is the next phase
> > (when it doesn't find the doc in the tlog) that is time-consuming.
> > It tries to look up the document via the current searcher
> > (searcher.getFirstMatch()). Proceeding further in the stack, this is the
> > call where many threads are spending time:
> >
> > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
> >
> > Although this call is the same in 7.7.2 and 8.11.1, quite likely
> > something changed in Lucene's FST.java that is causing the slowness. I am
> > trying to dig further and might also ask folks on the Lucene mailing list.
> > Thanks.
> >
> > On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com> wrote:
> >
> >> I would love some profiling as well. I know 8.8 or 8.9 had some performance
> >> problems with atomic update, but this was later addressed. I can't find the
> >> JIRA at the moment, though. Also, I am on 8.11.1 and atomic update is not an
> >> issue for me.
> >>
> >> By the way, do you happen to have nested docs?
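To make the two phases Rahul describes concrete, here is a toy model of the lookup order: check the transaction log first, and only on a miss fall back to the current searcher. Everything here is hypothetical (`TwoPhaseRtg`, the `HashMap` standing in for the tlog, the `TreeMap` standing in for the committed index); it is not Solr's RealTimeGet code, just the control flow as described in the thread. The sorted map is chosen so the fallback is an ordered O(log n) seek, loosely mirroring a terms-dictionary lookup rather than a hash probe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class TwoPhaseRtg {
    enum Source { TLOG, SEARCHER, NOT_FOUND }

    static final class Lookup {
        final Source source;
        final String doc;

        Lookup(Source source, String doc) {
            this.source = source;
            this.doc = doc;
        }
    }

    // Hypothetical stand-in for recent, uncommitted updates held by the tlog.
    private final Map<String, String> tlog = new HashMap<>();
    // Hypothetical stand-in for the committed index visible to the searcher.
    private final TreeMap<String, String> committed = new TreeMap<>();

    void addUncommitted(String id, String doc) { tlog.put(id, doc); }

    void addCommitted(String id, String doc) { committed.put(id, doc); }

    /** Phase 1: consult the tlog. Phase 2: fall back to the searcher. */
    Lookup get(String id) {
        String fromTlog = tlog.get(id);
        if (fromTlog != null) {
            return new Lookup(Source.TLOG, fromTlog);
        }
        // Stand-in for the searcher.getFirstMatch() step, which the thread
        // dumps trace down into Lucene's SegmentTermsEnum.
        String fromIndex = committed.get(id);
        return fromIndex != null
                ? new Lookup(Source.SEARCHER, fromIndex)
                : new Lookup(Source.NOT_FOUND, null);
    }

    public static void main(String[] args) {
        TwoPhaseRtg rtg = new TwoPhaseRtg();
        rtg.addCommitted("1", "v1");
        rtg.addUncommitted("1", "v2"); // newer version still in the tlog
        rtg.addCommitted("2", "v1");
        for (String id : new String[] {"1", "2", "3"}) {
            Lookup l = rtg.get(id);
            System.out.println(id + " -> " + l.source + " " + l.doc);
        }
    }
}
```

Running `main` prints `1 -> TLOG v2`, `2 -> SEARCHER v1` and `3 -> NOT_FOUND null`. In a benchmark over a few million already-indexed docs, most lookups would take the second branch, which is the path the thread dumps point at.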
> >>
> >> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com> wrote:
> >>
> >>> Hi
> >>>
> >>> MMap is most important for searching. Indexing bypasses the cache by using
> >>> direct IO.
> >>>
> >>> I have noticed slow real-time get on Solr 8.x during atomic update myself.
> >>> A comparison with profiling would be interesting. RTG gets the document
> >>> from the transaction log, I believe? Could there be some RTG changes in
> >>> 8.x that caused such a slowdown?
> >>>
> >>> Jan Høydahl
> >>>
> >>>> On 31 May 2023 at 16:57, Rahul Goswami <rahul196...@gmail.com> wrote:
> >>>>
> >>>> Thanks for the response, Shawn. We are using Windows Server with pretty
> >>>> huge indexes (multi-TB cores). With MMap, I have observed that the machine
> >>>> just completely freezes with high CPU and memory usage, to the point where
> >>>> it becomes impossible to even connect to it. SimpleFS works out well for us
> >>>> in this case.
> >>>>
> >>>> As noted in my first email, even with SimpleFS, Solr 7 completes the crawl
> >>>> in nearly 1/5th of the time taken by Solr 8. Hence there should be something
> >>>> OUTSIDE the directory factory in the code that is causing this.
> >>>>
> >>>> Thanks,
> >>>> Rahul
> >>>>
> >>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <apa...@elyograg.org> wrote:
> >>>>>
> >>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
> >>>>>> Environment details:
> >>>>>> - Java 11 on Windows Server
> >>>>>> - Xms1536m Xmx3072m
> >>>>>> - Indexing client code running 15 parallel threads, indexing in batches of 1000
> >>>>>> - Using SimpleFSDirectoryFactory (since MMap doesn't quite work well on
> >>>>>>   Windows for our index sizes, which commonly run north of 1 TB)
> >>>>>
> >>>>> Don't change the directoryFactory. You *WANT* Solr to use MMAP for your
> >>>>> indexes. Not using MMAP is likely to slow things down considerably.
> >>>>>
> >>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java, which of
> >>>>> course requires 64-bit hardware.
> >>>>>
> >>>>> 32-bit systems and software cannot properly deal with data larger than
> >>>>> about 2GB.
> >>>>>
> >>>>> Thanks,
> >>>>> Shawn
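For anyone reproducing the crawl, here is a minimal sketch of the indexing-client shape from the environment details: a fixed pool of 15 threads, each sending batches of 1000 docs. The `sendBatch` callback and the `doc-N` ids are hypothetical placeholders; a real client would build `SolrInputDocument`s and send them with something like SolrJ's `SolrClient.add(Collection<SolrInputDocument>)`. This only shows the concurrency and batching structure, not the actual client used in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class ParallelBatchIndexer {
    static final int THREADS = 15;
    static final int BATCH_SIZE = 1000;

    /**
     * Splits ids 0..totalDocs-1 into batches of batchSize and submits each
     * batch to a fixed pool of `threads` workers. Returns the batch count.
     */
    static int indexAll(int totalDocs, int threads, int batchSize,
                        Consumer<List<String>> sendBatch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int start = 0; start < totalDocs; start += batchSize) {
                int end = Math.min(start + batchSize, totalDocs);
                List<String> batch = new ArrayList<>(end - start);
                for (int i = start; i < end; i++) {
                    batch.add("doc-" + i); // hypothetical document id
                }
                futures.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the batch and rethrows any worker failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return futures.size();
    }

    public static void main(String[] args) throws Exception {
        AtomicLong sent = new AtomicLong();
        int batches = indexAll(25_500, THREADS, BATCH_SIZE,
                batch -> sent.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

With 25,500 docs, `main` prints `26 batches, 25500 docs`: 25 full batches plus one final batch of 500.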