Sorry to have dropped the ball on this past couple of days. I created the
JIRA with information shared on this mail thread.

https://issues.apache.org/jira/browse/SOLR-16838

Running further benchmarks reveals that the slowness is in
the searcher.getFirstMatch() call inside getInputDocument() .

*Solr 7.7.2*
2023-06-01 21:17:34.492 WARN  (qtp1034094674-41) [   x:techproducts]
o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime:
508053 ; *searcherFetchTime: 3229011 *

*Solr 8*
2023-06-01 20:43:31.767 WARN  (qtp391506011-56) [   x:techproducts]
o.a.s.u.p.LogUpdateProcessorFactory RTG timing stats:: tlogFetchTime:
410873 ; *searcherFetchTime: 33296008 *

Starting a discussion on the Lucene list as well.

Thanks,
Rahul


On Thu, Jun 1, 2023 at 9:17 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> > Solr unfortunately do not yet have a comprehensive official nightly
> benchmark run that would catch such regressions. But the community is
> working on it, there are some tests on http://mostly.cool maintained by
> Ishan, but I have no idea how to add new tests..
>
> I'll publish steps to add new tests soon.
>
> For now, check out solr-bench from GitHub.com/fullstorydev/solr-bench
>
> Run an existing test as follows:
>
> mvn clean compile assembly:single
>
> ./stress.sh -c <commitid> suites/cluster-test.json
>
> Results will be written out in suites/results/.
>
>
>
> On Fri, 2 Jun, 2023, 2:44 am Jan Høydahl, <jan....@cominvent.com> wrote:
>
> > Hi,
> >
> > Next step I believe is for you to open a Solr JIRA issue for this
> > regression, as it is obviously an issue.
> > Then we can use that JIRA to coordinate, even if the fix ends up being in
> > Lucene code.
> >
> > Once we have a Solr JIRA, it may make sense to open a mail thread on
> > users@lucene list to get some Lucene expertice involved in digging.
> >
> > Jan
> >
> > > 1. jun. 2023 kl. 18:15 skrev Rahul Goswami <rahul196...@gmail.com>:
> > >
> > > So I ran the test to index 5 million docs this time (batches of 1000
> docs
> > > in 15 parallel threads). To eliminate the network overhead and get as
> > > accurate a benchmark as possible, I used an AtomicLong to measure the
> > time
> > > around the RTG call in DistibutedUpdateProcessor across all calls (
> > >
> >
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416
> > ).
> > > Did this for both Solr 7.7.2 and Solr 8.11.1 and built the
> solr-core.jar
> > to
> > > replace it in the solr webapp lib.
> > >
> > > RTG in Solr 8.x is ~10x slower. Here are the numbers (times are in
> > > milliseconds):
> > >
> > > *Solr 7.7.2 *: 2023-06-01 15:39:48.272 WARN  (qtp1034094674-24) [
> > > x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg
> > time:7293486*
> > > *Solr 8.11.1: *2023-06-01 04:46:24.758 WARN  (qtp391506011-71) [
> > > x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg
> > > time:72029877*
> > >
> > > Let me know if I can help further with debugging. At this point I am
> > > suspecting the slowness to be on Lucene's side with the findTargetArc()
> > > method in FST.java. I am running more analysis on my side in parallel
> and
> > > will share if/when I find more.
> > >
> > > -Rahul
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Jun 1, 2023 at 3:53 AM Jan Høydahl <jan....@cominvent.com>
> > wrote:
> > >
> > >> Yes, the numbers should be similar. Thanks for digging in. If there
> is a
> > >> big regression, it could be worthwile running the same test on v8.0,
> > 8.1,
> > >> 8.2 etc to plot in what version the regression happened. Perhaps it
> > could
> > >> even be scripted :)
> > >>
> > >> Solr unfortunately do not yet have a comprehensive official nightly
> > >> benchmark run that would catch such regressions. But the community is
> > >> working on it, there are some tests on http://mostly.cool maintained
> by
> > >> Ishan, but I have no idea how to add new tests..
> > >>
> > >> Jan
> > >>
> > >>> 31. mai 2023 kl. 23:50 skrev Rahul Goswami <rahul196...@gmail.com>:
> > >>>
> > >>> Sure, I can do that. Let me create an index with a few million docs,
> > call
> > >>> RTG with a few million iterations on it and note the times between
> 7.x
> > >> and
> > >>> 8.x. I assume this should be sufficient (?)
> > >>>
> > >>> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <jan....@cominvent.com>
> > >> wrote:
> > >>>
> > >>>> Would be nice to determine whether RTG is orders of magnitude slower
> > in
> > >>>> 8.x than 7.x and is the main culprit.  Then we could isolate the
> > >> testing to
> > >>>> RTG only and not involce Atomic Update?
> > >>>>
> > >>>> Jan
> > >>>>
> > >>>>> 31. mai 2023 kl. 21:33 skrev Rahul Goswami <rahul196...@gmail.com
> >:
> > >>>>>
> > >>>>> I don’t have any nested documents. And the results are consistent
> > >> across
> > >>>>> multiple runs. I tried looking for similar issues in the mailing
> > list,
> > >>>> but
> > >>>>> couldn’t find anything relevant . So if you do happen to find any
> > JIRAs
> > >>>>> addressing it that would be really helpful (thanks!).
> > >>>>>
> > >>>>> To Jan’s question about RTG taking more time in Solr 8.x, I can say
> > >> with
> > >>>>> good certainty that this is the case. Although it does look into
> > >>>>> transaction logs first, thread dumps suggest that it is the next
> > phase
> > >>>>> (when it doesn't find the doc in tlog) which seems to be time
> > >> consuming .
> > >>>>> It tries to look up the document via the current searcher
> > >>>>> (searcher.getFirstMatch() ). Proceeding further in the stack, it is
> > >> this
> > >>>>> call where many threads are spending time:
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> >
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
> > >>>>>
> > >>>>> Although this call is the same in 7.7.2 and 8.11.1 quite likely
> > >>>>> something changed in Lucene's FST.java which is causing the
> > slowness. I
> > >>>> am
> > >>>>> trying to dig further and might also ask folks on the Lucene
> mailing
> > >>>> list.
> > >>>>> Thanks.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> I would love some profiling as well. I know 8.8 or 8.9 had some
> > >>>> performance
> > >>>>>> problems with atomic update but this was later addressed. I cant
> > find
> > >>>> the
> > >>>>>> jira atm though. Also I am on 8.11.1 and atomic update is not an
> > issue
> > >>>> for
> > >>>>>> me.
> > >>>>>>
> > >>>>>> By the way, do you happen to have nested docs?
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com>
> > >> wrote:
> > >>>>>>
> > >>>>>>> Hi
> > >>>>>>>
> > >>>>>>> MMap is most important for searching. Indexing bypasses the cache
> > by
> > >>>>>> using
> > >>>>>>> direct IO.
> > >>>>>>>
> > >>>>>>> I have noticed slow real time get on Solr 8.x during atomic
> update
> > >>>>>> myself.
> > >>>>>>> Would be interesting with a comparison with profiling. RTG gets
> the
> > >>>>>>> document from transaction log I believe? Could there be some RTG
> > >>>> changes
> > >>>>>> in
> > >>>>>>> 8.x that caused such slowdown?
> > >>>>>>>
> > >>>>>>> Jan Høydahl
> > >>>>>>>
> > >>>>>>>> 31. mai 2023 kl. 16:57 skrev Rahul Goswami <
> rahul196...@gmail.com
> > >:
> > >>>>>>>>
> > >>>>>>>> Thanks for the response Shawn. We are using Windows server with
> > >>>> pretty
> > >>>>>>> huge
> > >>>>>>>> indexes (multiple TB cores). With Mmap, I have observed that the
> > >>>>>> machine
> > >>>>>>>> just completely freezes with high CPU and memory usage to a
> point
> > >>>> where
> > >>>>>>> it
> > >>>>>>>> becomes impossible to even connect to it. SimpleFS works out
> well
> > >> for
> > >>>>>> us
> > >>>>>>> in
> > >>>>>>>> this case.
> > >>>>>>>>
> > >>>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes
> > the
> > >>>>>>> crawl
> > >>>>>>>> in nearly 1/5th the time taken in Solr 8. Hence there should be
> > >>>>>> something
> > >>>>>>>> OUTSIDE the directory factory in the code which is causing this.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Rahul
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <
> > apa...@elyograg.org
> > >>>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
> > >>>>>>>>>> Environment details: - Java 11 on Windows server - Xms1536m
> > >> Xmx3072m
> > >>>>>> -
> > >>>>>>>>>> Indexing client code running 15 parallel threads indexing in
> > >> batches
> > >>>>>> of
> > >>>>>>>>>> 1000 - using SimpleFSDirectoryFactory (since Mmap doesn't
> quite
> > >> work
> > >>>>>>>>>> well on Windows for our index sizes which commonly run north
> of
> > 1
> > >>>> TB)
> > >>>>>>>>>
> > >>>>>>>>> Don't change the directoryFactory.  You *WANT* Solr to use MMAP
> > for
> > >>>>>> your
> > >>>>>>>>> indexes.  Not using MMAP is likely to slow things down
> > >> considerably.
> > >>>>>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java.
> > >> Which
> > >>>>>> of
> > >>>>>>>>> course requires 64-bit hardware.
> > >>>>>>>>>
> > >>>>>>>>> 32 bit systems and software cannot properly deal with data
> larger
> > >>>> than
> > >>>>>>>>> about 2GB.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Shawn
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Reply via email to