> Solr unfortunately does not yet have a comprehensive official nightly
> benchmark run that would catch such regressions. But the community is
> working on it: there are some tests on http://mostly.cool maintained by
> Ishan, but I have no idea how to add new tests.

I'll publish steps to add new tests soon.

For now, check out solr-bench from https://github.com/fullstorydev/solr-bench

Run an existing test as follows:

mvn clean compile assembly:single

./stress.sh -c <commitid> suites/cluster-test.json

Results will be written to suites/results/.
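
Putting it together, a full comparison run might look like this (the two
commit ids are placeholders for whichever builds you want to compare):

git clone https://github.com/fullstorydev/solr-bench.git
cd solr-bench
mvn clean compile assembly:single
./stress.sh -c <commitid-7.7.2> suites/cluster-test.json
./stress.sh -c <commitid-8.11.1> suites/cluster-test.json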



On Fri, 2 Jun, 2023, 2:44 am Jan Høydahl, <jan....@cominvent.com> wrote:

> Hi,
>
> The next step, I believe, is for you to open a Solr JIRA issue for this
> regression, as it is obviously an issue.
> Then we can use that JIRA to coordinate, even if the fix ends up being in
> Lucene code.
>
> Once we have a Solr JIRA, it may make sense to open a mail thread on the
> users@lucene list to get some Lucene expertise involved in digging.
>
> Jan
>
> > On 1 Jun 2023, at 18:15, Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > So I ran the test to index 5 million docs this time (batches of 1000 docs
> > in 15 parallel threads). To eliminate the network overhead and get as
> > accurate a benchmark as possible, I used an AtomicLong to measure the time
> > around the RTG call in DistributedUpdateProcessor across all calls (
> > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L1416
> > ).
> > I did this for both Solr 7.7.2 and Solr 8.11.1 and built solr-core.jar to
> > replace it in the Solr webapp lib.
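> >
> > In case anyone wants to reproduce this, the instrumentation boils down to
> > something like the sketch below. Class and method names are illustrative,
> > not the actual patch, and the no-op Runnable stands in for the real RTG
> > lookup:
> >
> > import java.util.concurrent.atomic.AtomicLong;
> >
> > public class RtgTimingSketch {
> >     // shared accumulator, like the AtomicLong added around the RTG call
> >     private static final AtomicLong totalRtgMillis = new AtomicLong();
> >
> >     // times one invocation of the wrapped call and adds it to the total
> >     static void timed(Runnable rtgCall) {
> >         long start = System.currentTimeMillis();
> >         rtgCall.run();
> >         totalRtgMillis.addAndGet(System.currentTimeMillis() - start);
> >     }
> >
> >     public static void main(String[] args) throws InterruptedException {
> >         Thread[] workers = new Thread[15];   // 15 indexing threads, as in the test
> >         for (int i = 0; i < workers.length; i++) {
> >             workers[i] = new Thread(() -> {
> >                 for (int j = 0; j < 1000; j++) {
> >                     timed(() -> { });        // stand-in for the RTG lookup
> >                 }
> >             });
> >             workers[i].start();
> >         }
> >         for (Thread w : workers) w.join();
> >         System.out.println("Total rtg time:" + totalRtgMillis.get());
> >     }
> > }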
> >
> > RTG in Solr 8.x is ~10x slower. Here are the numbers (times are in
> > milliseconds):
> >
> > *Solr 7.7.2:*  2023-06-01 15:39:48.272 WARN  (qtp1034094674-24) [
> > x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg time:7293486*
> > *Solr 8.11.1:* 2023-06-01 04:46:24.758 WARN  (qtp391506011-71) [
> > x:techproducts] o.a.s.u.p.LogUpdateProcessorFactory *Total rtg time:72029877*
> >
> > Let me know if I can help further with debugging. At this point I am
> > suspecting the slowness to be on Lucene's side with the findTargetArc()
> > method in FST.java. I am running more analysis on my side in parallel and
> > will share if/when I find more.
> >
> > -Rahul
> >
> > On Thu, Jun 1, 2023 at 3:53 AM Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >> Yes, the numbers should be similar. Thanks for digging in. If there is a
> >> big regression, it could be worthwhile running the same test on v8.0,
> >> 8.1, 8.2 etc. to pinpoint the version in which the regression happened.
> >> Perhaps it could even be scripted :)
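> >>
> >> e.g. a wrapper along these lines, where run-indexing-test.sh is a
> >> placeholder for whatever client drives the indexing benchmark:
> >>
> >> for v in 8.0.0 8.1.0 8.2.0 8.3.0; do
> >>   ./run-indexing-test.sh "$v" | tee "results-$v.log"
> >> done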
> >>
> >> Solr unfortunately does not yet have a comprehensive official nightly
> >> benchmark run that would catch such regressions. But the community is
> >> working on it: there are some tests on http://mostly.cool maintained by
> >> Ishan, but I have no idea how to add new tests.
> >>
> >> Jan
> >>
> >>> On 31 May 2023, at 23:50, Rahul Goswami <rahul196...@gmail.com> wrote:
> >>>
> >>> Sure, I can do that. Let me create an index with a few million docs,
> >>> call RTG with a few million iterations on it, and note the times
> >>> between 7.x and 8.x. I assume this should be sufficient (?)
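> >>>
> >>> Roughly along the lines of this SolrJ loop (URL, core name, and counts
> >>> are placeholders; note it includes HTTP overhead on top of the
> >>> server-side RTG cost):
> >>>
> >>> import org.apache.solr.client.solrj.SolrClient;
> >>> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> >>> import org.apache.solr.common.SolrDocument;
> >>>
> >>> public class RtgLoop {
> >>>     public static void main(String[] args) throws Exception {
> >>>         try (SolrClient client = new HttpSolrClient.Builder(
> >>>                 "http://localhost:8983/solr/techproducts").build()) {
> >>>             long start = System.nanoTime();
> >>>             for (long i = 0; i < 1_000_000L; i++) {
> >>>                 // getById() issues a real-time get (/get) request
> >>>                 SolrDocument doc = client.getById(Long.toString(i % 5_000_000L));
> >>>             }
> >>>             System.out.println("elapsed ms: " +
> >>>                 (System.nanoTime() - start) / 1_000_000);
> >>>         }
> >>>     }
> >>> }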
> >>>
> >>> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
> >>>
> >>>> It would be nice to determine whether RTG is orders of magnitude
> >>>> slower in 8.x than 7.x and is the main culprit. Then we could isolate
> >>>> the testing to RTG only and not involve Atomic Update?
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 31 May 2023, at 21:33, Rahul Goswami <rahul196...@gmail.com> wrote:
> >>>>>
> >>>>> I don’t have any nested documents, and the results are consistent
> >>>>> across multiple runs. I tried looking for similar issues in the
> >>>>> mailing list but couldn’t find anything relevant. So if you do happen
> >>>>> to find any JIRAs addressing it, that would be really helpful
> >>>>> (thanks!).
> >>>>>
> >>>>> To Jan’s question about RTG taking more time in Solr 8.x, I can say
> >>>>> with good certainty that this is the case. Although it does look into
> >>>>> the transaction logs first, thread dumps suggest that it is the next
> >>>>> phase (when it doesn't find the doc in the tlog) which seems to be
> >>>>> time consuming. It tries to look up the document via the current
> >>>>> searcher (searcher.getFirstMatch()). Proceeding further in the stack,
> >>>>> it is this call where many threads are spending time:
> >>>>>
> >>>>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
> >>>>>
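> >>>>> One way to time that lookup path in isolation, outside the whole
> >>>>> update flow, might be a standalone seekExact() loop over the id terms
> >>>>> of an existing index. A rough sketch; the index path, field name and
> >>>>> id values are placeholders:
> >>>>>
> >>>>> import java.nio.file.Paths;
> >>>>> import org.apache.lucene.index.DirectoryReader;
> >>>>> import org.apache.lucene.index.LeafReaderContext;
> >>>>> import org.apache.lucene.index.Terms;
> >>>>> import org.apache.lucene.index.TermsEnum;
> >>>>> import org.apache.lucene.store.FSDirectory;
> >>>>> import org.apache.lucene.util.BytesRef;
> >>>>>
> >>>>> public class SeekBench {
> >>>>>     public static void main(String[] args) throws Exception {
> >>>>>         // args[0]: path to an existing Lucene index directory
> >>>>>         try (DirectoryReader reader =
> >>>>>                  DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
> >>>>>             long start = System.nanoTime(), hits = 0;
> >>>>>             for (LeafReaderContext leaf : reader.leaves()) {
> >>>>>                 Terms terms = leaf.reader().terms("id");
> >>>>>                 if (terms == null) continue;
> >>>>>                 // SegmentTermsEnum for the default codec
> >>>>>                 TermsEnum te = terms.iterator();
> >>>>>                 for (int i = 0; i < 100_000; i++) {
> >>>>>                     if (te.seekExact(new BytesRef("doc-" + i))) hits++;
> >>>>>                 }
> >>>>>             }
> >>>>>             System.out.println(hits + " hits in " +
> >>>>>                 (System.nanoTime() - start) / 1_000_000 + " ms");
> >>>>>         }
> >>>>>     }
> >>>>> }
> >>>>>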
> >>>>> Although this call is the same in 7.7.2 and 8.11.1, quite likely
> >>>>> something changed in Lucene's FST.java which is causing the slowness.
> >>>>> I am trying to dig further and might also ask folks on the Lucene
> >>>>> mailing list.
> >>>>> Thanks.
> >>>>>
> >>>>> On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com> wrote:
> >>>>>
> >>>>>> I would love some profiling as well. I know 8.8 or 8.9 had some
> >>>>>> performance problems with atomic update, but this was later
> >>>>>> addressed. I can't find the JIRA at the moment, though. Also, I am on
> >>>>>> 8.11.1 and atomic update is not an issue for me.
> >>>>>>
> >>>>>> By the way, do you happen to have nested docs?
> >>>>>>
> >>>>>>
> >>>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> MMap matters most for searching; indexing bypasses the cache by
> >>>>>>> using direct IO.
> >>>>>>>
> >>>>>>> I have noticed slow real-time get on Solr 8.x during atomic updates
> >>>>>>> myself. A comparison with profiling would be interesting. RTG gets
> >>>>>>> the document from the transaction log, I believe? Could there be
> >>>>>>> some RTG changes in 8.x that caused such a slowdown?
> >>>>>>>
> >>>>>>> Jan Høydahl
> >>>>>>>
> >>>>>>>> On 31 May 2023, at 16:57, Rahul Goswami <rahul196...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for the response, Shawn. We are using Windows Server with
> >>>>>>>> pretty huge indexes (multi-TB cores). With MMap, I have observed
> >>>>>>>> that the machine just completely freezes with high CPU and memory
> >>>>>>>> usage, to the point where it becomes impossible to even connect to
> >>>>>>>> it. SimpleFS works out well for us in this case.
> >>>>>>>>
> >>>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes
> >>>>>>>> the crawl in nearly 1/5th the time taken by Solr 8. Hence there
> >>>>>>>> should be something OUTSIDE the directory factory in the code
> >>>>>>>> which is causing this.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Rahul
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <apa...@elyograg.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
> >>>>>>>>>> Environment details:
> >>>>>>>>>> - Java 11 on Windows server
> >>>>>>>>>> - Xms1536m Xmx3072m
> >>>>>>>>>> - Indexing client code running 15 parallel threads indexing in
> >>>>>>>>>>   batches of 1000
> >>>>>>>>>> - using SimpleFSDirectoryFactory (since MMap doesn't quite work
> >>>>>>>>>>   well on Windows for our index sizes, which commonly run north
> >>>>>>>>>>   of 1 TB)
> >>>>>>>>>
> >>>>>>>>> Don't change the directoryFactory. You *WANT* Solr to use MMAP
> >>>>>>>>> for your indexes. Not using MMAP is likely to slow things down
> >>>>>>>>> considerably. MMAP should work just fine on 64-bit Windows with
> >>>>>>>>> 64-bit Java, which of course requires 64-bit hardware.
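> >>>>>>>>>
> >>>>>>>>> For reference, the factory is chosen in solrconfig.xml. The stock
> >>>>>>>>> config looks roughly like this; as far as I know, leaving the
> >>>>>>>>> default in place gets you MMap-backed directories on a 64-bit JVM:
> >>>>>>>>>
> >>>>>>>>> <!-- the solr.directoryFactory system property lets you switch
> >>>>>>>>>      implementations without editing the file -->
> >>>>>>>>> <directoryFactory name="DirectoryFactory"
> >>>>>>>>>     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>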
> >>>>>>>>>
> >>>>>>>>> 32-bit systems and software cannot properly deal with data larger
> >>>>>>>>> than about 2 GB.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Shawn
> >>>>>>>>>