Yes, the numbers should be similar. Thanks for digging in. If there is a big
regression, it could be worthwhile running the same test on 8.0, 8.1, 8.2 etc.
to pinpoint in which version the regression happened. Perhaps it could even be
scripted :)
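
A minimal sketch of what such a script could look like, assuming one Solr
instance per version is already running with the same documents indexed. The
ports, the collection name "rtg_bench" and the "doc-N" id scheme are
hypothetical; only the /get handler with an id parameter is the actual RTG
endpoint. It times each lookup from the client side, so it measures end-to-end
latency, not just the time spent inside Lucene.

```python
import statistics
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List


def rtg_url(base_url: str, collection: str, doc_id: str) -> str:
    """Build a real-time get URL (the /get handler) for one document id."""
    query = urllib.parse.urlencode({"id": doc_id})
    return f"{base_url.rstrip('/')}/{collection}/get?{query}"


def http_fetch(url: str) -> bytes:
    """Default fetcher: plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def time_rtg(base_url: str, collection: str, doc_ids: Iterable[str],
             fetch: Callable[[str], bytes] = http_fetch) -> List[float]:
    """Issue one RTG request per id and return per-request wall time in seconds."""
    timings = []
    for doc_id in doc_ids:
        url = rtg_url(base_url, collection, doc_id)
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Reduce raw timings to a few numbers that are easy to compare."""
    ordered = sorted(timings)
    return {
        "count": len(ordered),
        "mean_ms": statistics.mean(ordered) * 1000,
        "median_ms": statistics.median(ordered) * 1000,
        "max_ms": ordered[-1] * 1000,
    }


def compare_versions(instances: Dict[str, str], collection: str,
                     doc_ids: Iterable[str],
                     fetch: Callable[[str], bytes] = http_fetch) -> Dict[str, Dict[str, float]]:
    """Run the same id list against every instance, keyed by version label."""
    ids = list(doc_ids)  # reuse the exact same ids for every version
    return {
        version: summarize(time_rtg(base_url, collection, ids, fetch))
        for version, base_url in instances.items()
    }


def main() -> None:
    # Hypothetical setup: one instance per version, each on its own port.
    instances = {
        "7.7.2": "http://localhost:8977/solr",
        "8.11.1": "http://localhost:8983/solr",
    }
    doc_ids = [f"doc-{i}" for i in range(10000)]  # assumed id scheme
    for version, stats in compare_versions(instances, "rtg_bench", doc_ids).items():
        print(version, stats)
```

Calling main() once the instances are up prints one summary per version.
Adding 8.0, 8.1, 8.2 and so on to the instances dict would show where the
numbers start to diverge.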

Unfortunately, Solr does not yet have a comprehensive official nightly benchmark
run that would catch such regressions, but the community is working on it.
There are some tests on http://mostly.cool maintained by Ishan, but I have no
idea how to add new tests.

Jan

> On 31 May 2023, at 23:50, Rahul Goswami <rahul196...@gmail.com> wrote:
> 
> Sure, I can do that. Let me create an index with a few million docs, call
> RTG with a few million iterations on it and note the times between 7.x and
> 8.x. I assume this should be sufficient (?)
> 
> On Wed, May 31, 2023 at 5:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
> 
>> It would be nice to determine whether RTG is orders of magnitude slower in
>> 8.x than in 7.x and is the main culprit. Then we could isolate the testing to
>> RTG only and not involve Atomic Update.
>> 
>> Jan
>> 
>>> On 31 May 2023, at 21:33, Rahul Goswami <rahul196...@gmail.com> wrote:
>>> 
>>> I don’t have any nested documents. And the results are consistent across
>>> multiple runs. I tried looking for similar issues in the mailing list, but
>>> couldn’t find anything relevant. So if you do happen to find any JIRAs
>>> addressing it, that would be really helpful (thanks!).
>>> 
>>> To Jan’s question about RTG taking more time in Solr 8.x, I can say with
>>> good certainty that this is the case. Although it does look into the
>>> transaction logs first, thread dumps suggest that it is the next phase
>>> (when it doesn't find the doc in the tlog) that is time-consuming.
>>> It tries to look up the document via the current searcher
>>> (searcher.getFirstMatch()). Proceeding further down the stack, it is this
>>> call where many threads are spending time:
>>> 
>>> 
>>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
>>> 
>>> Although this call is the same in 7.7.2 and 8.11.1, quite likely
>>> something changed in Lucene's FST.java which is causing the slowness. I am
>>> trying to dig further and might also ask folks on the Lucene mailing list.
>>> Thanks.
>>> 
>>> 
>>> 
>>> On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com> wrote:
>>> 
>>>> I would love some profiling as well. I know 8.8 or 8.9 had some performance
>>>> problems with atomic update, but this was later addressed. I can't find the
>>>> JIRA atm though. Also, I am on 8.11.1 and atomic update is not an issue for
>>>> me.
>>>> 
>>>> By the way, do you happen to have nested docs?
>>>> 
>>>> 
>>>> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> MMap is most important for searching. Indexing bypasses the cache by using
>>>>> direct IO.
>>>>> 
>>>>> I have noticed slow real-time get on Solr 8.x during atomic update myself.
>>>>> A comparison with profiling would be interesting. RTG gets the
>>>>> document from the transaction log, I believe? Could there be some RTG changes
>>>>> in 8.x that caused such a slowdown?
>>>>> 
>>>>> Jan Høydahl
>>>>> 
>>>>>> On 31 May 2023, at 16:57, Rahul Goswami <rahul196...@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for the response Shawn. We are using Windows server with pretty huge
>>>>>> indexes (multiple TB cores). With Mmap, I have observed that the machine
>>>>>> just completely freezes with high CPU and memory usage to a point where it
>>>>>> becomes impossible to even connect to it. SimpleFS works out well for us in
>>>>>> this case.
>>>>>> 
>>>>>> As noted in my first email, even with SimpleFS, Solr 7 completes the crawl
>>>>>> in nearly 1/5th the time taken in Solr 8. Hence there should be something
>>>>>> OUTSIDE the directory factory in the code which is causing this.
>>>>>> 
>>>>>> Thanks,
>>>>>> Rahul
>>>>>> 
>>>>>> 
>>>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>>> 
>>>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
>>>>>>>> Environment details:
>>>>>>>> - Java 11 on Windows server
>>>>>>>> - Xms1536m Xmx3072m
>>>>>>>> - Indexing client code running 15 parallel threads indexing in
>>>>>>>>   batches of 1000
>>>>>>>> - using SimpleFSDirectoryFactory (since Mmap doesn't quite work
>>>>>>>>   well on Windows for our index sizes which commonly run north of
>>>>>>>>   1 TB)
>>>>>>> 
>>>>>>> Don't change the directoryFactory.  You *WANT* Solr to use MMAP for your
>>>>>>> indexes.  Not using MMAP is likely to slow things down considerably.
>>>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java.  Which of
>>>>>>> course requires 64-bit hardware.
>>>>>>> 
>>>>>>> 32-bit systems and software cannot properly deal with data larger than
>>>>>>> about 2GB.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>> 
>>>>> 
>>>> 
>> 
>> 
