Lots of work has been done in this area in 8.x. See
https://issues.apache.org/jira/browse/SOLR-12638 as an example ("Support atomic
updates of nested/child documents for nested-enabled schema", fixed in 8.1).
Touching all that code may of course have slowed down the critical path
somehow, e.g. due to added synchronization or similar. If so, Solr 8.0.x would
run your indexing benchmark faster than the later 8.x releases.
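
If you want to rule atomic updates in or out, one quick test could be to run the
same benchmark twice on each version, once with plain full-document adds and once
with partial ("atomic") updates. Below is a minimal SolrJ sketch of such a partial
update; the base URL, collection name ("test") and field name ("price_i") are just
placeholders for your setup:

  import java.util.Map;

  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class AtomicUpdateProbe {
    public static void main(String[] args) throws Exception {
      // Placeholder base URL and collection; adjust to your environment.
      try (HttpSolrClient client =
          new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // The nested Map ("set" -> value) is what makes this an atomic update
        // instead of a full document replace.
        doc.addField("price_i", Map.of("set", 42));
        client.add("test", doc);
        client.commit("test");
      }
    }
  }

If only the atomic-update run regresses between 8.0 and 8.11, that would point at
the SOLR-12638 area rather than the general indexing path.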

Jan

> On 31 May 2023, at 21:33, Rahul Goswami <rahul196...@gmail.com> wrote:
> 
> I don’t have any nested documents. And the results are consistent across
> multiple runs. I tried looking for similar issues in the mailing list, but
> couldn’t find anything relevant. So if you do happen to find any JIRAs
> addressing it, that would be really helpful (thanks!).
> 
> To Jan’s question about RTG taking more time in Solr 8.x, I can say with
> good certainty that this is the case. Although it does look into
> transaction logs first, thread dumps suggest that it is the next phase
> (when it doesn't find the doc in tlog) which seems to be time-consuming.
> It tries to look up the document via the current searcher
> (searcher.getFirstMatch()). Proceeding further in the stack, it is this
> call where many threads are spending time:
> 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/codecs/blocktree/SegmentTermsEnum.java#L485
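> 
> To take Solr out of the equation, a standalone Lucene program along these lines
> (just a rough sketch; the field name, document count and index path are made up)
> should exercise the same seekExact path and can be timed on its own:
> 
>   import java.nio.file.Paths;
> 
>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>   import org.apache.lucene.document.Document;
>   import org.apache.lucene.document.Field;
>   import org.apache.lucene.document.StringField;
>   import org.apache.lucene.index.DirectoryReader;
>   import org.apache.lucene.index.IndexReader;
>   import org.apache.lucene.index.IndexWriter;
>   import org.apache.lucene.index.IndexWriterConfig;
>   import org.apache.lucene.index.LeafReaderContext;
>   import org.apache.lucene.index.TermsEnum;
>   import org.apache.lucene.store.FSDirectory;
>   import org.apache.lucene.util.BytesRef;
> 
>   public class IdLookupTest {
>     public static void main(String[] args) throws Exception {
>       try (FSDirectory dir = FSDirectory.open(Paths.get("idlookup-index"))) {
>         // Index a million synthetic ids (arbitrary count for the sketch).
>         try (IndexWriter w =
>             new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
>           for (int i = 0; i < 1_000_000; i++) {
>             Document d = new Document();
>             d.add(new StringField("id", "doc-" + i, Field.Store.NO));
>             w.addDocument(d);
>           }
>         }
>         // Look every id up again via TermsEnum.seekExact, the call the
>         // thread dumps point at.
>         try (IndexReader r = DirectoryReader.open(dir)) {
>           long start = System.nanoTime();
>           long found = 0;
>           for (int i = 0; i < 1_000_000; i++) {
>             BytesRef id = new BytesRef("doc-" + i);
>             for (LeafReaderContext ctx : r.leaves()) {
>               TermsEnum te = ctx.reader().terms("id").iterator();
>               if (te.seekExact(id)) {
>                 found++;
>                 break;
>               }
>             }
>           }
>           System.out.println(found + " lookups in "
>               + (System.nanoTime() - start) / 1_000_000 + " ms");
>         }
>       }
>     }
>   }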
> 
> Although this call is the same in 7.7.2 and 8.11.1, it is quite likely that
> something changed in Lucene's FST.java which is causing the slowness. I am
> trying to dig further and might also ask folks on the Lucene mailing list.
> Thanks.
> 
> 
> 
> On Wed, May 31, 2023 at 11:36 AM Srijan <shree...@gmail.com> wrote:
> 
>> I would love some profiling as well. I know 8.8 or 8.9 had some performance
>> problems with atomic update, but this was later addressed. I can't find the
>> JIRA at the moment, though. Also, I am on 8.11.1 and atomic update is not an
>> issue for me.
>> 
>> By the way, do you happen to have nested docs?
>> 
>> 
>> On Wed, May 31, 2023, 11:20 Jan Høydahl <jan....@cominvent.com> wrote:
>> 
>>> Hi
>>> 
>>> MMap is most important for searching. Indexing bypasses the cache by using
>>> direct IO.
>>> 
>>> I have noticed slow real time get on Solr 8.x during atomic update myself.
>>> It would be interesting to compare with profiling. RTG gets the document
>>> from the transaction log, I believe? Could there be some RTG changes in 8.x
>>> that caused such a slowdown?
>>> 
>>> Jan Høydahl
>>> 
>>>> On 31 May 2023, at 16:57, Rahul Goswami <rahul196...@gmail.com> wrote:
>>>> 
>>>> Thanks for the response, Shawn. We are using Windows server with pretty
>>>> huge indexes (multiple TB cores). With MMap, I have observed that the
>>>> machine just completely freezes with high CPU and memory usage to a point
>>>> where it becomes impossible to even connect to it. SimpleFS works out well
>>>> for us in this case.
>>>> 
>>>> As noted in my first email, even with SimpleFS, Solr 7 completes the crawl
>>>> in nearly 1/5th the time taken in Solr 8. Hence there should be something
>>>> OUTSIDE the directory factory in the code which is causing this.
>>>> 
>>>> Thanks,
>>>> Rahul
>>>> 
>>>> 
>>>>> On Tue, May 30, 2023 at 10:47 PM Shawn Heisey <apa...@elyograg.org> wrote:
>>>>> 
>>>>>> On 5/30/23 15:34, Rahul Goswami wrote:
>>>>>> Environment details:
>>>>>> - Java 11 on Windows server
>>>>>> - Xms1536m Xmx3072m
>>>>>> - Indexing client code running 15 parallel threads indexing in batches of 1000
>>>>>> - using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on
>>>>>>   Windows for our index sizes which commonly run north of 1 TB)
>>>>> 
>>>>> Don't change the directoryFactory.  You *WANT* Solr to use MMAP for your
>>>>> indexes.  Not using MMAP is likely to slow things down considerably.
>>>>> MMAP should work just fine on 64-bit Windows with 64-bit Java.  Which of
>>>>> course requires 64-bit hardware.
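>>>>> 
>>>>> For reference, this is set in solrconfig.xml; the stock config uses a line
>>>>> roughly like the one below, so removing your SimpleFS override should put
>>>>> you back on that default (which picks MMap-backed directories on 64-bit
>>>>> platforms):
>>>>> 
>>>>>   <directoryFactory name="DirectoryFactory"
>>>>>                     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>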
>>>>> 
>>>>> 32 bit systems and software cannot properly deal with data larger than
>>>>> about 2GB.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>>>> 
>>> 
>> 
