From: Shawn Heisey
Sent: Thursday, May 10, 9:43 PM
Subject: Re: Solr soft commits
To: solr-user@lucene.apache.org


On 5/10/2018 9:48 AM, Shivam Omar wrote:
> I need some help in understanding Solr soft commits. Soft commits are about
> visibility and are fast in nature, so they are advised for NRT use cases.

Soft commits *MIGHT* be faster than hard commits. There are situations where
the performance of a soft commit and a hard commit with openSearcher=true will
be about the same, particularly if indexing is very heavy.

Thanks, Shawn. So there are cases where a soft commit will not be faster than a
hard commit with openSearcher=true. We have a case where we have to do bulk
deletions; in that case, will a soft commit be faster than a hard commit?

> I want to understand whether soft commits also honor merge policies and merge
> segments for documents in memory. For example, if I keep the hard commit
> interval very high and allow a few million documents to be in memory by using
> soft commits with no hard commit, can it affect Solr query-time performance?

Segments in memory are very likely not eligible for merging, but I do not
actually know whether that is the case.

Using soft commits will NOT keep millions of documents in memory. By default,
Solr uses NRTCachingDirectoryFactory, which wraps Lucene's NRTCachingDirectory,
and it uses it with default values, which are far too low to accommodate
millions of documents. See the Javadoc for the directory to see what those
defaults are:

https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html

That page shows a directory being created with memory values of 5 and 60 MB,
but the defaults in the factory code (which is what Solr normally uses) are 4
and 48 MB. I'm pretty sure that you can increase these values in
solrconfig.xml, but really large values are not recommended. Values large
enough to accommodate millions of documents would require the Java heap to
also be large, likely with no real performance advantage. If segment sizes
exceed these values, then they will not be cached in memory. Older segments
and segments that do not meet the size requirements are flushed to disk.

Does this mean that once the memory threshold is crossed, soft commits will
cause Lucene to flush data to disk, as a hard commit does? Also, does a soft
commit have a higher query-time performance cost than a hard commit?

Thanks, Shawn
