Solr does handle concurrency fine. But there is NOT "transaction isolation" 
like you'll get from an RDBMS. All 'pending' changes are (conceptually, anyway) 
held in a single queue, and any commit will commit ALL of them. There aren't 
going to be any data corruption issues or anything from concurrent adds (unless 
there's a bug in Solr; there isn't supposed to be) -- but there are no 
transactions or isolation between different concurrent adders. So, sure, 
everyone can add concurrently -- but any time any of those actors issues a 
commit, all pending adds are committed. 
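To make that concrete, here's a toy model (not actual Solr code, just an illustration of the semantics): two clients add documents, and a commit from either one makes every pending add visible.

```python
# Toy model of Solr's commit semantics: one shared pending queue,
# and a commit from ANY client makes ALL pending adds visible.
class ToyIndex:
    def __init__(self):
        self.pending = []   # adds not yet visible to searchers
        self.visible = []   # what queries can see

    def add(self, client, doc):
        self.pending.append((client, doc))

    def commit(self, committing_client):
        # Commits everything pending -- not just committing_client's adds.
        self.visible.extend(self.pending)
        self.pending.clear()

index = ToyIndex()
index.add("client_a", "doc1")
index.add("client_b", "doc2")   # a different client's in-flight change
index.commit("client_a")        # client_a commits...
# ...and client_b's doc2 becomes visible too:
print(index.visible)  # [('client_a', 'doc1'), ('client_b', 'doc2')]
```

So no data is lost or corrupted, but no client can count on committing only its own changes.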

In addition, there are problems with Solr's basic architecture and _too 
frequent_ commits (whether made by different processes or not, doesn't 
matter). When a new commit happens, Solr fires up a new index searcher and 
warms it up on the new version of the index. Until the new index searcher is 
fully warmed, the old index searcher is still serving queries. Which can also 
mean that there are, for this period, TWO versions of all your caches in RAM 
and such. So let's say it takes 5 minutes for the new index to be fully warmed. 
But if you have commits happening every 1 minute -- then you'll end up with 
FIVE 'new indexes' being warmed -- meaning potentially 5 times the RAM usage 
(quickly running into a JVM out-of-memory error), and lots of CPU activity 
spent warming indexes that will never actually be used (because before they 
are even done being warmed and ready to use, they've already been superseded 
by a later commit).   

I don't know of any good way to deal with this except less frequent commits. 
One way to get less frequent commits is to use Solr replication, and 'stage' 
all your commits in a 'master' index, but only replicate to 'slave' at a 
frequency slow enough so the new index is fully warmed before the next commit 
happens. 
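For reference, a master/slave setup along those lines looks roughly like the following in solrconfig.xml (illustrative, not a drop-in config: the hostname and the 10-minute pollInterval are placeholders -- the point is that the poll interval must be longer than your warm time):

```xml
<!-- On the master: make a new snapshot available after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On the slave: poll slowly enough that warming finishes between pulls -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str> <!-- HH:MM:SS -->
  </lst>
</requestHandler>
```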

Some new features in trunk (both Lucene and Solr) for 'near real time' search 
ameliorate this problem somewhat, depending on the nature of your commits. 

Jonathan
________________________________________
From: Savvas-Andreas Moysidis [savvas.andreas.moysi...@googlemail.com]
Sent: Wednesday, February 09, 2011 10:34 AM
To: solr-user@lucene.apache.org
Subject: Concurrent updates/commits

Hello,

This topic has probably been covered before here, but we're still not very
clear about how multiple commits work in Solr.
We currently have a requirement to make our domain objects searchable
immediately after they get updated in the database by some user action. This
could potentially cause multiple updates/commits to be fired to Solr and we
are trying to investigate how Solr handles those multiple requests.

This thread:
http://search-lucene.com/m/0cab31f10Mh/concurrent+commits&subj=commit+concurrency+full+text+search

suggests that Solr will handle all of the lower level details and that "Before
a *COMMIT* is done , lock is obtained and its released  after the
operation"
which in my understanding means that Solr will serialise all update/commit
requests?

However, the Solr book, in the "Commit, Optimise, Rollback" section reads:
"if more than one Solr client were to submit modifications and commit them
at similar times, it is possible for part of one client's set of changes to
be committed before that client told Solr to commit"
which suggests that requests are *not* serialised.

Our questions are:
- Does Solr handle concurrent requests or do we need to add synchronisation
logic around our code?
- If Solr *does* handle concurrent requests, does it serialise each request
or has some other strategy for processing those?


Thanks,
- Savvas
