Re: Solr Implementations

Erick Erickson Fri, 26 Aug 2011 05:31:04 -0700

See below

On Thu, Aug 25, 2011 at 4:22 PM, zarni aung <zau...@gmail.com> wrote:
> First, I would like to apologize if this is a repeat question but can't seem
> to get the right answer anywhere.
>
>   - What happens to pending documents when the server dies abruptly?  I
>   understand that when the server shuts down gracefully, it will commit the
>   pending documents and close the IndexWriter.  For the case where the server
>   just crashes,  I am assuming that the pending documents are lost but would
>   it also corrupt the index files?  If so, when the server comes back online
>   what is the state?  I would think that a full re-indexing is in order.
>
>


This is generally not a problem; your pending updates are simply lost. A lot
of work has gone into making sure that the indexes aren't corrupted in this
situation. You can run Lucene's CheckIndex utility against the index if
you're worried.

A brief outline here. Solr only writes new segments; it does NOT modify existing
segments. There is a file that lets Solr know what the current valid segments
are. During indexing (including merging, optimization, etc.), only NEW segments
are written, and the file that tells Solr what's current is left alone while
the new segments are being written.

The very last thing that's done is updating the segments file (i.e. the file
that tells Solr what's current), and that file is very small. I suppose there's
a vanishingly small chance that it could be corrupted while being written, and
it may even be that a temp file is written first and then renamed (but I don't
know that for sure)...

So, the point of this long digression is that if your server gets killed, upon
restart it should see a consistent picture of the index as of the last
completed commit. Any interim docs will be lost.
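
To make the sequence concrete, here is a toy model in Python. It is NOT
Lucene's actual code or file format: the file names ("seg_1", "segments"),
the JSON contents and the helper functions are all made up for illustration,
and the temp-file-plus-rename step is an assumption of the sketch, not
something confirmed about Lucene.

```python
import json
import os
import tempfile


def write_segment(index_dir, name, docs):
    """Write a brand-new segment file; existing segments are never touched."""
    with open(os.path.join(index_dir, name), "w") as f:
        json.dump(docs, f)
        f.flush()
        os.fsync(f.fileno())


def publish_commit(index_dir, segment_names):
    """Final step of a commit: swap in the small file listing valid segments."""
    tmp = os.path.join(index_dir, "segments.tmp")
    with open(tmp, "w") as f:
        json.dump(segment_names, f)
        f.flush()
        os.fsync(f.fileno())
    # Assumption of this sketch: publish via an atomic rename.
    os.replace(tmp, os.path.join(index_dir, "segments"))


def open_reader(index_dir):
    """Load only the segments the manifest names; stray files are ignored."""
    with open(os.path.join(index_dir, "segments")) as f:
        names = json.load(f)
    docs = []
    for name in names:
        with open(os.path.join(index_dir, name)) as f:
            docs.extend(json.load(f))
    return docs


def demo():
    index_dir = tempfile.mkdtemp()
    # A completed commit: segment written, then the manifest updated.
    write_segment(index_dir, "seg_1", ["doc-a", "doc-b"])
    publish_commit(index_dir, ["seg_1"])
    # Simulated crash: a new segment hits disk, but the manifest is never updated.
    write_segment(index_dir, "seg_2", ["doc-c"])
    # "Restart": the reader sees only the last completed commit.
    return open_reader(index_dir)
```

Calling demo() returns only the documents from seg_1. The orphaned seg_2 is
never referenced, so it cannot affect what a restarted server sees.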

>   - What are the dangers of having n-number of ReadOnly Solr instances
>   pointing to the same data directory?  (Shared by a SAN)?  Will there be
>   issues with locking?  This is a scenario with replication.  The Read-Only
>   instances are pointing to the same data directory on a SAN.
>

This is not a problem. You should have only one *writer*
pointing to the index, but readers are OK. Applying the discussion above to
readers, note that the segments available to any reader are never changed. So
having N Solr instances reading from these unchanging files is no problem.
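
A rough way to picture this, again as a toy Python model rather than anything
Solr or Lucene actually does internally: each reader below notes the segment
list once when it opens and only ever reads those files. The SnapshotReader
class, the commit() helper and the file names are hypothetical.

```python
import json
import os
import tempfile


class SnapshotReader:
    """Reads the segment list once at open time and sticks to it."""

    def __init__(self, index_dir):
        self.index_dir = index_dir
        with open(os.path.join(index_dir, "segments")) as f:
            self.segment_names = json.load(f)

    def docs(self):
        found = []
        for name in self.segment_names:
            with open(os.path.join(self.index_dir, name)) as f:
                found.extend(json.load(f))
        return found


def commit(index_dir, existing, new_name, new_docs):
    """The single writer: add one new segment, then publish the new list."""
    with open(os.path.join(index_dir, new_name), "w") as f:
        json.dump(new_docs, f)
    tmp = os.path.join(index_dir, "segments.tmp")
    with open(tmp, "w") as f:
        json.dump(existing + [new_name], f)
    os.replace(tmp, os.path.join(index_dir, "segments"))
    return existing + [new_name]


def demo_readers():
    index_dir = tempfile.mkdtemp()
    segs = commit(index_dir, [], "seg_1", ["doc-a"])
    # Three read-only instances open the same directory.
    readers = [SnapshotReader(index_dir) for _ in range(3)]
    # The writer commits again while the readers are still open.
    commit(index_dir, segs, "seg_2", ["doc-b"])
    old_views = [r.docs() for r in readers]
    new_view = SnapshotReader(index_dir).docs()
    return old_views, new_view
```

The three already-open readers keep returning the same documents after the
second commit; only a reader opened afterwards picks up seg_2.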

That said, this will be slower than using Solr's replication (which is
preferred) for two reasons:
1> any networked filesystem will have some inherent speed issues.
2> all these read requests will have to be queued somehow.

But if your performance is acceptable with this setup, it'll work.


Best
Erick

> Thank you very much.
>
> Z
>
