On 3/9/2011 12:05 PM, Otis Gospodnetic wrote:
But check this! In some cases one is not allowed to save content to disk
(think copyrights). I'm not making this up - we actually have a customer
with this "cannot save to disk" (but can index) requirement.
Do they realize that a Solr index is on disk, and if you save it to a
Solr index it's being saved to disk? If they prohibited you from
putting the doc in a stored field in Solr, I guess that would at least
be somewhat consistent, although annoying.
But I don't think it's our customers' job to tell us HOW to implement
our software to get the results they want. They can certainly make you
promise not to distribute or use copyrighted material, and they can even
ask to see your security procedures to make sure it doesn't get out.
But if you need to buffer documents to achieve the application they
want, but they won't let you... Solr can't help you with that.
As I suggested before though, I might rather buffer to a NoSQL store
like MongoDB or CouchDB instead of actually to disk. Perhaps your
customer won't notice those stores keep data on disk just like they
haven't noticed Solr does. I am not an expert in various kinds of NoSQL
stores, but I think some of them in fact specialize in the area of
concern here: Absolute failover reliability through replication.
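To make the buffering idea concrete, here is a minimal sketch in Python. It is an illustration only, not anything Solr ships: `BufferingIndexer` and the `send` callable are hypothetical names, and the in-memory `deque` merely stands in for whatever replicated store you would actually buffer to. A document is dropped from the buffer only after it has been sent successfully, and order is preserved.

```python
from collections import deque


class BufferingIndexer:
    """Forward documents to an indexer; hold them in a buffer while it is down."""

    def __init__(self, send, buffer=None):
        # `send` is a hypothetical callable that posts one document to the
        # master and raises ConnectionError when the master is unreachable.
        self.send = send
        # The deque stands in for an external, replicated store (assumption).
        self.buffer = buffer if buffer is not None else deque()

    def index(self, doc):
        # Queue behind anything already buffered so arrival order is kept.
        self.buffer.append(doc)
        return self.flush()

    def flush(self):
        """Send buffered documents in order; stop at the first failure."""
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return False       # master still down; keep the rest buffered
            self.buffer.popleft()  # drop a document only after a successful send
        return True
```

Note that this gives at-least-once delivery: if a send succeeds on the master but the failure is reported to the client anyway, the document is sent again on the next flush, so the indexing side needs to tolerate duplicates (for example by overwriting on a unique key).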
Solr is not a store.
So buffering to disk is not an option, and buffering in memory is not practical
because of the input document rate and their size.
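A back-of-the-envelope calculation shows why in-memory buffering runs out of room quickly. The figures below are hypothetical and are not from this thread; plug in your own rate, document size and expected outage length.

```python
def buffer_bytes(docs_per_second, avg_doc_bytes, outage_seconds):
    """Memory needed to hold every document that arrives during an outage."""
    return docs_per_second * avg_doc_bytes * outage_seconds


# Hypothetical figures (assumptions): 200 docs/s, 500 KiB each, 10-minute outage.
needed = buffer_bytes(200, 500 * 1024, 10 * 60)
print(f"{needed / 1024**3:.1f} GiB to ride out the outage")
```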
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps
Hello,
What are some common or good ways to handle indexing (master) fail-over?
Imagine you have a continuous stream of incoming documents that you have to
index without losing any of them (or losing as few of them as possible).
How do you set up your masters?
In other words, you can't just have 2 masters where the secondary is the
Repeater (or Slave) of the primary master and replicates the index
periodically: you need to have 2 masters that are in sync at all times!
How do you achieve that?
* Do you just put N masters behind a LB VIP, configure them both to point to
  the index on some shared storage (e.g. SAN), and count on the LB to
  fail-over to the secondary master when the primary becomes unreachable?
  If so, how do you deal with index locks? You use the Native lock and count
  on it disappearing when the primary master goes down? That means you count
  on the whole JVM process dying, which may not be the case...
* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters
  with 2 separate indices in sync, while making sure you write to only 1 of
  them via LB VIP or otherwise?
* Or ...
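To illustrate the lock concern in the first option: an OS-level lock belongs to whoever holds the open handle, not to the lock file itself. The sketch below is an analogy in Python using `fcntl.flock` on a Unix system; it is not Lucene's lock implementation, and `try_lock` and the `write.lock` path are hypothetical. Behavior on shared or network filesystems may differ from this local example.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an fd holding an exclusive lock on path, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        # Non-blocking exclusive lock; fails at once if another handle holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Hypothetical lock file in a scratch directory.
lock_path = os.path.join(tempfile.mkdtemp(), "write.lock")

primary = try_lock(lock_path)    # the "primary" takes the lock
secondary = try_lock(lock_path)  # the "secondary" is refused while it is held

os.close(primary)                # the handle goes away, as when a process exits
takeover = try_lock(lock_path)   # the file is still there, but the lock is free
```

The flip side is the case raised above: a primary that is hung or unreachable but still running keeps its handle open, so the lock stays held and the secondary cannot take over.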
This thread is on a similar topic, but is inconclusive:
http://search-lucene.com/m/aOsyN15f1qd1
Here is another similar thread, but this one doesn't cover how 2 masters are
kept in sync at all times:
http://search-lucene.com/m/aOsyN15f1qd1
Thanks,
Otis