On 3/9/2011 12:05 PM, Otis Gospodnetic wrote:
But check this! In some cases one is not allowed to save content to disk (think
copyrights).  I'm not making this up - we actually have a customer with this
"cannot save to disk" (but can index) requirement.

Do they realize that a Solr index is on disk, and if you save it to a Solr index it's being saved to disk? If they prohibited you from putting the doc in a stored field in Solr, I guess that would at least be somewhat consistent, although annoying.

But I don't think it's our customers' job to tell us HOW to implement our software to get the results they want. They can certainly make you promise not to distribute or use copyrighted material, and they can even ask to see your security procedures to make sure it doesn't get out. But if you need to buffer documents to build the application they want, and they won't let you... Solr can't help you with that.

As I suggested before, though, I might rather buffer to a NoSQL store like MongoDB or CouchDB instead of directly to disk. Perhaps your customer won't notice that those stores keep data on disk, just as they haven't noticed that Solr does. I am not an expert in the various kinds of NoSQL stores, but I think some of them in fact specialize in the area of concern here: absolute failover reliability through replication.
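
To make the buffering idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: DurableBuffer is an in-memory stand-in for whatever external store you pick (it does not use a real MongoDB or CouchDB client), and FakeMaster stands in for a Solr master. The point is only the bookkeeping: a document stays in the buffer until every master has accepted it, so a master that was down can catch up later.

```python
class MasterDown(Exception):
    """Raised by a master that cannot accept documents right now."""


class DurableBuffer:
    """Hypothetical stand-in for an external store (MongoDB, CouchDB, ...).

    A real deployment would replace this class with calls to the store's
    own client library; this version only keeps things in memory.
    """

    def __init__(self, master_names):
        self._docs = {}  # doc_id -> document
        # Per-master backlog of doc ids, in arrival order.
        self._owed = {name: [] for name in master_names}

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc
        for backlog in self._owed.values():
            backlog.append(doc_id)

    def get(self, doc_id):
        return self._docs[doc_id]

    def backlog(self, name):
        return list(self._owed[name])

    def ack(self, name, doc_id):
        self._owed[name].remove(doc_id)
        # Drop the document once no master still needs it.
        if not any(doc_id in b for b in self._owed.values()):
            del self._docs[doc_id]


class FakeMaster:
    """Hypothetical stand-in for a Solr master; records what it indexed."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.indexed = {}

    def index(self, doc_id, doc):
        if not self.up:
            raise MasterDown(self.name)
        # Keyed by id, so replaying the same document is harmless.
        self.indexed[doc_id] = doc


def drain(buffer, masters):
    """Push each master's backlog to it, stopping at the first failure."""
    for master in masters:
        for doc_id in buffer.backlog(master.name):
            try:
                master.index(doc_id, buffer.get(doc_id))
            except MasterDown:
                break  # keep the rest of this backlog for a later attempt
            buffer.ack(master.name, doc_id)
```

You would call put() for every incoming document and run drain() repeatedly. If one master is down, its backlog simply grows until it is back, and the other master keeps receiving documents in the meantime.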

Solr is not a store.

So buffering to disk is not an option, and buffering in memory is not practical
because of the input document rate and their size.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

Hello,

What are some common or good ways to handle indexing (master) fail-over?
Imagine you have a continuous stream of incoming documents that you have to
index without losing any of them (or while losing as few of them as possible).
How do you set up your masters?
In other words, you can't just have 2 masters where the secondary is the
Repeater (or Slave) of the primary master and replicates the index
periodically: you need to have 2 masters that are in sync at all times!
How do you achieve that?

* Do you just put N masters behind a LB VIP, configure them both to point to
the index on some shared storage (e.g. SAN), and count on the LB to fail-over
to the secondary master when the primary becomes unreachable?
If so, how do you deal with index locks? You use the Native lock and count on
it disappearing when the primary master goes down? That means you count on
the whole JVM process dying, which may not be the case...

* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters
with 2 separate indices in sync, while making sure you write to only 1 of
them via LB VIP or otherwise?

* Or ...
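
The lock concern in the first option can be shown with a small toy model in Python. This is only a sketch under simplifying assumptions: SharedIndex, Master and route are hypothetical names, not Solr or Lucene APIs, and the lock is modeled as a single owner field that is released only when the owning process actually dies.

```python
class SharedIndex:
    """Hypothetical stand-in for an index directory on shared storage."""

    def __init__(self):
        self.lock_holder = None

    def acquire(self, owner):
        # The lock can be taken if it is free or already held by this owner.
        if self.lock_holder in (None, owner):
            self.lock_holder = owner
            return True
        return False

    def release(self, owner):
        if self.lock_holder == owner:
            self.lock_holder = None


class Master:
    """Hypothetical master process writing to the shared index."""

    def __init__(self, name, index):
        self.name = name
        self.index = index
        self.reachable = True  # what the load balancer sees

    def crash(self):
        """The process dies, so its lock goes away with it."""
        self.reachable = False
        self.index.release(self.name)

    def hang(self):
        """The process stops answering but stays alive; the lock stays held."""
        self.reachable = False

    def write(self, doc):
        if not self.index.acquire(self.name):
            raise RuntimeError(
                "index locked by " + str(self.index.lock_holder))
        return True


def route(masters):
    """Pick the first reachable master, as a load balancer VIP might."""
    for master in masters:
        if master.reachable:
            return master
    raise RuntimeError("no master reachable")
```

In this model, after primary.hang() the router sends writes to the secondary, but the secondary's write() fails because the primary still holds the lock. Only after primary.crash() can the secondary write.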


This thread is on a similar topic, but is inconclusive:
   http://search-lucene.com/m/aOsyN15f1qd1

Here is another similar thread, but this one doesn't cover how 2 masters are
kept in sync at all times:
   http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis

