Hi,

----- Original Message ----
> From: Walter Underwood <wun...@wunderwood.org>
> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
>
> > You mean it's not possible to have 2 masters that are in nearly
> > real-time sync? How about with DRBD? I know people use DRBD to keep
> > 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF,
> > for example, so I'm thinking this could be doable with Solr masters,
> > too, no?
>
> If you add fault-tolerant, you run into the CAP Theorem. Consistency,
> availability, partition: choose two. You cannot have it all.

Right, so I'll take Consistency and Availability, and I'll put my 2
masters in the same rack (which has redundant switches, power supply,
etc.) and thus minimize/avoid partitioning.

Assuming the above actually works, I think my Q remains: How do you set
up 2 Solr masters so they are in near real-time sync? DRBD?

But here is maybe a simpler scenario that more people may be
considering:

Imagine 2 masters on 2 different servers in 1 rack, pointing to the same
index on the shared storage (SAN) that also happens to live in the same
rack. 2 Solr masters are behind 1 LB VIP that indexer talks to. The VIP
is configured so that all requests always get routed to the primary
master (because only 1 master can be modifying an index at a time),
except when this primary is down, in which case the requests are sent to
the secondary master.

So in this case my Q is around automation of this, around Lucene index
locks, around the need for manual intervention, and such.

Concretely, if you have these 2 master instances, the primary master has
the Lucene index lock in the index dir. When the secondary master needs
to take over (i.e., when it starts receiving documents via LB), it needs
to be able to write to that same index. But what if that lock is still
around? One could use the Native lock to make the lock disappear if the
primary master's JVM exited unexpectedly, and in that case everything
*should* work and be completely transparent, right?
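
To make the native-lock assumption concrete, here is a minimal Java sketch of the behaviour I am counting on. It uses plain java.nio file locks (the kind of OS-level lock I understand the Native lock to rely on) rather than Lucene itself, and it plays both masters inside one JVM, which is why it needs a small in-process registry. The names NativeLockSketch, Held and tryAcquire are made up for illustration. Whether OS-level locks behave the same way across two servers on a SAN depends on the file system, so treat that part as an assumption to verify:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration; this class is not part of Lucene or Solr. */
final class NativeLockSketch {

    // Lock files already held inside this JVM. Needed only because the demo
    // plays both masters in one process; real masters are separate JVMs.
    private static final Set<Path> HELD_HERE = ConcurrentHashMap.newKeySet();

    /** An acquired lock: the open channel plus the OS-level lock on it. */
    static final class Held implements AutoCloseable {
        private final Path path;
        private final FileChannel channel;
        private final FileLock lock; // kept reachable for as long as we hold it

        Held(Path path, FileChannel channel, FileLock lock) {
            this.path = path;
            this.channel = channel;
            this.lock = lock;
        }

        /** Closing the channel drops the OS lock, as a JVM exit would. */
        @Override
        public void close() throws IOException {
            try {
                channel.close();
            } finally {
                HELD_HERE.remove(path);
            }
        }
    }

    /** Returns the lock, or null if some other writer already holds it. */
    static Held tryAcquire(Path lockFile) throws IOException {
        Path key = lockFile.toAbsolutePath().normalize();
        if (!HELD_HERE.add(key)) {
            return null; // held by the "other master" inside this JVM
        }
        boolean ok = false;
        FileChannel channel = null;
        try {
            channel = FileChannel.open(
                    key, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock(); // null if another process holds it
            ok = lock != null;
            return ok ? new Held(key, channel, lock) : null;
        } finally {
            if (!ok) {
                HELD_HERE.remove(key);
                if (channel != null) {
                    channel.close();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempDirectory("index").resolve("write.lock");

        Held primary = tryAcquire(lockFile);
        System.out.println("primary has lock: " + (primary != null));
        System.out.println("secondary has lock: " + (tryAcquire(lockFile) != null));

        primary.close(); // stands in for the primary master's JVM dying
        Held secondary = tryAcquire(lockFile);
        System.out.println("secondary after failover: " + (secondary != null));
        secondary.close();
    }
}
```
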
That is, the secondary will start getting new docs, it will use its
IndexWriter to write to that same shared index, which won't be locked
for writes because the lock is gone, and everyone will be happy. Did I
miss something important here?

Assuming the above is correct, what if the lock is *not* gone because
the primary master's JVM is actually not dead, although maybe
unresponsive, so LB thinks the primary master is dead. Then the LB will
route indexing requests to the secondary master, which will attempt to
write to the index, but be denied because of the lock. So a human needs
to jump in, remove the lock, and manually reindex failed docs if the
upstream component doesn't buffer docs that failed to get indexed and
doesn't retry indexing them automatically. Is this correct or is there a
way to avoid humans here?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
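
P.S. On the "upstream component buffers and retries" part, this is roughly what I have in mind, as a minimal Java sketch and not a real Solr or SolrJ API. RetryingIndexer and its sender callback are hypothetical names; the callback stands in for whatever actually posts a doc to the VIP and returns true only when the master accepted it:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical upstream buffer; not a Solr or SolrJ class. */
final class RetryingIndexer<D> {

    private final Deque<D> pending = new ArrayDeque<>();
    private final Predicate<D> sender; // true when the master accepted the doc

    RetryingIndexer(Predicate<D> sender) {
        this.sender = sender;
    }

    /** Queues the doc, then tries to drain the queue in arrival order. */
    int index(D doc) {
        pending.addLast(doc);
        return flush();
    }

    /**
     * Sends buffered docs oldest-first and stops at the first failure, so
     * nothing is dropped or reordered. Returns how many docs are still pending.
     */
    int flush() {
        while (!pending.isEmpty()) {
            boolean accepted;
            try {
                accepted = sender.test(pending.peekFirst());
            } catch (RuntimeException e) {
                accepted = false; // treat any send error as "retry later"
            }
            if (!accepted) {
                break;
            }
            pending.removeFirst();
        }
        return pending.size();
    }
}
```

With something like this upstream, docs rejected while the stale lock is in place stay queued, and a later flush() call (for example on a timer) delivers them once the secondary can write. Removing the lock itself would still need a human or some other mechanism.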