I opened https://issues.apache.org/jira/browse/SOLR-6837
Probably best to have further conversations on the Jira issue.

On Thu, Dec 11, 2014 at 6:46 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> Hi Gili,
>
> Great question!
>
> A write in Solr, by default, is only guaranteed to exist in 1 place, i.e.
> the leader, and the safety valves that we have to preserve these writes are:
>
> 1. The leaderVoteWait time for which leader election is suspended until
> enough live replicas are available
> 2. The two-way peer-sync between leader candidate and other replicas
>
> The other safety valve is on the client side with the "min_rf" parameter
> introduced by SOLR-5468 in Solr 4.9. If you set this param to 2 while
> making the request then Solr will return the number of replicas to which it
> could successfully send the update. Then depending on the response you can
> make a decision to retry the update at a later time assuming it is
> idempotent. This kinda puts the onus of ensuring consistency on the client
> side which is not ideal but better than nothing. See SOLR-5468 for more
> discussion on this topic.
>
> In your particular example, none of these safeties are invoked because you
> start node2 while node1 was down and node2 goes ahead with leader election
> after the wait period. Also since node1 was down during leader election,
> peer sync doesn't happen and then node2 becomes the leader.
>
> When node1 comes back online and joins as a replica, it recovers from the
> leader using peer-sync (which returns the newest 100 updates) and finds
> that there's nothing newer on the leader. However, there are no checks to
> make sure that the replica doesn't have a newer update itself which is why
> you end up with the inconsistent replica. If there were a lot of updates on
> node2 (more than 100) while node1 was down, in which case peer-sync isn't
> applicable, then it would have done a replication recovery and this
> inconsistency would have been resolved.
>
> So yeah we have a valid consistency bug such that we have inconsistent
> replicas in a steady state. I wonder if the right way is to bump min_rf to
> a higher value or peer-sync both ways during replica recovery. I'll need to
> think more on this.
>
>
> On Thu, Dec 11, 2014 at 4:21 PM, Gili Nachum <gilinac...@gmail.com> wrote:
>
>> I know Solr CAP properties are CP, but I don't see it happening over a very
>> basic test - doing something wrong?
>>
>> With two Solr nodes, I index doc1 to both, stop node2, update doc1, stop
>> node1, start node2, start node1, and I get two different versions of the
>> doc depending on which replica I query.
>> I would expect node2 to update itself.
>> Attaching Solr logs from both nodes.
>>
>> *Config*
>> Solr 4.7.2 / Jetty.
>> SolrCloud on two nodes, and 3 ZK, all running in localhost.
>> single collection: single shard with two replicas.
>>
>> *Reproducing:*
>> start node1 9.148.58.114:8983
>> start node2 9.148.58.114:8984
>> Cluster state: node1 leader. node2 active.
>>
>> index value 'A' (id="change me").
>> query and expect 'A' -> success
>>
>> Stop node2
>> Cluster state: node1 leader. node2 gone.
>> query and expect 'A' -> success
>>
>> Update document value from 'A'->'B'
>> query and expect 'B' -> success
>>
>> Stop node1
>> then
>> Start node2.
>> Cluster state: node1 gone. node2 down.
>>
>> * 104510 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext Waiting until we see more
>> replicas up for shard shard1: total=2 found=1 timeoutin=5.27665925E14ms*
>>
>> wait 3m.
>>
>> * 184679 [coreZkRegister-1-thread-1] INFO
>> org.apache.solr.cloud.ShardLeaderElectionContext I am the new leader:
>> http://9.148.58.114:8984/solr/quick-results-collection_shard1_replica2/
>> shard1 *
>> Cluster state: node1 gone. node2 leader.
>>
>> query and expect 'A' (old value) -> success
>>
>> start node1
>> Cluster state: node1 active. node2 leader.
>>
>> *Inconsistency:*
>> * Querying node1 always returns 'B'. *
>> http://localhost:8983/solr/quick-results-collection_shard1_replica1/select?q=*%3A*&wt=json&indent=true
>> * Querying node2 always returns 'A'. *
>> http://localhost:8984/solr/quick-results-collection_shard1_replica2/select?q=*%3A*&wt=json&indent=true
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

--
Regards,
Shalin Shekhar Mangar.
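
For anyone following along, the recovery behaviour described in the quoted reply can be sketched as a toy model. This is not Solr code: the Replica class, the (version, id, value) update log, and the one_way_recover / two_way_recover functions are all hypothetical names made up for illustration. The only detail taken from the thread is the window of the newest 100 updates. The "two-way" variant is just the "peer-sync both ways during replica recovery" idea floated above, not an implemented fix.

```python
from dataclasses import dataclass, field

# From the thread: peer-sync looks at the newest 100 updates.
PEER_SYNC_WINDOW = 100


@dataclass
class Replica:
    """Hypothetical stand-in for a Solr core with an update log."""
    name: str
    log: list = field(default_factory=list)  # (version, doc_id, value), oldest first

    def apply(self, version, doc_id, value):
        self.log.append((version, doc_id, value))

    def get(self, doc_id):
        # The visible value of a document is the one with the highest version.
        latest = None
        for version, d, value in self.log:
            if d == doc_id and (latest is None or version > latest[0]):
                latest = (version, value)
        return latest[1] if latest else None


def one_way_recover(replica, leader):
    """Replica pulls recent updates it lacks from the leader; nothing flows back."""
    have = {version for version, _, _ in replica.log}
    for entry in leader.log[-PEER_SYNC_WINDOW:]:
        if entry[0] not in have:
            replica.apply(*entry)


def two_way_recover(replica, leader):
    """Assumed variant: sync in both directions, so the leader also catches up."""
    one_way_recover(replica, leader)
    one_way_recover(leader, replica)


def scenario(recover):
    node1, node2 = Replica("node1"), Replica("node2")
    for node in (node1, node2):
        node.apply(1, "change me", "A")  # indexed while both nodes are up
    node1.apply(2, "change me", "B")     # node2 is stopped; only node1 sees this
    # node1 stops, node2 restarts and becomes leader, then node1 rejoins.
    recover(replica=node1, leader=node2)
    return node1.get("change me"), node2.get("change me")


if __name__ == "__main__":
    print(scenario(one_way_recover))  # ('B', 'A')
    print(scenario(two_way_recover))  # ('B', 'B')
```

In the one-way case node1 finds nothing newer on the leader and keeps 'B' while node2 keeps serving 'A', which matches the reproduction steps in the original report. The two-way case converges only because this model resolves conflicts by highest version; whether that is the right rule for Solr is something for the Jira issue.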