Kevin:

Well, let's certainly raise it as a JIRA; whether it's a blocker I'm not sure.
I _think_ the new LIR work done in Solr 7.3 might make it possible to
detect this condition but I'm not totally sure what to do about it.

So let's say the leader gets an update while a follower is down. (one
leader and one follower for simplicity). Now say the leader dies and
the follower is restarted. What should happen? Should Solr refuse to
start? Would FORCELEADER work if the user was willing to lose data?

Let's move the discussion to the JIRA though.
On Tue, Nov 6, 2018 at 10:58 AM Kevin Risden <kris...@apache.org> wrote:
>
> Erick Erickson - I don't have much time to chase this down. Do you think
> this is a blocker for 7.6? It seems pretty serious.
>
> Jeremy - This would be a good JIRA to create - we can move the conversation
> there to try to get the right people involved.
>
> Kevin Risden
>
>
> On Fri, Nov 2, 2018 at 7:57 AM Jeremy Smith <jas2...@cornell.edu> wrote:
>
> > Hi Susheel,
> >
> >      Yes, it appears that under certain conditions, if a follower is down
> > when the leader gets an update, the follower will not receive that update
> > when it comes back (or maybe it receives the update and it's then
> > overwritten by its own transaction logs, I'm not sure).  Furthermore, if
> > that follower then becomes the leader, it will replicate its own out of
> > date value back to the former leader, even though the version number is
> > lower.
> >
> >
> >    -Jeremy
> >
> > ________________________________
> > From: Susheel Kumar <susheel2...@gmail.com>
> > Sent: Thursday, November 1, 2018 2:57:00 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud Replication Failure
> >
> > Are we saying this has something to do with stopping and restarting
> > replicas? Otherwise I haven't seen/heard of any issues with document
> > updates and forwarding to replicas...
> >
> > Thanks,
> > Susheel
> >
> > On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > > So this seems like it absolutely needs a JIRA....
> > > > On Thu, Nov 1, 2018 at 9:39 AM Kevin Risden <kris...@apache.org> wrote:
> > > >
> > > > > I pushed 3 branches that modify test.sh to test 5.5, 6.6, and 7.5
> > > locally
> > > > without docker. I still see the same behavior where the latest updates
> > > > aren't on the replicas. I still don't know what is happening but it
> > > happens
> > > > without Docker :(
> > > >
> > > >
> > >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> > > >
> > > > Kevin Risden
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden <kris...@apache.org>
> > wrote:
> > > >
> > > > > > Erick - Yeah, that's a fair point. Would be interesting to see if this
> > > fails
> > > > > without Docker.
> > > > >
> > > > > Kevin Risden
> > > > >
> > > > >
> > > > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> > > erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Kevin:
> > > > >>
> > > > >> You're also using Docker, right? Docker is not "officially"
> > supported
> > > > >> although there's some movement in that direction and if this is only
> > > > > >> reproducible in Docker then it's a clue where to look....
> > > > >>
> > > > >> Erick
> > > > > >> On Wed, Oct 31, 2018 at 7:24 PM Kevin Risden <kris...@apache.org> wrote:
> > > > >> >
> > > > >> > I haven't dug into why this is happening but it definitely
> > > reproduces. I
> > > > >> > removed the local requirements (port mapping and such) from the
> > > gist you
> > > > >> > posted (very helpful). I confirmed this fails locally and on
> > Travis
> > > CI.
> > > > >> >
> > > > >> >
> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > > > >> >
> > > > >> > I don't even see the first update getting applied from num 10 ->
> > 20.
> > > > >> After
> > > > >> > the first update there is no more change.
> > > > >> >
> > > > >> > Kevin Risden
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith <jas2...@cornell.edu
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Thanks Erick, this is 7.5.0.
> > > > >> > > ________________________________
> > > > >> > > From: Erick Erickson <erickerick...@gmail.com>
> > > > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > > > >> > > To: solr-user
> > > > >> > > Subject: Re: SolrCloud Replication Failure
> > > > >> > >
> > > > > >> > > What version of solr? This code was pretty much rewritten in 7.3
> > > IIRC
> > > > >> > >
> > > > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith <jas2...@cornell.edu
> > > wrote:
> > > > >> > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > >      We are currently running a moderately large instance of
> > > > >> standalone
> > > > >> > > > solr and are preparing to switch to solr cloud to help us
> > scale
> > > > >> up.  I
> > > > >> > > have
> > > > >> > > > been running a number of tests using docker locally and ran
> > > into an
> > > > >> issue
> > > > >> > > > where replication is consistently failing.  I have pared down
> > > the
> > > > >> test
> > > > >> > > case
> > > > > >> > > > as much as I could.  Here's a link for the
> > > docker-compose.yml
> > > > >> (I put
> > > > >> > > > it in a directory called solrcloud_simple) and a script to run
> > > the
> > > > >> test:
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Here's the basic idea behind the test:
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard,
> > > and 2
> > > > >> > > > replicas (each node gets a replica).  Just use the default
> > > schema,
> > > > >> > > although
> > > > >> > > > I've also tried our schema and got the same result.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 2) Shut down solr-2
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 3) Add 100 simple docs, just id and a field called num.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 4) Start solr-2 and check that it received the documents.  It
> > > did!
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 5) Update a document, commit, and check that solr-2 received
> > the
> > > > >> update.
> > > > >> > > > It did!
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 6) Stop solr-2, update the same document, start solr-2, and
> > make
> > > > >> sure
> > > > >> > > that
> > > > >> > > > it received the update.  It did!
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 7) Repeat step 6 with a new value.  This time solr-2 reverts
> > > back
> > > > >> to what
> > > > >> > > > it had in step 5.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > I believe the main issue comes from this in the logs:
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > > > >> > > > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > > > >> > > > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test
> > > > >> s:shard1
> > > > >> > > > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync
> > > PeerSync:
> > > > >> > > > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our
> > > > >> versions
> > > > >> > > are
> > > > >> > > > newer. ourHighThreshold=1615861330901729280
> > > > >> > > > otherLowThreshold=1615861314086764545
> > > ourHighest=1615861330901729280
> > > > >> > > > otherHighest=1615861335081353216
> > > > >> > > >
> > > > >> > > > PeerSync thinks the versions on solr-2 are newer for some
> > > reason,
> > > > >> so it
> > > > >> > > > doesn't try to sync from solr-1.  In the final state, solr-2
> > > will
> > > > >> always
> > > > >> > > > have a lower version for the updated doc than solr-1.  I've
> > > tried
> > > > >> this
> > > > >> > > with
> > > > >> > > > different commit strategies, both auto and manual, and it
> > > doesn't
> > > > >> seem to
> > > > >> > > > make any difference.
> > > > >> > > >
> > > > >> > > > Is this a bug with solr, an issue with using docker, or am I
> > > just
> > > > >> > > > expecting too much from solr?
> > > > >> > > >
> > > > >> > > > Thanks for any insights you may have,
> > > > >> > > >
> > > > >> > > > Jeremy
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >>
> > > > >
> > >
> >
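
For anyone following along, the four version numbers in the PeerSync log
line quoted above can be compared directly. The following is only a minimal
Python sketch: it parses the numbers from that log text and does not
reproduce Solr's actual PeerSync decision logic. The names parse_versions
and LOG_LINE are made up for this illustration.

```python
import re

# Log text copied from the PeerSync message quoted in this thread.
LOG_LINE = (
    "Our versions are newer. ourHighThreshold=1615861330901729280 "
    "otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280 "
    "otherHighest=1615861335081353216"
)


def parse_versions(line):
    """Return every name=number pair found in a log line as a dict of ints.

    Hypothetical helper for illustration; it is not part of Solr.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}


versions = parse_versions(LOG_LINE)

# Compare the highest version reported for each side of the sync.
other_is_ahead = versions["otherHighest"] > versions["ourHighest"]
print("other side has the higher version:", other_is_ahead)
```

In the quoted line, otherHighest is larger than ourHighest even though the
message says "Our versions are newer", which matches the observation that
solr-2 ends up with a lower version for the updated doc than solr-1.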
