On 8/5/20 11:58 PM, Han Zhou wrote:
> 
> 
> On Wed, Aug 5, 2020 at 12:48 PM Dumitru Ceara <dce...@redhat.com> wrote:
>>
>> On 8/5/20 7:48 PM, Han Zhou wrote:
>> >
>> >
>> > On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dce...@redhat.com> wrote:
>> >>
>> >> Every time a follower has to install a snapshot received from the
>> >> leader, it should also replace the data in memory. Right now this only
>> >> happens when snapshots are installed that also change the schema.
>> >>
>> >> This can lead to inconsistent DB data on follower nodes and the
>> >> snapshot may fail to get applied.
>> >>
>> >> CC: Han Zhou <hz...@ovn.org>
>> >> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
>> >> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>> >
>> > Thanks Dumitru! This is a great finding, and sorry for my mistake.
>> > This patch looks good to me. Just one minor comment below on the test
>> > case. Otherwise:
>> >
>> > Acked-by: Han Zhou <hz...@ovn.org>
>> >
>>
>> Thanks Han for the review! I fixed the test case as you suggested and
>> sent v2.
>>
>> I was wondering if this is also the root cause for the issue you
>> reported a while back during the OVN meeting. In my scenario, if a
>> follower ends up in this situation, and if the DB gets compacted online
>> afterwards, the DB file also becomes inconsistent and in some cases
>> (after the DB server is restarted) all write transactions from clients
>> are rejected with "ovsdb-error: inconsistent data".
>>
> Yes, I believe it is the root cause. I thought this patch was exactly
> for that issue. Is it also for something else?
> 

This patch is for the issue I described above: an inconsistent DB on a
follower, followed by online compaction of the DB, which corrupts the DB
file too. I wasn't sure whether this was also what you were hitting in
your deployment; I just wanted to check whether there are any other known
potential issues we need to investigate.
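
As a rough illustration of the failure mode described above, here is a toy
model in Python. It is not ovsdb-server code: the class, its fields, and
the `always_reload` switch are hypothetical, and it assumes that online
compaction writes the in-memory contents out as the new on-disk snapshot.

```python
# Toy model only: every name here is hypothetical, not taken from ovsdb-server.
class Follower:
    def __init__(self, data):
        self.schema = "v1"
        self.memory = dict(data)  # data the server answers queries from
        self.file = dict(data)    # snapshot stored in the on-disk log

    def install_snapshot(self, schema, data, always_reload):
        """Install a snapshot received from the leader."""
        schema_changed = schema != self.schema
        self.schema = schema
        self.file = dict(data)
        # Behaviour before the fix: memory is replaced only on a schema change.
        if always_reload or schema_changed:
            self.memory = dict(data)

    def compact(self):
        """Assumed compaction step: rewrite the file from the in-memory data."""
        self.file = dict(self.memory)


def file_matches_leader(always_reload):
    leader = {"row": "new"}
    follower = Follower({"row": "old"})
    follower.install_snapshot("v1", leader, always_reload)
    follower.compact()
    return follower.file == leader
```

With `always_reload=False` the follower keeps stale in-memory data after
the snapshot install, and compaction then overwrites the file with it, so
the file no longer matches the leader. With `always_reload=True` both
memory and file stay in sync with the leader's snapshot.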

>> Related to that I also sent the following patch to make the ovsdb-server
>> storage state available via appctl commands:
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/1596467128-13004-1-git-send-email-dce...@redhat.com/
>>
> 
> I will take a look.
> 
> Thanks,
> Han
> 

Thanks!
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev