On Thu, 2025-02-13 at 09:51 +0100, Dumitru Ceara wrote:
> On 2/13/25 9:39 AM, Felix Huettner wrote:
> 
> > > > > 
> > > > > Thanks, I will add it. In the meantime I'm also fighting with
> > > > > segfaults
> > > > > that appeared when I was adding tests. Tests add "stuff" to
> > > > > the OVN
> > > > > in
> > > > > big batches and It crashes fairly consistently (though not
> > > > > always).
> > > > > This didn't occur in my manual testing.
> > > > > Sometimes the crashes are silent, but from time to time,
> > > > > there are
> > > > > transaction errors complaining about referencing non-existing
> > > > > DPs
> > > > > or
> > > > > PBs (both for logical ports and tracked ports). I don't
> > > > > presume
> > > > > that
> > > > > your patch will fix that, since it happens with pure LBs too.
> > > > > I wonder if it's enough to have dependency on lr_stateful. Is
> > > > > it
> > > > > possible that we need to depend on something else? Like
> > > > > "en_sync_to_sb_pb"?
> > > > > 
> > > > 
> > > > Would it be possible to share such a test?  I can look into it
> > > > tomorrow.
> > > 
> > > Sure thing, It's in this branch
> > > https://github.com/mkalcok/ovn/commit/8e1394dfbebc4f48255c77937015e07fbcbc5172
> > > . It's based on your updated v5 branch with only one more commit
> > > that
> > > adds the test. It won't pass either way because the branch is
> > > missing
> > > advertisement of neighboring LBs. I have that change in another
> > > branch,
> > > but for the purpose of investigating the crash, this should be
> > > good
> > > enough. I can get it to crash about every other run. The tell-
> > > tell sign
> > > is when the test hangs (because it waits on --wait=sb in some
> > > step).
> > 
> > Hi Martin, Hi Dumitru,
> > 
> > i think i found the issue.
> > Fix is at
> > https://github.com/felixhuettner/ovn/commit/5e1b6051ddb0857382951d6e3a383e657063f303
> > 
> > However this is "just" a use-after-free and i don't think it would
> > generate transaction errors.
> > 
> 
> Yes, that's my bad, good catch Felix!
> 
> Well, as Martin replied in his other email [0], the transaction
> errors
> are about rows with UUID "a5a5a5a5-a5a5-a5a5-a5a5-a5a5a5a5a5a5". 
> That's
> just "poison" inserted at free (as Ales pointed out too [1]).
> 
> [0]
> https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421022.html
> [1]
> https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421023.html
> 
> Regards,
> Dumitru
> 
Hi Felix, Dumitru, Ales,

I think you nailed it. I applied the change from Felix and tests are
passing consistently \o/
I'll get back to adding tests and I'll post new version asap.

Martin.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to