On Thu, 2025-02-13 at 13:55 +0100, Dumitru Ceara wrote:
> On 2/13/25 10:05 AM, [email protected] wrote:
> > On Thu, 2025-02-13 at 09:51 +0100, Dumitru Ceara wrote:
> > > On 2/13/25 9:39 AM, Felix Huettner wrote:
> > > 
> > > > > > > 
> > > > > > > Thanks, I will add it. In the meantime I'm also fighting
> > > > > > > with
> > > > > > > segfaults
> > > > > > > that appeared when I was adding tests. Tests add "stuff"
> > > > > > > to
> > > > > > > the OVN
> > > > > > > in
> > > > > > > big batches and It crashes fairly consistently (though
> > > > > > > not
> > > > > > > always).
> > > > > > > This didn't occur in my manual testing.
> > > > > > > Sometimes the crashes are silent, but from time to time,
> > > > > > > there are
> > > > > > > transaction errors complaining about referencing non-
> > > > > > > existing
> > > > > > > DPs
> > > > > > > or
> > > > > > > PBs (both for logical ports and tracked ports). I don't
> > > > > > > presume
> > > > > > > that
> > > > > > > your patch will fix that, since it happens with pure LBs
> > > > > > > too.
> > > > > > > I wonder if it's enough to have dependency on
> > > > > > > lr_stateful. Is
> > > > > > > it
> > > > > > > possible that we need to depend on something else? Like
> > > > > > > "en_sync_to_sb_pb"?
> > > > > > > 
> > > > > > 
> > > > > > Would it be possible to share such a test?  I can look into
> > > > > > it
> > > > > > tomorrow.
> > > > > 
> > > > > Sure thing, It's in this branch
> > > > > https://github.com/mkalcok/ovn/commit/8e1394dfbebc4f48255c77937015e07fbcbc5172
> > > > > . It's based on your updated v5 branch with only one more
> > > > > commit
> > > > > that
> > > > > adds the test. It won't pass either way because the branch is
> > > > > missing
> > > > > advertisement of neighboring LBs. I have that change in
> > > > > another
> > > > > branch,
> > > > > but for the purpose of investigating the crash, this should
> > > > > be
> > > > > good
> > > > > enough. I can get it to crash about every other run. The
> > > > > tell-
> > > > > tell sign
> > > > > is when the test hangs (because it waits on --wait=sb in some
> > > > > step).
> > > > 
> > > > Hi Martin, Hi Dumitru,
> > > > 
> > > > i think i found the issue.
> > > > Fix is at
> > > > https://github.com/felixhuettner/ovn/commit/5e1b6051ddb0857382951d6e3a383e657063f303
> > > > 
> > > > However this is "just" a use-after-free and i don't think it
> > > > would
> > > > generate transaction errors.
> > > > 
> > > 
> > > Yes, that's my bad, good catch Felix!
> > > 
> > > Well, as Martin replied in his other email [0], the transaction
> > > errors
> > > are about rows with UUID "a5a5a5a5-a5a5-a5a5-a5a5-a5a5a5a5a5a5". 
> > > That's
> > > just "poison" inserted at free (as Ales pointed out too [1]).
> > > 
> > > [0]
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421022.html
> > > [1]
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421023.html
> > > 
> > > Regards,
> > > Dumitru
> > > 
> > Hi Felix, Dumitru, Ales,
> > 
> > I think you nailed it. I applied the change from Felix and tests
> > are
> > passing consistently \o/
> > I'll get back to adding tests and I'll post new version asap.
> > 
> 
> Thanks, Martin!
> 
> Based on some internal early feedback I got from Ales I split the
> first
> commit into two changes for better readability.  I pushed the
> resulting
> commits here:
> 
> - northd: Expose the is_l3dgw_port() function to other modules.
>   https://github.com/dceara/ovn/commit/c7097bd
> - northd: Check and populate NAT entry fields early in the I-P
> engine.
>   https://github.com/dceara/ovn/commit/086711b
> 
> It would be great if you could rebase your changes on top of these
> and
> include them in v6.
> 
> Best regards,
> Dumitru

OK, I'll rebase it. I'm currently checking the ML thread to see if I
didn't forget to include some requested changes and I'll push shortly.

Martin.
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to