On 2/13/25 10:05 AM, [email protected] wrote: > On Thu, 2025-02-13 at 09:51 +0100, Dumitru Ceara wrote: >> On 2/13/25 9:39 AM, Felix Huettner wrote: >> >>>>>> >>>>>> Thanks, I will add it. In the meantime I'm also fighting with >>>>>> segfaults >>>>>> that appeared when I was adding tests. Tests add "stuff" to >>>>>> the OVN >>>>>> in >>>>>> big batches and It crashes fairly consistently (though not >>>>>> always). >>>>>> This didn't occur in my manual testing. >>>>>> Sometimes the crashes are silent, but from time to time, >>>>>> there are >>>>>> transaction errors complaining about referencing non-existing >>>>>> DPs >>>>>> or >>>>>> PBs (both for logical ports and tracked ports). I don't >>>>>> presume >>>>>> that >>>>>> your patch will fix that, since it happens with pure LBs too. >>>>>> I wonder if it's enough to have dependency on lr_stateful. Is >>>>>> it >>>>>> possible that we need to depend on something else? Like >>>>>> "en_sync_to_sb_pb"? >>>>>> >>>>> >>>>> Would it be possible to share such a test? I can look into it >>>>> tomorrow. >>>> >>>> Sure thing, It's in this branch >>>> https://github.com/mkalcok/ovn/commit/8e1394dfbebc4f48255c77937015e07fbcbc5172 >>>> . It's based on your updated v5 branch with only one more commit >>>> that >>>> adds the test. It won't pass either way because the branch is >>>> missing >>>> advertisement of neighboring LBs. I have that change in another >>>> branch, >>>> but for the purpose of investigating the crash, this should be >>>> good >>>> enough. I can get it to crash about every other run. The tell- >>>> tell sign >>>> is when the test hangs (because it waits on --wait=sb in some >>>> step). >>> >>> Hi Martin, Hi Dumitru, >>> >>> i think i found the issue. >>> Fix is at >>> https://github.com/felixhuettner/ovn/commit/5e1b6051ddb0857382951d6e3a383e657063f303 >>> >>> However this is "just" a use-after-free and i don't think it would >>> generate transaction errors. >>> >> >> Yes, that's my bad, good catch Felix! >> >> Well, as Martin replied in his other email [0], the transaction >> errors >> are about rows with UUID "a5a5a5a5-a5a5-a5a5-a5a5-a5a5a5a5a5a5". >> That's >> just "poison" inserted at free (as Ales pointed out too [1]). >> >> [0] >> https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421022.html >> [1] >> https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/421023.html >> >> Regards, >> Dumitru >> > Hi Felix, Dumitru, Ales, > > I think you nailed it. I applied the change from Felix and tests are > passing consistently \o/ > I'll get back to adding tests and I'll post new version asap. >
Thanks, Martin! Based on some internal early feedback I got from Ales I split the first commit into two changes for better readability. I pushed the resulting commits here: - northd: Expose the is_l3dgw_port() function to other modules. https://github.com/dceara/ovn/commit/c7097bd - northd: Check and populate NAT entry fields early in the I-P engine. https://github.com/dceara/ovn/commit/086711b It would be great if you could rebase your changes on top of these and include them in v6. Best regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
