On Thu, Aug 11, 2022 at 10:55 AM Ilya Maximets <i.maxim...@ovn.org> wrote: > > If the PACKET_OUT from controller ends up with sending packet to > a bond interface, the main thread will take locks in the following > order: > handle_openflow > --> take ofproto_mutex > handle_packet_out > packet_xlate > output_normal > bond_update_post_recirc_rules > --> take rwlock in bond.c > > If at the same time revalidator thread is processing other packet > with the output to the same bond: > xlate_actions > output_normal > bond_update_post_recirc_rules > --> take rwlock in bond.c > update_recirc_rules > ofproto_dpif_add_internal_flow > ofproto_flow_mod > --> take ofproto_mutex > > So, it is possible for these 2 threads to lock each other by > taking one lock and waiting for another thread to release the > second lock. > > It might also be possible for the main thread to lock itself > up by trying to acquire ofproto_mutex for the second time, if > it will actually proceed with update_recirc_rules() after > taking the bond rwlock. > > The problem appears to be that bond_update_post_recirc_rules() > is called during the flow translation even if side effects are > prohibited, which is the case for openflow PACKET_OUT handling. > > Skipping actual flow updates during the flow translation if > side effects are disabled to avoid the deadlock. > > Since flow are not installed now when actions translated for > very first packet, installing initial flows in bond_reconfigure(). > This will cover the case of allocating a new recirc_id. > > Also checking if we need to update flows in bond_run() to cover > link state changes. > > Reported-at: https://github.com/openvswitch/ovs-issues/issues/259 > Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using > recirculation") > Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> > --- > > I'm not sure how to create a unit test for the lock race with > revalidator, but it might be possible to trigger the self-lockup > issue on the main thread. I'll try to work on the test while > waiting for review. >
Hello Ilya, I've tested this and confirmed that it solves the problem for me. Acked-by: Mike Pattrick <m...@redhat.com> I created a reproducer that doesn't require dpdk or real interfaces: ip netns add sidea ip netns add sideb ovs-vsctl add-br br-sidea ovs-vsctl add-br br-sideb ip link add portab0 type veth peer name portbb0 ip link add portab1 type veth peer name portbb1 ip link set dev portab0 up ip link set dev portab1 up ip link set dev portbb0 up ip link set dev portbb1 up ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port bonda lacp=active -- set port bonda bond_mode=balance-tcp ovs-vsctl add-bond br-sideb bondb portbb0 portbb1 -- set port bondb lacp=active -- set port bondb bond_mode=balance-tcp ip link add porta2 type veth peer name ovs-porta2 ip link add portb2 type veth peer name ovs-portb2 ip link set dev ovs-porta2 up ip link set dev ovs-portb2 up ip link set porta2 netns sidea ip link set portb2 netns sideb ip netns exec sidea ip link set dev porta2 up ip netns exec sideb ip link set dev portb2 up ip netns exec sidea ip addr add 172.31.1.1/24 dev porta2 ip netns exec sideb ip addr add 172.31.1.2/24 dev portb2 ovs-vsctl add-port br-sidea ovs-porta2 ovs-vsctl add-port br-sideb ovs-portb2 ovs-ofctl add-flow br-sidea "actions=normal" ovs-ofctl add-flow br-sideb "actions=normal" while [[ 1 ]]; do ovs-ofctl packet-out br-sidea "packet=ffffffffffff000000000000080045000054fbc340004001e4a3ac1f0101ac1f01020800e9af9e540001ce8e036300000000da35050000000000101112131415161718191a1b1c1d1e1f202122231415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637, actions=resubmit(,0)" 2>/dev/null >/dev/null ovs-ofctl packet-out br-sideb "packet=ffffffffffff000000000000080045000054de00000040014267ac1f0102ac1f01010000efecd19a00019c90036300000000d5b00a0000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637, actions=resubmit(,0)" done & while [[ 1 ]]; do ovs-vsctl --may-exist br-sidea ovs-vsctl --if-exists del-port bonda ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port bonda lacp=active -- set port bonda bond_mode=balance-tcp ovs-vsctl add-port br-sidea ovs-porta2 ovs-ofctl add-flow br-sidea "actions=normal" done & ip netns exec sidea ping -f 172.31.1.2 On my machine it triggers a deadlock fairly consistently in under 3 seconds. Do you think it's worth turning something like this into a test? Or would this be too slow and resource intensive? Cheers, M _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev