On Thu, Nov 26, 2020 at 06:35:44PM +0530, Numan Siddique wrote:
> On Thu, Nov 26, 2020 at 11:30 AM Numan Siddique <num...@ovn.org> wrote:
> >
> > On Thu, Nov 26, 2020 at 10:54 AM Ben Pfaff <b...@ovn.org> wrote:
> > >
> > > On Wed, Nov 25, 2020 at 01:13:22PM +0530, Numan Siddique wrote:
> > > > On Wed, Nov 25, 2020 at 4:21 AM Ben Pfaff <b...@ovn.org> wrote:
> > > > >
> > > > > The tests "superseding ACLs with conjunction" and "ARP replies for SNAT
> > > > > external ips" trigger bugs in the ovn-controller incremental processing
> > > > > logic.  This works around those bugs.
> > > > >
> > > >
> > > > > Signed-off-by: Ben Pfaff <b...@ovn.org>
> > > >
> > > > Can you please try the test case "ARP replies for SNAT external ips"
> > > > with the latest OVN master?
> > > >
> > > > The commit 
> > > > https://github.com/ovn-org/ovn/commit/53f60c7ab742cba0b3dd84b73658e0bbd44ec145
> > > > should solve this issue.
> > > >
> > > > I will take a look into the other test case - "superseding ACLs with
> > > > conjunction".
> > >
> > > It does solve the issues that this was meant to fix.
> > >
> > > The following tests still segfault in ovn-controller:
> > >
> > > 269: ovn -- controller I-P handling with monitoring disabled -- ovn-northd-ddlog FAILED (ovs-macros.at:253)
> > > 301: ovn -- ovn-controller incremental processing    FAILED (ovn-performance.at:542)
> > >
> > > with backtraces that look like the following.  If this is because of a
> > > bug I introduced into ovsdb-idl, I think it has to be a subtle one...
> > >
> > > #0  0x0000000000413e00 in handle_deleted_lport (pb=0x110c550,
> > >     b_ctx_in=0x7ffea1c813d0, b_ctx_out=0x7ffea1c81380)
> > >     at ../controller/binding.c:1982
> > > #1  0x000000000041628e in binding_handle_port_binding_changes (
> > >     b_ctx_in=b_ctx_in@entry=0x7ffea1c813d0,
> > >     b_ctx_out=b_ctx_out@entry=0x7ffea1c81380) at ../controller/binding.c:2153
> > > #2  0x0000000000434650 in runtime_data_sb_port_binding_handler (
> > >     node=0x7ffea1c82730, data=0x10ad150) at ../controller/ovn-controller.c:1471
> > > #3  0x00007f0016dff4ab in engine_compute (recompute_allowed=<optimized out>,
> > >     node=<optimized out>) at ../lib/inc-proc-eng.c:306
> > > #4  engine_run_node (recompute_allowed=true, node=0x7ffea1c82730)
> > >     at ../lib/inc-proc-eng.c:352
> > > #5  engine_run (recompute_allowed=recompute_allowed@entry=true)
> > >     at ../lib/inc-proc-eng.c:377
> > > #6  0x0000000000411a4d in main (argc=<optimized out>, argv=<optimized out>)
> > >     at ../controller/ovn-controller.c:2747
> >
> > With your IDL CS patch series, I'm seeing 100% failure for
> > "ovn-controller incremental processing" test case.
> > I think ovn-controller should not segfault. Thanks for the backtrace.
> > I will look into it.
> >
> 
> Hi Ben,
> 
> The crash is seen because in binding.c, we access the port_binding->datapath
> column.
> 
> Since the 'datapath' column of the Port_Binding table has a strong
> reference to the Datapath_Binding table, this column
> should never be NULL, right?
> 
> Since the crash is seen with the tracked data, maybe your IDL CS
> patch series needs some extra handling in the IDL's change-tracking code?

My OVS series could change the order in which updates to rows in a
single set of updates were applied to the IDL.  This order wasn't
predictable anyway (it just depended on the ordering of randomly
generated UUIDs), but apparently something in the IDL was sensitive to
it.  There's probably a bug in the IDL related to this.

I posted a v2 of my patchset.  It exactly reproduces the application
order that the IDL previously used.  It's an improvement in another way
since the data structures are simpler and better.  And this workaround
patch can be dropped.

Thanks,

Ben.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
