On Wed, Jul 05, 2017 at 05:03:06PM +0200, Eelco Chaudron wrote:
> On 23/06/17 10:37, Eelco Chaudron wrote:
> >On 06/23/2017 02:12 AM, Ben Pfaff wrote:
> >>On Thu, Jun 22, 2017 at 04:04:44PM -0300, Flavio Leitner wrote:
> >>>On Thu, Jun 22, 2017 at 01:04:59AM +0800, Huanle Han wrote:
> >>>>Hi,all
> >>>>
> >>>>I get this problem with latest(dbd8112) branch-2.7 code on my Ubuntu.
> >>>>root@ubuntu:/var/log/# ovs-vsctl show
> >>>>adf2ea99-0c53-4180-914f-7dadaa71302b
> >>>>     Bridge test
> >>>>         Port test
> >>>>             Interface test
> >>>>                 type: internal
> >>>>     Bridge "manage"
> >>>>         Port "manage"
> >>>>             Interface "manage"
> >>>>                 type: internal
> >>>>                 error: "could not open network device manage (File
> >>>>exists)"
> >>>>         Port "veth0"
> >>>>             Interface "veth0"
> >>>>         Port "eth0"
> >>>>             Interface "eth0"
> >>>>     ovs_version: "2.7.0"
> >>>>
> >>>>How to reproduce:
> >>>>1. add bridge "manage", up and add ip on it
> >>>>2. restart ovs-vswitchd
> >>>>3. "ovs-vsctl show" displays error message.
> >>>>
> >>>>The reason:
> >>>>In following "netdev_open" call on ovs-vswitchd start, input "type"
> >>>>is NULL
> >>>>and "manage" is opened as a "system" netdev_class iface incorrectly.
> >>>>
> >>>>#0  netdev_open (name=0x7fffffffe2bc "manage", type=0x0,
> >>>>netdevp=0x7fffffffc3b0) at ../lib/netdev.c:396
> >>>>#1  0x000000000052c492 in get_src_addr (ip6_dst=0x7fffffffe2ac,
> >>>>output_bridge=0x7fffffffe2bc "manage", psrc=0x8f3490) at
> >>>>../lib/ovs-router.c:141
> >>>>#2  0x000000000052c85d in ovs_router_insert__ (priority=104 'h',
> >>>>ip6_dst=0x7fffffffe29c, plen=104 'h', output_bridge=0x7fffffffe2bc
> >>>>"manage", gw=0x7fffffffe2ac) at ../lib/ovs-router.c:202
> >>>>#3  0x000000000052c980 in ovs_router_insert (ip_dst=0x7fffffffe29c,
> >>>>plen=104 'h', output_bridge=0x7fffffffe2bc "manage",
> >>>>gw=0x7fffffffe2ac) at
> >>>>../lib/ovs-router.c:228
> >>>>#4  0x000000000058f63a in route_table_handle_msg
> >>>>(change=0x7fffffffe290) at
> >>>>../lib/route-table.c:295
> >>>>#5  0x000000000058f1da in route_table_reset () at
> >>>>../lib/route-table.c:174
> >>>>#6  0x000000000058f034 in route_table_init () at
> >>>>../lib/route-table.c:110
> >>>>#7  0x0000000000495838 in dp_initialize () at ../lib/dpif.c:126
> >>>>#8  0x0000000000495b40 in dp_enumerate_types (types=0x7fffffffe3a0) at
> >>>>../lib/dpif.c:244
> >>>>#9  0x000000000042eb1c in enumerate_types (types=0x7fffffffe3a0) at
> >>>>../ofproto/ofproto-dpif.c:267
> >>>>#10 0x000000000041b81c in ofproto_enumerate_types
> >>>>(types=0x7fffffffe3a0) at
> >>>>../ofproto/ofproto.c:432
> >>>>#11 0x000000000040df1e in bridge_run__ () at ../vswitchd/bridge.c:3020
> >>>>#12 0x000000000040e196 in bridge_run () at ../vswitchd/bridge.c:3082
> >>>>#13 0x00000000004138ef in main (argc=1, argv=0x7fffffffe578) at
> >>>>../vswitchd/ovs-vswitchd.c:119
> >>>>
> >>>>After then, ovs fails to netdev_open "manage" with type == "internal".
> >>>>"File exists" error is reported.
> >>>>I think commit d3b8f50(netdev: Fix netdev_open() to adhere to class
> >>>>type if
> >>>>given) introduces this problem. It need be improved.
> >>>Your analysis is correct.  One solution is to wait vswitchd to
> >>>configure first and only then enable ovs-route.  The problem is that
> >>>more modules might use netdev_open() and it doesn't sound like a good
> >>>idea to have a control per module.
> >>>
> >>>Another option is to map the device's info to a class instead of using
> >>>"system", so we could use internal classes for vports, for instance.
> >>>However, that doesn't guarantee it will match with what is configured
> >>>in the DB.
> >>This sounds like the kind of problem that I expected the following
> >>commit might cause.
> >>
> >>     commit d3b8f5052292b3ba9084ffed097e90b87f2950f5
> >>     Author: Eelco Chaudron <[email protected]>
> >>     Date:   Thu Jun 1 14:38:09 2017 +0200
> >>
> >>         netdev: Fix netdev_open() to adhere to class type if given
> >>
> >>If we can't fix it somehow, we might need to revert.
> >
> >Reverting the above patch does not solve the real issue here. If we revert
> >we
> >donot get the error, but the wrong class is used, hence the wrong
> >callbacks
> >get called.
> >
> >The main issue is with netdev_open() being called with type = NULL before
> >the interface is actually configured in the system. We could track these
> >"auto" generated interfaces, and once netdev_open() gets called with a
> >valid type reconfigure (re-create) it. netdev_remove() could work here
> >but not sure if its safe due to reference counting.
> >
> Some background info; I see a lot of netdev_open() with a NULL type, but
> they just grep some data and closed it again. I found that the tunnel code
> is keeping a reference to the netdev (for the IP assigned) and hence it is
> not going away before the "real" interface opens it.
> 
> The change below is based on tracking the "classless" opened devices, and
> once they get opened with a "real" class, re-create them. I did some basic
> testing and it seem to work fine, also went over related code and could not
> find any corner case.
> 
> So please take a peek, and if it all makes sense I can send out an official
> patch.

It's not exactly pleasing, but it's the best solution I've seen so far.

I wonder whether there's a way to better figure out the class type in
advance.

Let's try the solution you're proposing.  You want to send it out?

Thanks,

Ben.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to