Re: [ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

2018-11-21 Thread Han Zhou
On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson  wrote:
>
> Hi Daniel,
>
> I agree with Numan that this seems like a good approach to take.
>
> On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> >
> > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff wrote:
> >  >
> >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez wrote:
> >  > >
> >  > > > Hi,
> >  > > >
> >  > > > After digging further, the problem seems to be reduced to
> >  > > > reusing an old gateway IP address for a dnat_and_snat entry.
> >  > > > When a gateway port is bound to a chassis, its entry will show
> >  > > > up in the MAC_Binding table (at least when that Logical Switch
> >  > > > is connected to more than one Logical Router). After deleting
> >  > > > the Logical Router and all its ports, this entry will remain
> >  > > > there. If a new Logical Router is created and a Floating IP
> >  > > > (dnat_and_snat) is assigned to a VM with the old gateway IP
> >  > > > address, it will become unreachable.
> >  > > >
> >  > > > A workaround now in networking-ovn (the OpenStack integration)
> >  > > > is to delete MAC_Binding entries for that IP address upon FIP
> >  > > > creation. I think, however, that this should be done from OVN.
> >  > > > What do you folks think?
> >  > > >
> >  > > >
> >  > > Agree. Since the MAC_Binding table row is created by
> >  > > ovn-controller, it should be handled properly within OVN.
> >  >
> >  > I see that this has been sitting here for a while.  The solution
> >  > seems reasonable to me.  Are either of you working on it?
> >
> > I started working on it. I came up with a solution (see the patch below)
> > which works, but I wanted to give you a bit more context and get your
> > feedback:
> >
> >                        ^ localnet
> >                        |
> >                    +---+---+
> >                    |       |
> >             +------+  pub  +------+
> >             |      |       |      |
> >             |      +-------+      |
> >             |   172.24.4.0/24     |
> >             |                     |
> > 172.24.4.220|                     |172.24.4.221
> >         +---+---+             +---+---+
> >         |       |             |       |
> >         |  LR0  |             |  LR1  |
> >         |       |             |       |
> >         +---+---+             +---+---+
> >  10.0.0.254 |                     | 20.0.0.254
> >             |                     |
> >         +---+---+             +---+---+
> >         |       |             |       |
> >         |  SW0  |             |  SW1  |
> >         |       |             |       |
> >         +---+---+             +---+---+
> >  10.0.0.0/24|                     |20.0.0.0/24
> >             |                     |
> >         +---+---+             +---+---+
> >         |       |             |       |
> >         |  VM0  |             |  VM1  |
> >         |       |             |       |
> >         +-------+             +-------+
> >         10.0.0.10             20.0.0.10
> >       172.24.4.100          172.24.4.200
> >
> >
> > When I ping VM1's floating IP from the external network, a new entry for
> > 172.24.4.221 on the LR0 datapath appears in the MAC_Binding table:
> >
> > _uuid   : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > datapath: ac5984b9-0fea-485f-84d4-031bdeced29b
> > ip  : "172.24.4.221"
> > logical_port: "lrp02"
> > mac : "00:00:02:01:02:04"
> >
> >
> > Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
> > as the FIP of a new VM (VM2) with a different MAC, while a new gateway IP
> > is created (for example 172.24.4.222 / 00:00:02:01:02:99), VM2's FIP
> > becomes unreachable from VM1 until the old MAC_Binding entry gets
> > deleted, since pinging 172.24.4.221 resolves to the stale MAC address
> > ("00:00:02:01:02:04").
> >
> > With the patch below, removing LR1 results in deleting all MAC_Binding
> > entries, on every datapath, whose 'ip' column contains '172.24.4.221',
> > so the problem goes away.
> >
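(For illustration only, a hypothetical call site, not necessarily how the
actual patch wires it up: when ovn-northd notices that a logical router port
is going away, it could purge the learned bindings for each of the port's
addresses, reusing the 'struct ovn_port'/'lrp_networks' representation that
ovn-northd.c already has for router ports.)

    for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
        /* Hypothetical: 'op' is the router port being removed. */
        delete_mac_binding_by_ip(ctx, op->lrp_networks.ipv4_addrs[i].addr_s);
    }
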
> > Another solution would be to implement some kind of 'aging' for
> > MAC_Binding entries, but that is probably more complex.
> > Looking forward to your comments :)
> >
> >
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 58bef7d..a86733e 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx, struct hmap *ports)
> >   }
> >   }
> >
> > +static void
> > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > +{
> > +    const struct sbrec_mac_binding *b, *n;
> > +
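
(The quoted patch is truncated here by the archive. For context only, a
minimal sketch of how the body of delete_mac_binding_by_ip() might continue,
assuming the same southbound IDL iteration macro and delete helper that
cleanup_mac_bindings() above already uses; the author's actual code may
differ.)

    /* Sketch: walk every MAC_Binding row and drop the ones whose 'ip'
     * column matches the address being removed, regardless of datapath. */
    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
        if (!strcmp(b->ip, ip)) {
            sbrec_mac_binding_delete(b);
        }
    }
}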

[ovs-discuss] Issue when using bond command in OVS

2018-11-21 Thread Sara Escribano
Hi everyone, I'm having trouble with the *bonding* functionality.
I'm new to this and I built a topology like the one below on my
virtual machine:

                                              +-----+
                                              | ns1 |
                                              +-----+
                                                 | vpeerns1
                                                 |
                                                 | vns1
+-----+                  +-----+              +-----+               +-----+
|     | vpeerns4    vns4 |     | vpeerns3 vns3|     | vns2 vpeerns2 |     |
| ns4 |------------------| br1 |--------------| br0 |---------------| ns2 |
+-----+                  +-----+              +-----+               +-----+
                            | vpeerns5           | vns6
                            |                    |
                            | vns5               | vpeerns6
+-----+                  +-----+                 |
|     | vpeerns7    vns7 |     |                 |
| ns7 |------------------| br2 |-----------------+
+-----+                  +-----+

                                     namespaces [nsX]
                                     bridges    [brX]

(I hope the topology can be understood)

When everything is connected directly to br0, br1 or br2, all devices
work perfectly: there is connectivity between them and I can *ping*
between all the namespaces.
The problem comes when I want to use *bonding* for the vpeerns3 and
vpeerns5 interfaces (forgive the names).

I used these commands to create the bond (everything worked before this):

ovs-vsctl add-bond br1 bond1 vpeerns3 vpeerns5
ovs-vsctl set port bond1 bond_mode=balance-tcp


And now nothing gets through to or from ns4. The rest of the pings
work perfectly (because I populated the ARP caches beforehand), but that
one does not. The odd thing is that when I capture traffic on
vpeerns3/vpeerns5 (sending a ping from ns1 to ns4, for example), the
echo requests are there, but nothing shows up if I capture on br1 or in
ns4; it is as if there were a gap between vpeerns3/vpeerns5 and br1.

What could be wrong here??

Thanks in advance
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Reg. ofport set to -1

2018-11-21 Thread Vishal Deep Ajmera
Hi,

Currently in OVS we set "-1" for the ofport field in the Interface table of the
OVS DB when we fail to initialize the interface. This means that if we had
allocated an ofport number earlier, it is released. When the failed interface
is successfully initialized later, it gets a "new" ofport number. Because of
the change in ofport number, any SDN controller has to update all the OpenFlow
flows corresponding to this interface. I am not sure why we release the ofport
number for the interface, unless the interface is deleted from the Interface
table itself.

Does it make sense to preserve the ofport number as long as the interface is
present in the OVS DB, and release it only once the interface is deleted? This
would let a controller keep its existing flows and simply replay (resync) them
once the interface comes back up.

I need some guidance on how to handle such cases.

Warm Regards,
Vishal Ajmera


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] A coredump caused by a race condition between handler, upcall and revalidator threads

2018-11-21 Thread Zhengjingzhou

We've found a coredump in a daily test case (OVS 2.7.0 + DPDK).
After a deeper analysis we found that it crashed after an OVS service restart
(with bridges and ports), and that the revalidator and upcall handler threads
race on the ukey state. However, the coredump seems hard to reproduce. We
suspect the following sequence:
1. Handler A processes packet 1 and prepares to put a flow; it waits on the
lock.
2. Handler B processes another packet 2 (same MAC as packet 1), finds that the
device has been deleted, and generates a flow.
3. Handler A hits an error (EEXIST, in flow_put_on_pmd) and prepares to
transition the ukey state to evicted (in handle_upcalls); it waits to acquire
the lock:
for (i = 0; i < n_ops; i++) {
    struct udpif_key *ukey = ops[i].ukey;

    if (ukey) {
        ovs_mutex_lock(&ukey->mutex);
        if (ops[i].dop.error) {
            /* Doubt: maybe a state condition should be added here?
             * See the sketch after step 5 below. */
            transition_ukey(ukey, UKEY_EVICTED);
        } else if (ukey->state < UKEY_OPERATIONAL) {
            transition_ukey(ukey, UKEY_OPERATIONAL);
        }

4. Revalidator thread C transitions the ukey state to deleted (because of
expiration or other processing) and releases the ukey lock.
5. Handler A acquires the ukey lock (from step 3), finds that the previous
state is deleted, and aborts.
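
A minimal sketch of the state condition hinted at in step 3 above (hypothetical
and untested; it only illustrates skipping the eviction when another thread has
already moved the ukey to a later state such as deleted):

        if (ops[i].dop.error) {
            /* Hypothetical guard: skip the eviction if a revalidator has
             * already advanced the ukey past UKEY_EVICTED (e.g. to
             * UKEY_DELETED); transitioning backwards is what triggers the
             * abort in the stack trace below. */
            if (ukey->state < UKEY_EVICTED) {
                transition_ukey(ukey, UKEY_EVICTED);
            }
        } else if (ukey->state < UKEY_OPERATIONAL) {
            transition_ukey(ukey, UKEY_OPERATIONAL);
        }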

The stack trace is:
#0  0x7fba3737f417 in raise () from /usr/lib64/libc.so.6
#1  0x7fba37380b08 in abort () from /usr/lib64/libc.so.6
#2  0x005746be in ovs_abort_valist (err_no=err_no@entry=0, 
format=format@entry=0x6b0028 "Invalid ukey transition %d->%d (last transitioned 
from thread %u at %s)", args=args@entry=0x7fb995628ea0)
at lib/util.c:341
#3  0x0057b8b0 in vlog_abort_valist (function=, 
line=, module_=,
message=0x6b0028 "Invalid ukey transition %d->%d (last transitioned from 
thread %u at %s)", args=args@entry=0x7fb995628ea0) at lib/vlog.c:1229
#4  0x0057b93a in vlog_abort (function=function@entry=0x6b0d90 
<__func__.28159> "transition_ukey_at", line=line@entry=1741, 
module=module@entry=0x9ac220 ,
message=message@entry=0x6b0028 "Invalid ukey transition %d->%d (last 
transitioned from thread %u at %s)") at lib/vlog.c:1243
#5  0x0049a85d in transition_ukey_at (ukey=ukey@entry=0x7fb97c005120, 
dst=dst@entry=UKEY_EVICTED, where=where@entry=0x6b0648 
"ofproto/ofproto_dpif_upcall.c:1467")
at ofproto/ofproto_dpif_upcall.c:1739
#6  0x0049da45 in handle_upcalls (n_upcalls=64, upcalls=0x7fb99564dc70, 
udpif=) at ofproto/ofproto_dpif_upcall.c:1467
#7  recv_upcalls (handler=0x3339f90, handler=0x3339f90) at 
ofproto/ofproto_dpif_upcall.c:887
#8  0x0049dc7a in udpif_upcall_handler (arg=0x3339f90) at 
ofproto/ofproto_dpif_upcall.c:783
#9  0x00546e48 in ovsthread_wrapper (aux_=) at 
lib/ovs_thread.c:682
#10 0x7fba38f97e45 in start_thread () from /usr/lib64/libpthread.so.0
#11 0x7fba37442afd in clone () from /usr/lib64/libc.so.6

and the related log is:
2018-11-10T16:03:10.796061+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|1|ofproto_dpif_xlate(handler16)|:
 received packet on unknown port 1 while processing 
udp,in_port=1,vlan_tci=0x,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53744,tp_dst=4789
 on bridge br-1
2018-11-10T16:03:10.796497+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|2|ofproto_dpif_xlate(handler16)|:
 received packet on unknown port 1 while processing 
udp,in_port=1,vlan_tci=0x,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53744,tp_dst=4789
 on bridge br-1
2018-11-10T16:03:10.796871+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|1|ofproto_dpif_xlate(handler19)|:
 received packet on unknown port 1 while processing 
udp,in_port=1,vlan_tci=0x,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=41088,tp_dst=4789
 on bridge br-1
2018-11-10T16:03:10.797261+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|2|ofproto_dpif_upcall(handler19)|:
 received packet on unassociated datapath port 5
2018-11-10T16:03:10.797616+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|3|ofproto_dpif_upcall(handler19)|:
 received packet on unassociated datapath port 5
2018-11-10T16:03:10.797833+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|4|ofproto_dpif_upcall(handler19)|:
 received packet on unassociated datapath port 5
2018-11-10T16:03:10.798202+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|5|ofproto_dpif_upcall(handler19)|:
 received packet on unassociated datapath port 5
2018-11-10T16:03:10.798434+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|6|ofproto_dpif_upcall(handler19)|:
 received packet on unassociated datapath port 5
2018-11-10T16:03:10.801329+08:00|info|ovs-vswitchd[18219]|revalidate[2473]|1|ofproto_dpif_upcall(revalidator20)|:
 Unexpected ukey transition from