On Fri, Oct 5, 2018 at 2:32 AM Guru Shetty <g...@ovn.org> wrote: > > > On Thu, 4 Oct 2018 at 13:29, Numan Siddique <nusid...@redhat.com> wrote: > >> >> Thanks for the review Guru. >> >> Please see below for the comments. >> >> >> >> On Fri, Oct 5, 2018 at 12:09 AM Guru Shetty <g...@ovn.org> wrote: >> >>> >>> >>> On Mon, 17 Sep 2018 at 02:24, <nusid...@redhat.com> wrote: >>> >>>> From: Numan Siddique <nusid...@redhat.com> >>>> >>>> When a child vlan interface is created inside a VM, the below kernel >>>> message >>>> is seen and IPv6 doesn't work on that interface. >>>> >>> >>> On which interface doesn't IPv6 work? On the Vm's interface or on the >>> container's interface? >>> >>> >> >> It is seen on the container's interface. >> >>> >>>> [ 138.000753] IPv6: vlan4: IPv6 duplicate address <IPv6 LLA> detected! >>>> >>>> When a child port sends a broadcast packet, OVN delivers the same >>>> packet back to the child port (and hence the DAD check fails). >>>> >>> >>> Is this limited to IP broadcast only? Or does this happen with mac >>> broadcast too? (I am surprised why I hadn't seen this behavior with ARP >>> packets of containers). >>> >> >> It happens with mac broadcast. You can actually reproduce the issue by >> applying the code in tests/ovn.at without this patch. >> >> Looks like I have confused you with the patch. Let me try to give an >> example. Suppose we have switch - sw0 - with ports p1, p2 and p3 and all of >> them reside in the same chassis. >> When p1 sends a broadcast/multicast packet, the below logical flow is hit >> >> table=17(ls_in_l2_lkup ), priority=100 , match=(eth.mcast), >> action=(outport = "_MC_flood"; output;) >> >> This translates to the OF flow >> - metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 >> actions=load:0xffff->NXM_NX_REG15[],resubmit(,32) >> >> In the table 33 if reg15 has 0xffff, it will resubmit the flow with reg15 >> set to each logical port >> >> i.e reg15=0xffff,metadata=0x1 >> actions=load:0x1->NXM_NX_REG15[],resubmit(,34),load:0x2>NXM_NX_REG15[],resubmit(,34),load:0x3->NXM_NX_REG15[],resubmit(,34),load:0xffff->NXM_NX_REG15[] >> >> In the normal case, the packet will not be delivered back to p1 and >> instead will be dropped in table 34 (OFTABLE_CHECK_LOOPBACK) >> since reg10[0] is not set. >> >> table=34, n_packets=0, n_bytes=0, >> priority=100,reg10=0/0x1,reg14=0x2,reg15=0x2,metadata=0x1 actions=drop >> >> When the register reg10[0] is set, it means the packet needs to be loop >> backed to the source port. So table 64 has flows to clear the in_port >> i,.e >> table=64, priority=100,reg10=0x1/0x1,reg15=0x1,metadata=0x1 >> actions=push:NXM_OF_IN_PORT[],load:0->NXM_OF_IN_PORT[],resubmit(,65),pop:NXM_OF_IN_PORT[] >> >> Now if we take the example of child ports, let's say "p2" is the child >> port of "p1" and in_port of p1 is 1 >> In table 0, we will see 2 flows for in_port=1 >> >> match=(in_port=1) - >> actions=(load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[] and submit to >> table 8) >> match=(in_port=1, dl_vlan=4) actions(load:0x1->NXM_NX_REG10[0], >> strip_vlan, load:0x1->OXM_OF_METADATA[],load:0x2->NXM_NX_REG14[] , and >> submit to table 8) >> >> Suppose if p2 sends a braodcast packet, reg10[0] is set in table 0. >> >> In this case, since reg10[0] is set, the broadcast packet is not dropped >> in table 34 for port "p2" and it gets delivered back. >> So this patch instead of using reg10[0], it uses a new register bit >> -MLF_NESTED_CONTAINER_BIT >> > > Thanks for the detailed response. I understand till the above point. So > not using NXM_NX_REG10[0] (or MLF_ALLOW_LOOPBACK_BIT) actually lets the > packet drop in table 34. Can you please add that information to the commit > message. I think it will be helpful in the future. > > I then don't understand, what we are doing with the new code > in OFTABLE_SAVE_INPORT. I would expect that the new code is needed to save > inport if the destination is a nested container. Not quite understand why > we have a match for parent port. I will wait for the comment in the code to > understand it. >
The code to push, clear the inport and pop the inport is already present for the child ports - https://github.com/openvswitch/ovs/blob/master/ovn/controller/physical.c#L265 And the code to clear the inport for the parent port or any port when reg10[0] is set is here - https://github.com/openvswitch/ovs/blob/master/ovn/controller/physical.c#L256 Now with this patch since we are using reg10[5] to indicate the packet is from child ports, we need to clear the in_port if the packet from the child port needs to be sent to the parent port. And that's why have a new match for the parent ports. I changed the signature of the function 'put_local_common_flows' because I wanted to add the flow to check reg10[5] only if a port is a parent port. Otherwise we will add the flow to check reg10[5] for all the ports in the switch. I will add proper documentation and make it clear. Thanks Numan > > >> >> We also need to clear the in_port in table 64 if MLF_NESTED_CONTAINER_BIT >> is set. >> >> I will address the review comments and submit p2. >> >> Hope this clears your confusion. >> >> Thanks >> Numan >> >> >> >>> >>> >>> This is because 'MLF_ALLOW_LOOPBACK_BIT' is set in REG10 in table 0 when >>>> the packet is received from any child port and table >>>> 'OFTABLE_CHECK_LOOPBACK' >>>> doesn't drop the packet. >>>> >>> >>> I had thought that the previous tables only replicate packets to ports >>> other than the inport. But I may be wrong. >>> Some of my comments further down could be stupid because I may have not >>> understood the problem well. >>> >>> >>>> >>>> This patch fixes the issue by using a new register bit >>>> (MLF_NESTED_CONTAINER_BIT) >>>> instead of 'MLF_ALLOW_LOOPBACK_BIT' and sets it in REG10 for the >>>> packets received >>>> from child ports. >>>> >>>> Signed-off-by: Numan Siddique <nusid...@redhat.com> >>>> --- >>>> ovn/controller/ofctrl.c | 29 ++++++++++++++++++++--------- >>>> ovn/controller/ofctrl.h | 5 +++++ >>>> ovn/controller/physical.c | 38 ++++++++++++++++++++++++++++++++------ >>>> ovn/lib/logical-fields.h | 4 ++++ >>>> tests/ovn.at | 11 +++++++++++ >>>> 5 files changed, 72 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c >>>> index 96c57f143..2f2b185ae 100644 >>>> --- a/ovn/controller/ofctrl.c >>>> +++ b/ovn/controller/ofctrl.c >>>> @@ -630,9 +630,11 @@ ofctrl_recv(const struct ofp_header *oh, enum >>>> ofptype type) >>>> * >>>> * The caller should initialize its own hmap to hold the flows. */ >>>> void >>>> -ofctrl_add_flow(struct hmap *desired_flows, >>>> - uint8_t table_id, uint16_t priority, uint64_t cookie, >>>> - const struct match *match, const struct ofpbuf >>>> *actions) >>>> +ofctrl_check_and_add_flow(struct hmap *desired_flows, >>>> + uint8_t table_id, uint16_t priority, >>>> uint64_t cookie, >>>> + const struct match *match, >>>> + const struct ofpbuf *actions, >>>> + bool log_duplicate_flow) >>>> { >>>> struct ovn_flow *f = xmalloc(sizeof *f); >>>> f->table_id = table_id; >>>> @@ -644,13 +646,14 @@ ofctrl_add_flow(struct hmap *desired_flows, >>>> f->cookie = cookie; >>>> >>>> if (ovn_flow_lookup(desired_flows, f)) { >>>> - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); >>>> - if (!VLOG_DROP_DBG(&rl)) { >>>> - char *s = ovn_flow_to_string(f); >>>> - VLOG_DBG("dropping duplicate flow: %s", s); >>>> - free(s); >>>> + if (log_duplicate_flow) { >>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, >>>> 5); >>>> + if (!VLOG_DROP_DBG(&rl)) { >>>> + char *s = ovn_flow_to_string(f); >>>> + VLOG_DBG("dropping duplicate flow: %s", s); >>>> + free(s); >>>> + } >>>> } >>>> - >>>> ovn_flow_destroy(f); >>>> return; >>>> } >>>> @@ -658,6 +661,14 @@ ofctrl_add_flow(struct hmap *desired_flows, >>>> hmap_insert(desired_flows, &f->hmap_node, f->hmap_node.hash); >>>> } >>>> >>>> +void >>>> +ofctrl_add_flow(struct hmap *desired_flows, >>>> + uint8_t table_id, uint16_t priority, uint64_t cookie, >>>> + const struct match *match, const struct ofpbuf >>>> *actions) >>>> +{ >>>> + ofctrl_check_and_add_flow(desired_flows, table_id, priority, >>>> cookie, >>>> + match, actions, true); >>>> +} >>>> >>>> /* ovn_flow. */ >>>> >>>> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h >>>> index e3fc95b05..ae0cfa513 100644 >>>> --- a/ovn/controller/ofctrl.h >>>> +++ b/ovn/controller/ofctrl.h >>>> @@ -55,4 +55,9 @@ void ofctrl_add_flow(struct hmap *desired_flows, >>>> uint8_t table_id, >>>> uint16_t priority, uint64_t cookie, >>>> const struct match *, const struct ofpbuf >>>> *ofpacts); >>>> >>>> +void ofctrl_check_and_add_flow(struct hmap *desired_flows, uint8_t >>>> table_id, >>>> + uint16_t priority, uint64_t cookie, >>>> + const struct match *, >>>> + const struct ofpbuf *ofpacts, >>>> + bool log_duplicate_flow); >>>> #endif /* ovn/ofctrl.h */ >>>> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c >>>> index c38d7b09f..d781ae459 100644 >>>> --- a/ovn/controller/physical.c >>>> +++ b/ovn/controller/physical.c >>>> @@ -189,7 +189,8 @@ get_zone_ids(const struct sbrec_port_binding >>>> *binding, >>>> >>>> static void >>>> put_local_common_flows(uint32_t dp_key, uint32_t port_key, >>>> - bool nested_container, const struct zone_ids >>>> *zone_ids, >>>> + uint32_t parent_port_key, >>>> + const struct zone_ids *zone_ids, >>>> struct ofpbuf *ofpacts_p, struct hmap >>>> *flow_table) >>>> { >>>> struct match match; >>>> @@ -253,7 +254,7 @@ put_local_common_flows(uint32_t dp_key, uint32_t >>>> port_key, >>>> ofpbuf_clear(ofpacts_p); >>>> match_set_metadata(&match, htonll(dp_key)); >>>> match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key); >>>> - if (!nested_container) { >>>> + if (!parent_port_key) { >>>> match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0, >>>> MLF_ALLOW_LOOPBACK, MLF_ALLOW_LOOPBACK); >>>> } >>>> >>> I think before your patch, the above code would always save inport for >>> nested containers. They continue to do so now. I hope that was your >>> intention? >>> >>> >>> >>>> @@ -264,6 +265,25 @@ put_local_common_flows(uint32_t dp_key, uint32_t >>>> port_key, >>>> put_stack(MFF_IN_PORT, ofpact_put_STACK_POP(ofpacts_p)); >>>> ofctrl_add_flow(flow_table, OFTABLE_SAVE_INPORT, 100, 0, >>>> >>> >>> &match, ofpacts_p); >>>> >>> + >>>> >>> Can you add a comment here on what we are doing below? >>> >>> When a packet is headed to a nested container, the outport in OVN should >>> be that of the destination nested container. >>> But here it looks like we are matching on the OVN outport being the >>> parent port and saying that if the outport is parent port, we should save >>> the physical inport. >>> Is that what we are doing? I don't quite understand how this helps to >>> prevent packet from going back to the originator child port. I am sure I am >>> missing something simple here. >>> >>> + if (parent_port_key) { >>>> + match_init_catchall(&match); >>>> + ofpbuf_clear(ofpacts_p); >>>> + match_set_metadata(&match, htonll(dp_key)); >>>> + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, >>>> parent_port_key); >>>> + match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0, >>>> + MLF_NESTED_CONTAINER, >>>> MLF_NESTED_CONTAINER); >>>> + >>>> + put_stack(MFF_IN_PORT, ofpact_put_STACK_PUSH(ofpacts_p)); >>>> + put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p); >>>> + put_resubmit(OFTABLE_LOG_TO_PHY, ofpacts_p); >>>> + put_stack(MFF_IN_PORT, ofpact_put_STACK_POP(ofpacts_p)); >>>> + /* When a parent port has multiple child ports, we don't want >>>> to >>>> + * log the duplicate flow message. >>>> + */ >>>> + ofctrl_check_and_add_flow(flow_table, OFTABLE_SAVE_INPORT, >>>> 100, 0, >>>> + &match, ofpacts_p, false); >>>> + } >>>> } >>>> >>>> static void >>>> @@ -328,7 +348,7 @@ consider_port_binding(struct ovsdb_idl_index >>>> *sbrec_chassis_by_name, >>>> } >>>> >>>> struct zone_ids binding_zones = get_zone_ids(binding, >>>> ct_zones); >>>> - put_local_common_flows(dp_key, port_key, false, &binding_zones, >>>> + put_local_common_flows(dp_key, port_key, 0, &binding_zones, >>>> ofpacts_p, flow_table); >>>> >>>> match_init_catchall(&match); >>>> @@ -452,6 +472,7 @@ consider_port_binding(struct ovsdb_idl_index >>>> *sbrec_chassis_by_name, >>>> >>>> int tag = 0; >>>> bool nested_container = false; >>>> + const struct sbrec_port_binding *parent_port = NULL; >>>> ofp_port_t ofport; >>>> bool is_remote = false; >>>> if (binding->parent_port && *binding->parent_port) { >>>> @@ -463,6 +484,8 @@ consider_port_binding(struct ovsdb_idl_index >>>> *sbrec_chassis_by_name, >>>> if (ofport) { >>>> tag = *binding->tag; >>>> nested_container = true; >>>> + parent_port = lport_lookup_by_name( >>>> + sbrec_port_binding_by_name, binding->parent_port); >>>> } >>>> } else { >>>> ofport = u16_to_ofp(simap_get(&localvif_to_ofport, >>>> @@ -523,7 +546,8 @@ consider_port_binding(struct ovsdb_idl_index >>>> *sbrec_chassis_by_name, >>>> */ >>>> >>>> struct zone_ids zone_ids = get_zone_ids(binding, ct_zones); >>>> - put_local_common_flows(dp_key, port_key, nested_container, >>>> &zone_ids, >>>> + uint32_t parent_port_key = parent_port ? >>>> parent_port->tunnel_key : 0; >>>> + put_local_common_flows(dp_key, port_key, parent_port_key, >>>> &zone_ids, >>>> ofpacts_p, flow_table); >>>> >>>> /* Table 0, Priority 150 and 100. >>>> @@ -553,8 +577,10 @@ consider_port_binding(struct ovsdb_idl_index >>>> *sbrec_chassis_by_name, >>>> if (nested_container) { >>>> /* When a packet comes from a container sitting behind >>>> a >>>> * parent_port, we should let it loopback to other >>>> containers >>>> - * or the parent_port itself. */ >>>> - put_load(MLF_ALLOW_LOOPBACK, MFF_LOG_FLAGS, 0, 1, >>>> ofpacts_p); >>>> + * or the parent_port itself. Indicate this by setting >>>> the >>>> + * MLF_NESTED_CONTAINER_BIT in MFF_LOG_FLAGS.*/ >>>> + put_load(1, MFF_LOG_FLAGS, MLF_NESTED_CONTAINER_BIT, 1, >>>> + ofpacts_p); >>>> } >>>> ofpact_put_STRIP_VLAN(ofpacts_p); >>>> } >>>> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h >>>> index b1dbb035a..ab31327e9 100644 >>>> --- a/ovn/lib/logical-fields.h >>>> +++ b/ovn/lib/logical-fields.h >>>> @@ -50,6 +50,7 @@ enum mff_log_flags_bits { >>>> MLF_FORCE_SNAT_FOR_DNAT_BIT = 2, >>>> MLF_FORCE_SNAT_FOR_LB_BIT = 3, >>>> MLF_LOCAL_ONLY_BIT = 4, >>>> + MLF_NESTED_CONTAINER_BIT = 5, >>>> }; >>>> >>>> /* MFF_LOG_FLAGS_REG flag assignments */ >>>> @@ -75,6 +76,9 @@ enum mff_log_flags { >>>> * hypervisors should instead only be output to local targets >>>> */ >>>> MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT), >>>> + >>>> + /* Indicate that a packet has received from a nested container. */ >>>> + MLF_NESTED_CONTAINER = (1 << MLF_NESTED_CONTAINER_BIT), >>>> }; >>>> >>>> #endif /* ovn/lib/logical-fields.h */ >>>> diff --git a/tests/ovn.at b/tests/ovn.at >>>> index 769e09f81..a7aba5ccd 100644 >>>> --- a/tests/ovn.at >>>> +++ b/tests/ovn.at >>>> @@ -7038,6 +7038,17 @@ >>>> packet=${dst_mac}${src_mac}8100000208004500001c000000003f110100${src_ip}${dst_ip >>>> echo $packet >> expected1 >>>> OVN_CHECK_PACKETS([hv1/vm1-tx.pcap], [expected1]) >>>> >>>> +# Send broadcast packet from foo1. foo1 should not receive the same >>>> packet. >>>> +src_mac="f00000010205" >>>> +dst_mac="ffffffffffff" >>>> +src_ip=`ip_to_hex 192 168 1 2` >>>> +dst_ip=`ip_to_hex 255 255 255 255` >>>> >>>> +packet=${dst_mac}${src_mac}8100000108004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>>> +as hv1 ovs-appctl netdev-dummy/receive vm1 $packet >>>> + >>>> +# expected packet at VM1 >>>> +OVN_CHECK_PACKETS([hv1/vm1-tx.pcap], [expected1]) >>>> + >>>> OVN_CLEANUP([hv1],[hv2]) >>>> >>>> AT_CLEANUP >>>> -- >>>> 2.17.1 >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> d...@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev