Thanks, I applied this. I see a lot more test failures with a modified hash function. James, I'm surprised that you only see one failure on big-endian.
On Wed, Mar 06, 2019 at 09:35:03AM -0500, Mark Michelson wrote: > https://patchwork.ozlabs.org/patch/1052369/ > > On 3/6/19 8:59 AM, Mark Michelson wrote: > > OK, I think I see what the problem is now. > > > > The issue is with the order that ovn_ports are visited in > > join_logical_ports. If a logical switch port is connected to a logical > > router port, then if the logical switch port is visited first, > > everything works fine. If the logical router port is visited first, then > > it can result in duplicate addresses being assigned. As you pointed out, > > the hash function on a big endian system will be different, so ports are > > being visited in a different order on that system. However, this could > > just as easily occur on a little endian system as well, depending on the > > values being computed by the hash function. > > > > I will submit a fix as soon as possible. > > Thanks, > > Mark > > > > On 3/6/19 8:45 AM, Mark Michelson wrote: > > > Hi Ben, > > > > > > Thanks for reaching out. Right now it's not obvious why big endian > > > architecture would cause duplicate IPs to be assigned. My initial > > > thought was that the wrong byte ordering was being used when adding > > > IP addresses to IPAM, but it appears that host byte ordering is > > > being used consistently when performing the calculations. > > > > > > James, is that the only IPAM-related test failure you are seeing? If > > > so, that can help narrow where I need to look to fix this. Thanks. > > > > > > On 3/5/19 9:04 PM, Ben Pfaff wrote: > > > > [adding some folks who've worked on IPAM lately] > > > > > > > > On Tue, Mar 05, 2019 at 10:26:58AM +0000, James Page wrote: > > > > > Test 2768 is consistently failing on s390x in Ubuntu development. > > > > > > > > > > Note that this is the only big-endian architecture we build > > > > > for so I would > > > > > suspect something endian-y as the root cause. > > > > > > > > Yes, it's related to big-endian, in particular related to the OVS hash > > > > function, which produces different results depending on endianness. > > > > I can reproduce it on amd64 by changing the hash function: > > > > > > > > ---------------------------------------------------------------------- > > > > diff --git a/lib/hash.h b/lib/hash.h > > > > index eb3776500abe..c7fc6430c71d 100644 > > > > --- a/lib/hash.h > > > > +++ b/lib/hash.h > > > > @@ -1,5 +1,5 @@ > > > > /* > > > > - * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2014, 2016 Nicira, Inc. > > > > + * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2014, 2016, 2019 > > > > Nicira, Inc. > > > > * > > > > * Licensed under the Apache License, Version 2.0 (the "License"); > > > > * you may not use this file except in compliance with the License. > > > > @@ -87,6 +87,7 @@ static inline uint32_t mhash_finish(uint32_t hash) > > > > hash ^= hash >> 13; > > > > hash *= 0xc2b2ae35; > > > > hash ^= hash >> 16; > > > > + hash = ~hash; > > > > return hash; > > > > } > > > > @@ -184,7 +185,7 @@ static inline uint32_t hash_finish(uint64_t > > > > hash, uint64_t final) > > > > /* The finishing multiplier 0x805204f3 has been experimentally > > > > * derived to pass the testsuite hash tests. */ > > > > hash = _mm_crc32_u64(hash, final) * 0x805204f3; > > > > - return hash ^ (uint32_t)hash >> 16; /* Increase entropy in LSBs. */ > > > > + return ~(hash ^ (uint32_t)hash >> 16); /* Increase entropy > > > > in LSBs. */ > > > > } > > > > /* Returns the hash of the 'n' 32-bit words at 'p_', starting > > > > from 'basis'. > > > > ---------------------------------------------------------------------- > > > > > > > > The actual failure might be due to a more serious bug though. When I > > > > apply the following patch: > > > > > > > > ---------------------------------------------------------------------- > > > > diff --git a/tests/ovn.at b/tests/ovn.at > > > > index 2af225a678e1..a53204f544e9 100644 > > > > --- a/tests/ovn.at > > > > +++ b/tests/ovn.at > > > > @@ -12273,6 +12273,8 @@ for i in 2 3 4; do > > > > ovn-nbctl --wait=sb lsp-set-addresses swp$i > > > > "02:00:00:00:00:0$i dynamic" > > > > cidr=$(ovn-nbctl get logical_switch_port swp$i > > > > dynamic_addresses |cut -f2 -d' '|cut -f1 -d\") > > > > ovn-nbctl lrp-add ro$i rop$i 02:00:00:00:00:0$i $cidr/24 > > > > -- set logical_switch_port swp$i type=router > > > > options:router-port=rop$i addresses=router; > > > > + printf "\nadded swp$i\n" > > > > + ovn-nbctl list logical_switch_port > > > > AT_CHECK_UNQUOTED([ovn-nbctl get logical_router_port rop$i > > > > networks], [0], [[["192.168.1.$i/24"]] > > > > ]) > > > > done > > > > ---------------------------------------------------------------------- > > > > > > > > the test then reports that ports swp2 and swp3 are both being assigned > > > > dynamic address 192.168.1.2 (instead of 192.168.1.2 and 192.168.1.3, > > > > respectively), as you can see from the "dynamic_addresses" lines > > > > following "added swp3": > > > > > > > > ---------------------------------------------------------------------- > > > > 2772. ovn.at:12264: testing ovn -- ipam router ports ... > > > > creating ovn-sb database > > > > creating ovn-nb database > > > > starting ovn-northd > > > > starting backup ovn-northd > > > > > > > > added swp2 > > > > _uuid : d78add30-9d8e-4d49-bea6-8f4bc947ab3b > > > > addresses : [router] > > > > dhcpv4_options : [] > > > > dhcpv6_options : [] > > > > dynamic_addresses : "02:00:00:00:00:02 192.168.1.2" > > > > enabled : [] > > > > external_ids : {} > > > > name : "swp2" > > > > options : {router-port="rop2"} > > > > parent_name : [] > > > > port_security : [] > > > > tag : [] > > > > tag_request : [] > > > > type : router > > > > up : true > > > > ../../tests/ovn.at:12278: ovn-nbctl get logical_router_port > > > > rop$i networks > > > > > > > > added swp3 > > > > _uuid : d78add30-9d8e-4d49-bea6-8f4bc947ab3b > > > > addresses : [router] > > > > dhcpv4_options : [] > > > > dhcpv6_options : [] > > > > dynamic_addresses : "02:00:00:00:00:02 192.168.1.2" > > > > enabled : [] > > > > external_ids : {} > > > > name : "swp2" > > > > options : {router-port="rop2"} > > > > parent_name : [] > > > > port_security : [] > > > > tag : [] > > > > tag_request : [] > > > > type : router > > > > up : true > > > > > > > > _uuid : 0542f623-43a8-407d-adbe-0927999c78b1 > > > > addresses : [router] > > > > dhcpv4_options : [] > > > > dhcpv6_options : [] > > > > dynamic_addresses : "02:00:00:00:00:03 192.168.1.2" > > > > enabled : [] > > > > external_ids : {} > > > > name : "swp3" > > > > options : {router-port="rop3"} > > > > parent_name : [] > > > > port_security : [] > > > > tag : [] > > > > tag_request : [] > > > > type : router > > > > up : true > > > > ../../tests/ovn.at:12278: ovn-nbctl get logical_router_port > > > > rop$i networks > > > > --- - 2019-03-05 17:58:01.066306125 -0800 > > > > +++ > > > > /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/2772/stdout > > > > 2019-03-05 17:58:01.060040470 -0800 > > > > @@ -1,2 +1,2 @@ > > > > -["192.168.1.3/24"] > > > > +["192.168.1.2/24"] > > > > ---------------------------------------------------------------------- > > > > > > > > Mark and Lorenzo, this seems like something you should look into? > > > > > > > > Thanks, > > > > > > > > Ben. > > > > > > > > > > _______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
