On 4/24/20 3:19 AM, Han Zhou wrote:
> Based on the discussion in OVN meeting today I did some more testing, and 
> here are my findings.
> 
> - With ICMP (ping) between the same source and destination, it is always the 
> same bucket selected by dp_hash.
> - With "nc" specifying the same TCP 5-tuple, the packets can end up in 
> different buckets. This is similar to what Numan and Maciej observed.
> 
> However, I was using the OVN ECMP feature to test instead of LB. Since the 
> ECMP feature doesn't use conntrack, here are some more findings. The bucket 
> selection changes only between 2 buckets, and the change happens when the 
> packet's datapath changes between the userspace and kernel datapaths. Let's 
> say the first packet of a flow (megaflow) goes to userspace and hits bucket1; 
> then if I send the second and further packets immediately, they will all hit 
> bucket2; but if I wait for a while until the flow disappears from the 
> megaflow cache and then send the next packet, it will hit bucket1 again. This 
> behavior is consistent.
> 
> So I think the different buckets were selected because of the different 
> implementations of dp_hash in the userspace and kernel datapaths (thanks 
> Ilya for this hint in today's meeting).
> Numan/Maciej, in your tests did you see more than 2 buckets hit for the same 
> 5-tuple? If the above theory is right, you should see at most 2 buckets hit. 
> For LB, since it uses CT and only the first packet uses the group, all 
> packets of the same flow would always be forwarded to the same LB backend. I 
> guess if you wait long enough between the tests, you should see all tests 
> hitting the same backend. It would be great if you could confirm this.
> 
> For ECMP, this behavior will cause occasional packet reordering even for a 
> single flow (for a burst of packets after some idle time), because CT is not 
> used (and we can't use it, because when peered with physical ECMP router 
> groups we can't ensure the return traffic from physical routers hits the 
> same LR).
> 
> For LB it causes the unexpected behavior that is reported in this thread.
> 
> For the fix, I think we should figure out how to make sure dp_hash always 
> uses the same hash algorithm in both the userspace and kernel 
> implementations, if possible.
> I am OK with the patch from Numan for the capability of configuring the 
> desired hash method instead of always using the default. However, using 
> "hash" may be a huge performance sacrifice, since the packets are always 
> handled in the slowpath, especially for ECMP. Even though LB uses CT, for 
> short-lived flows this is still a big performance penalty (for long-lived LB 
> flows it may be OK, since the majority of packets are still in the fastpath).
> 
> I am not familiar with the dp_hash implementation. I will do some more 
> studying, but any ideas on how to ensure the consistency of dp_hash are 
> highly appreciated!

I had an impression that the packet hash was passed from the datapath to
userspace and back, but it turns out that this is a fairly recent change.
It seems the following changes are required:

1. Linux kernel: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
   This is available starting from upstream kernel v5.5.

2. OVS: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to 
datapath.")
   This is available on branch-2.13.

With the above two patches, the first and subsequent packets should have the
same dp_hash, calculated by the kernel datapath.
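Whether a given deployment carries both fixes can be checked with a small
version comparison (a sketch only; distro backports may make a lower version
number sufficient, so treat the thresholds as a hint, not a guarantee):

```shell
# version_ge A B: succeeds when dotted version A >= B (uses sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

kernel="$(uname -r | cut -d- -f1)"   # e.g. "5.8.0"
version_ge "$kernel" 5.5 && echo "kernel may carry hash in upcall (v5.5+)"

# The OVS side would be checked the same way against 2.13, e.g.:
#   version_ge "$(ovs-vswitchd --version | awk '{print $NF}')" 2.13
```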


> 
> Thanks,
> Han
> 
> On Tue, Apr 21, 2020 at 1:05 AM Daniel Alvarez Sanchez <[email protected]> wrote:
>>
>> Thanks Numan for the investigation and the great explanation!
>>
>> On Tue, Apr 21, 2020 at 9:38 AM Numan Siddique <[email protected]> wrote:
>>>
>>> On Fri, Apr 17, 2020 at 12:56 PM Han Zhou <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > On Tue, Apr 7, 2020 at 7:03 AM Maciej Jozefczyk <[email protected]> wrote:
>>> > >
>>> > > Hello!
>>> > >
>>> > > I would like to ask you to clarify how the OVN Load balancing algorithm 
>>> > > works.
>>> > >
>>> > > Based on the action [1]:
>>> > > 1) If the connection is alive, the same 'backend' will be chosen.
>>> > >
>>> > > 2) If it is a new connection, the backend will be chosen based on 
>>> > > selection_method=dp_hash [2].
>>> > > Based on the changelog, dp_hash uses a '5-tuple hash' [3].
>>> > > The hash is calculated from these values: source and destination IP, 
>>> > > source port, protocol, and an arbitrary value, 42. [4]
>>> > > Based on that information we could name it SOURCE_IP_PORT.
>>> > >
>>> > > Unfortunately, we recently got a bug report in the OVN Octavia provider 
>>> > > driver project that Load Balancing in OVN works differently [5]. The 
>>> > > report shows that even when the test uses the same source IP and port, 
>>> > > but a new TCP connection, traffic is randomly distributed, while based 
>>> > > on [2] it shouldn't be?
>>> > >
>>> > > Is it a bug?  Is something else taken to account while creating a hash? 
>>> > > Can it be fixed in OVS/OVN?
>>> > >
>>> > >
>>> > >
>>> > > Thanks,
>>> > > Maciej
>>> > >
>>> > >
>>> > > [1] https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1017
>>> > > [2] https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1059
>>> > > [3] 
>>> > > https://github.com/openvswitch/ovs/blob/d58b59c17c70137aebdde37d3c01c26a26b28519/NEWS#L364-L371
>>> > > [4] 
>>> > > https://github.com/openvswitch/ovs/blob/74286173f4d7f51f78e9db09b07a6d4d65263252/lib/flow.c#L2217
>>> > > [5] https://bugs.launchpad.net/neutron/+bug/1871239
>>> > >
>>> > > --
>>> > > Best regards,
>>> > > Maciej Józefczyk
>>> > > _______________________________________________
>>> > > discuss mailing list
>>> > > [email protected] <mailto:[email protected]>
>>> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>> >
>>> > Hi Maciej,
>>> >
>>> > Thanks for reporting. It is definitely strange that the same 5-tuple flow 
>>> > resulted in hitting different backends. I didn't observe such behavior 
>>> > before (maybe I should try again myself to confirm). Can you make sure 
>>> > that the group buckets didn't change during the testing? You can do so by:
>>> > # ovs-ofctl dump-groups br-int
>>> > and also check the group stats to see whether multiple buckets had their 
>>> > counters increased during the test:
>>> > # ovs-ofctl dump-group-stats br-int [group]
>>> >
>>> > For the 5-tuple hash function, you are seeing flow_hash_5tuple(); it 
>>> > uses the whole 5-tuple. It adds both ports at once:
>>> >
>>> >        /* Add both ports at once. */
>>> >        hash = hash_add(hash,
>>> >                        ((const uint32_t *)flow)[offsetof(struct flow, tp_src)
>>> >                                                 / sizeof(uint32_t)]);
>>> >
>>> > The tp_src is the start of the offset, and the load is 32 bits wide, 
>>> > covering both the src and dst ports, each 16 bits. (Although I am not 
>>> > sure whether the dp_hash method is using this function or not; I need to 
>>> > check more code.)
>>> >
>>> > BTW, I am not sure why Neutron gives it the name SOURCE_IP_PORT. 
>>> > Shouldn't it be called just 5-TUPLE, since the protocol, destination IP, 
>>> > and destination port are also considered in the hash?
>>> >
>>>
>>>
>>> Hi Maciej and Han,
>>>
>>> I did some testing and I can confirm what you're saying. OVN is not
>>> choosing the same backend with the src IP and src port fixed.
>>>
>>> I think there is an issue with how OVN is programming the group
>>> flows.  OVN is setting the selection_method as dp_hash.
>>> But when ovs-vswitchd receives the GROUP_MOD OpenFlow message, I
>>> noticed that the selection_method is not set.
>>> From the code I see that selection_method will be encoded only if
>>> ovn-controller uses OpenFlow version 1.5 [1].
>>>
>>> Since selection_method is NULL, vswitchd uses the dp_hash method [2].
>>> dp_hash means it uses the hash calculated by
>>> the datapath. In the case of the kernel datapath, from what I understand,
>>> it uses skb_get_hash().
>>>
>>> I modified the vswitchd code to use the selection_method "hash" if
>>> selection_method is not set. In this case the load balancer
>>> works as expected. For a fixed src IP, src port, dst IP, and dst port,
>>> the group action always selects the same bucket. [3]
>>>
>>> I think we need to fix a few issues in OVN:
>>>   - Use OpenFlow 1.5 so that OVN can set selection_method.
>>>   - Use the "hash" method if dp_hash is not choosing the same bucket for a
>>> 5-tuple hash.
>>>   - Maybe provide an option for the CMS to choose an algorithm, i.e.
>>> dp_hash or hash.
>>>
>> I'd rather not expose this to the CMS, as it depends on the datapath 
>> implementation as per [0], but maybe it makes sense to eventually abstract 
>> it for the CMS in a more LB-ish way (common algorithm names used in load 
>> balancing), in case at some point the LB feature is enhanced to support 
>> more algorithms.
>>
>> I believe that for OVN LB users, using OF 1.5 to force the use of 'hash' 
>> would be the best solution now.
>>
>> My 2 cents as I'm not an LB expert.
>>
>> I also recall that we tested this in the past and it seemed to be working. I 
>> have been checking further in the doc [0] and found this paragraph:
>>
>> "If no selection method is specified, Open vSwitch up to release 2.9 applies 
>> the hash method with default fields. From 2.10 onwards Open vSwitch defaults 
>> to the dp_hash method with symmetric L3/L4 hash algorithm, unless the 
>> weighted group buckets cannot be mapped to a maximum of 64 dp_hash values 
>> with sufficient accuracy. In those rare cases Open vSwitch 2.10 and later 
>> fall back to the hash method with the default set of hash fields."
>>
>> The explanation seems to be that when we tested the feature we relied on OVS 
>> 2.9 and hence the confusion.
>>
>> Thanks a lot again!
>> Daniel
>>
>> [0] http://www.openvswitch.org/support/dist-docs/ovs-ofctl.8.html
>>  
>>>
>>> I'll look into it on how to support this.
>>>
>>> [1] - https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2120
>>>        https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2082
>>>
>>> [2] - 
>>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif.c#L5108
>>> [3] - 
>>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4553
>>>
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> > Thanks,
>>> > Han
