On Thu, Apr 30, 2020 at 2:58 PM Han Zhou <[email protected]> wrote:
>
> Thanks Maciej for testing.
>
> On Tue, Apr 28, 2020 at 5:36 AM Maciej Jozefczyk <[email protected]> wrote:
> > Here are my findings.
> >
> > 1) Test LB and sleep 1 second between calls:
> > ./get-data.py --lb-dest 172.24.4.58:60092 --sleep-time 1
> >
> > result: http://paste.openstack.org/show/792818/
> > Different backends are selected and different buckets are being hit in
> > group_id=3. Sometimes bucket1 is hit, sometimes bucket0.
> > Output from groups dumps during the test: 
> > http://paste.openstack.org/show/792820/
> >
> >
> > 2) Test LB and sleep 60 seconds between calls:
> > ./get-data.py --lb-dest 172.24.4.58:60092 --sleep-time 60
> >
> > Result: http://paste.openstack.org/show/792822/
> > Output from group stats: http://paste.openstack.org/show/792823/
> > Only one bucket is ever hit (bucket0) and requests always go to the same
> > backend.
> >
>
> This test result proves our earlier analysis: the different hash values
> generated by the kernel fastpath and the userspace slowpath are the culprit.
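>
> To make that concrete, below is a tiny standalone sketch (toy hash
> functions and a made-up 5-tuple; this is *not* the actual OVS or kernel
> code) of how two independent hash algorithms over the same 5-tuple can
> land on different buckets of a 2-bucket group, which would explain the
> bucket0/bucket1 flipping observed above:
>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     struct tuple5 {
>         uint32_t src_ip, dst_ip;
>         uint16_t src_port, dst_port;
>         uint8_t proto;
>     };
>
>     /* Toy "userspace-style" hash: FNV-style mix over the fields. */
>     static uint32_t hash_a(const struct tuple5 *t)
>     {
>         uint32_t words[4] = {
>             t->src_ip, t->dst_ip,
>             ((uint32_t) t->src_port << 16) | t->dst_port, t->proto
>         };
>         uint32_t h = 2166136261u;
>         for (int i = 0; i < 4; i++) {
>             h = (h ^ words[i]) * 16777619u;
>         }
>         return h;
>     }
>
>     /* Toy "kernel-style" hash: a different mixing function. */
>     static uint32_t hash_b(const struct tuple5 *t)
>     {
>         uint32_t h = t->src_ip ^ (t->dst_ip << 1) ^ t->proto;
>         h ^= ((uint32_t) t->src_port << 16) | t->dst_port;
>         h *= 0x9e3779b9u;                /* golden-ratio constant */
>         return h ^ (h >> 16);
>     }
>
>     int main(void)
>     {
>         struct tuple5 t = { 0xac18043a, 0xac180401, 54321, 60092, 6 };
>         unsigned int n_buckets = 2;
>
>         /* The two algorithms are not guaranteed to agree, so the same
>          * packet can select bucket0 in one path and bucket1 in the
>          * other. */
>         printf("hash_a -> bucket%u\n", (unsigned) (hash_a(&t) % n_buckets));
>         printf("hash_b -> bucket%u\n", (unsigned) (hash_b(&t) % n_buckets));
>         return 0;
>     }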
>
> >
> >
> > On Fri, Apr 24, 2020 at 6:09 PM Ilya Maximets <[email protected]> wrote:
> >>
> >> On 4/24/20 3:19 AM, Han Zhou wrote:
> >> > Based on the discussion in OVN meeting today I did some more testing, 
> >> > and here are my findings.
> >> >
> >> > - With ICMP (ping) between the same source and destination, dp_hash
> >> > always selects the same bucket.
> >> > - With "nc" using the same TCP 5-tuple, the packets can end up in
> >> > different buckets. This is similar to what Numan and Maciej observed.
> >> >
> >> > However, I was testing with the OVN ECMP feature instead of LB. Since the
> >> > ECMP feature doesn't use conntrack, here are some more findings. The
> >> > bucket selection changes only between 2 buckets, and the change happens
> >> > when the packet's processing switches between the userspace and kernel
> >> > datapaths. Say the first packet of a flow (megaflow) goes to userspace
> >> > and hits bucket1. If I then send more packets immediately, they all hit
> >> > bucket2; but if I wait until the flow disappears from the megaflow cache
> >> > and then send the next packet, it hits bucket1 again. This behavior is
> >> > consistent.
> >> >
> >> > So I think the different buckets are selected because of the different
> >> > implementations of dp_hash in the userspace and kernel datapaths (thanks
> >> > Ilya for this hint in today's meeting).
> >> > Numan/Maciej, in your tests did you see more than 2 buckets hit for the
> >> > same 5-tuple? If the above theory is right, you should see at most 2
> >> > buckets hit. For LB, since it uses CT and only the first packet goes
> >> > through the group, all packets of the same flow are always forwarded to
> >> > the same LB backend. I guess if you wait long enough between the tests,
> >> > you should see all tests hitting the same backend. It would be great if
> >> > you could confirm this.
> >> >
> >> > For ECMP, this behavior will cause occasional packet reordering even
> >> > for a single flow (for a burst of packets after some idle time), because
> >> > CT is not used (and we can't use it, because when peered with physical
> >> > ECMP router groups we can't ensure the return traffic from the physical
> >> > routers hits the same LR).
> >> >
> >> > For LB it causes the unexpected behavior reported in this thread.
> >> >
> >> > For the fix, I think we should figure out how to make sure dp_hash
> >> > always uses the same hash algorithm in both the userspace and kernel
> >> > implementations, if possible.
> >> > I am OK with the patch from Numan that adds the capability to configure
> >> > the desired hash method instead of always using the default. However,
> >> > using "hash" may be a huge performance sacrifice since the packets are
> >> > always handled in the slowpath, especially for ECMP. Even though LB uses
> >> > CT, for short-lived flows this is still a big performance penalty (for
> >> > long-lived LB flows it may be OK since the majority of packets are
> >> > still in the fastpath).
> >> >
> >> > I am not familiar with the dp_hash implementation. I will do some more
> >> > study, but any ideas on how to ensure the consistency of dp_hash are
> >> > highly appreciated!
> >>
> >> I had an impression that the packet hash is routed from the datapath to
> >> userspace and back, but it turned out that this is a really recent change.
> >> It seems the following changes are required:
> >>
> >> 1. Linux kernel: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
> >>    This is available starting from upstream kernel v5.5.
> >>
> >> 2. OVS: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to 
> >> datapath.")
> >>    This is available on branch-2.13.
> >>
> >> With the above two patches, the first and subsequent packets should have
> >> the same dp_hash calculated by the kernel datapath.
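> >>
> >> A minimal standalone sketch of that idea (hypothetical struct and
> >> function names, not the real OVS upcall code): keep the hash the kernel
> >> reported in the upcall and hand the same value back on packet execute,
> >> so the slowpath never computes its own, possibly different, hash:
> >>
> >>     #include <stdbool.h>
> >>     #include <stdint.h>
> >>     #include <stdio.h>
> >>
> >>     struct upcall_info {
> >>         uint32_t hash;      /* hash the kernel datapath computed */
> >>         bool hash_present;  /* only set by kernels with bd1903b7c459 */
> >>     };
> >>
> >>     struct execute_info {
> >>         uint32_t hash;
> >>         bool hash_present;
> >>     };
> >>
> >>     /* Echo the datapath's hash back on packet execute instead of
> >>      * letting the slowpath derive one on its own. */
> >>     static void build_execute(const struct upcall_info *upcall,
> >>                               struct execute_info *execute)
> >>     {
> >>         if (upcall->hash_present) {
> >>             execute->hash = upcall->hash;
> >>             execute->hash_present = true;
> >>         } else {
> >>             /* Old kernel: no hash in the upcall, so the first packet
> >>              * may pick a different group bucket than later packets. */
> >>             execute->hash = 0;
> >>             execute->hash_present = false;
> >>         }
> >>     }
> >>
> >>     int main(void)
> >>     {
> >>         struct upcall_info upcall = { .hash = 0x5ca1ab1e,
> >>                                       .hash_present = true };
> >>         struct execute_info execute;
> >>
> >>         build_execute(&upcall, &execute);
> >>         if (execute.hash_present) {
> >>             printf("echoing hash 0x%08x back to the datapath\n",
> >>                    (unsigned) execute.hash);
> >>         }
> >>         return 0;
> >>     }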
> >>
>
> Thanks Ilya for this information! Do you know why the kernel fix (1) was not
> backported to the OVS repo? Was it missed, or was that on purpose? cc Tonghao
> Zhang.
> I ran a test using ovn-fake-multinode with OVS master and the problem
> disappeared. However, I am using an old kernel module and I don't think
> patch (1) is in it.
Most of my patches were applied in the upstream Linux kernel, and I don't
know who backports those, but I am fine with backporting it after the holiday.
Thanks.
> >>
> >> >
> >> > Thanks,
> >> > Han
> >> >
> >> > On Tue, Apr 21, 2020 at 1:05 AM Daniel Alvarez Sanchez
> >> > <[email protected]> wrote:
> >> >>
> >> >> Thanks Numan for the investigation and the great explanation!
> >> >>
> >> >> On Tue, Apr 21, 2020 at 9:38 AM Numan Siddique <[email protected]> wrote:
> >> >>>
> >> >>> On Fri, Apr 17, 2020 at 12:56 PM Han Zhou <[email protected]> wrote:
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On Tue, Apr 7, 2020 at 7:03 AM Maciej Jozefczyk <[email protected]> wrote:
> >> >>> > >
> >> >>> > > Hello!
> >> >>> > >
> >> >>> > > I would like to ask you to clarify how the OVN load balancing
> >> >>> > > algorithm works.
> >> >>> > >
> >> >>> > > Based on the action [1]:
> >> >>> > > 1) If the connection is alive, the same 'backend' will be chosen.
> >> >>> > >
> >> >>> > > 2) If it is a new connection, the backend will be chosen based on
> >> >>> > > selection_method=dp_hash [2].
> >> >>> > > Based on the changelog, dp_hash uses a '5-tuple hash' [3].
> >> >>> > > The hash is calculated from the source and destination IP, the
> >> >>> > > source port, the protocol, and an arbitrary basis value, 42 [4].
> >> >>> > > Based on that information we could name it SOURCE_IP_PORT.
> >> >>> > >
> >> >>> > > Unfortunately, we recently got a bug report in the OVN Octavia
> >> >>> > > provider driver project that load balancing in OVN works
> >> >>> > > differently [5]. The report shows that even when the test uses the
> >> >>> > > same source IP and port, but a new TCP connection, traffic is
> >> >>> > > randomly distributed, although based on [2] it shouldn't be.
> >> >>> > >
> >> >>> > > Is it a bug?  Is something else taken into account while creating
> >> >>> > > the hash? Can it be fixed in OVS/OVN?
> >> >>> > >
> >> >>> > >
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Maciej
> >> >>> > >
> >> >>> > >
> >> >>> > > [1] 
> >> >>> > > https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1017
> >> >>> > > [2] 
> >> >>> > > https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1059
> >> >>> > > [3] 
> >> >>> > > https://github.com/openvswitch/ovs/blob/d58b59c17c70137aebdde37d3c01c26a26b28519/NEWS#L364-L371
> >> >>> > > [4] 
> >> >>> > > https://github.com/openvswitch/ovs/blob/74286173f4d7f51f78e9db09b07a6d4d65263252/lib/flow.c#L2217
> >> >>> > > [5] https://bugs.launchpad.net/neutron/+bug/1871239
> >> >>> > >
> >> >>> > > --
> >> >>> > > Best regards,
> >> >>> > > Maciej Józefczyk
> >> >>> >
> >> >>> > Hi Maciej,
> >> >>> >
> >> >>> > Thanks for reporting. It is definitely strange that the same 5-tuple
> >> >>> > flow resulted in hitting different backends. I haven't observed such
> >> >>> > behavior before (maybe I should try again myself to confirm). Can
> >> >>> > you make sure that the group buckets didn't change during the test?
> >> >>> > You can do so with:
> >> >>> > # ovs-ofctl dump-groups br-int
> >> >>> > and also check the group stats to see whether multiple buckets have
> >> >>> > their counters increased during the test:
> >> >>> > # ovs-ofctl dump-group-stats br-int [group]
> >> >>> >
> >> >>> > As for the 5-tuple hash function you are looking at,
> >> >>> > flow_hash_5tuple(), it uses all fields of the 5-tuple. It adds both
> >> >>> > ports at once:
> >> >>> >     /* Add both ports at once. */
> >> >>> >     hash = hash_add(hash,
> >> >>> >                     ((const uint32_t *)flow)[offsetof(struct flow, tp_src)
> >> >>> >                                              / sizeof(uint32_t)]);
> >> >>> >
> >> >>> > Here tp_src is the start offset, and the value read is 32 bits wide,
> >> >>> > covering both the src and dst ports, each 16 bits. (Although I am not
> >> >>> > sure whether the dp_hash method uses this function or not; I need to
> >> >>> > check more code.)
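> >> >>> >
> >> >>> > A small standalone toy (simplified layout, not the real struct flow)
> >> >>> > showing why that single 32-bit read covers both ports:
> >> >>> >
> >> >>> >     #include <stddef.h>
> >> >>> >     #include <stdint.h>
> >> >>> >     #include <stdio.h>
> >> >>> >
> >> >>> >     /* Simplified stand-in for struct flow: tp_src and tp_dst are
> >> >>> >      * adjacent 16-bit fields. */
> >> >>> >     struct mini_flow {
> >> >>> >         uint32_t nw_src;   /* source IP */
> >> >>> >         uint32_t nw_dst;   /* destination IP */
> >> >>> >         uint16_t tp_src;   /* source port */
> >> >>> >         uint16_t tp_dst;   /* destination port */
> >> >>> >     };
> >> >>> >
> >> >>> >     int main(void)
> >> >>> >     {
> >> >>> >         struct mini_flow flow = { .tp_src = 54321, .tp_dst = 60092 };
> >> >>> >         const uint32_t *words = (const uint32_t *) &flow;
> >> >>> >
> >> >>> >         /* Same indexing trick as above: one aligned 32-bit word
> >> >>> >          * starting at tp_src holds both 16-bit ports, so hashing
> >> >>> >          * that word covers src and dst port together. */
> >> >>> >         uint32_t both_ports = words[offsetof(struct mini_flow, tp_src)
> >> >>> >                                     / sizeof(uint32_t)];
> >> >>> >         printf("word at tp_src: 0x%08x\n", (unsigned) both_ports);
> >> >>> >         return 0;
> >> >>> >     }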
> >> >>> >
> >> >>> > BTW, I am not sure why Neutron gave it the name SOURCE_IP_PORT.
> >> >>> > Should it be called just 5-TUPLE, since the protocol, destination IP,
> >> >>> > and port are also considered in the hash?
> >> >>> >
> >> >>>
> >> >>>
> >> >>> Hi Maciej and Han,
> >> >>>
> >> >>> I did some testing and I can confirm what you're saying. OVN is not
> >> >>> choosing the same backend with the src IP and src port fixed.
> >> >>>
> >> >>> I think there is an issue with how OVN programs the group
> >> >>> flows.  OVN is setting the selection_method to dp_hash.
> >> >>> But when ovs-vswitchd receives the GROUP_MOD OpenFlow message, I
> >> >>> noticed that the selection_method is not set.
> >> >>> From the code I see that selection_method will be encoded only if
> >> >>> ovn-controller uses OpenFlow version 1.5 [1].
> >> >>>
> >> >>> Since selection_method is NULL, vswitchd uses the dp_hash method [2].
> >> >>> dp_hash means it uses the hash calculated by
> >> >>> the datapath. In the case of the kernel datapath, from what I understand
> >> >>> it uses skb_get_hash().
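> >> >>>
> >> >>> Here is a rough standalone sketch (simplified, hypothetical names;
> >> >>> the real logic lives in ofp-group.c [1]) of that version gate, i.e.
> >> >>> why a selection_method set by OVN is dropped unless OpenFlow 1.5 is
> >> >>> used for the GROUP_MOD:
> >> >>>
> >> >>>     #include <stdbool.h>
> >> >>>     #include <stdio.h>
> >> >>>
> >> >>>     /* OpenFlow wire versions (simplified). */
> >> >>>     enum ofp_version { OFP13 = 0x04, OFP14 = 0x05, OFP15 = 0x06 };
> >> >>>
> >> >>>     /* The group property carrying selection_method only exists in
> >> >>>      * OF1.5+, so for older versions it is simply not encoded and
> >> >>>      * vswitchd falls back to its default, dp_hash. */
> >> >>>     static bool
> >> >>>     encode_selection_method(enum ofp_version version,
> >> >>>                             const char *method,
> >> >>>                             char *out, size_t out_size)
> >> >>>     {
> >> >>>         if (version >= OFP15 && method && method[0]) {
> >> >>>             snprintf(out, out_size, "selection_method=%s", method);
> >> >>>             return true;
> >> >>>         }
> >> >>>         out[0] = '\0';
> >> >>>         return false;
> >> >>>     }
> >> >>>
> >> >>>     int main(void)
> >> >>>     {
> >> >>>         char buf[64];
> >> >>>
> >> >>>         /* With a pre-1.5 version (OF1.3 used here as an example),
> >> >>>          * the "dp_hash" set by OVN never reaches vswitchd. */
> >> >>>         printf("OF1.3 encoded=%d '%s'\n",
> >> >>>                encode_selection_method(OFP13, "dp_hash",
> >> >>>                                        buf, sizeof buf), buf);
> >> >>>         printf("OF1.5 encoded=%d '%s'\n",
> >> >>>                encode_selection_method(OFP15, "dp_hash",
> >> >>>                                        buf, sizeof buf), buf);
> >> >>>         return 0;
> >> >>>     }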
> >> >>>
> >> >>> I modified the vswitchd code to use the selection_method "hash" if
> >> >>> selection_method is not set. In this case the load balancer
> >> >>> works as expected. For a fixed src IP, src port, dst IP and dst port,
> >> >>> the group action always selects the same bucket. [3]
> >> >>>
> >> >>> I think we need to fix a few issues in OVN:
> >> >>>   - Use OpenFlow 1.5 so that OVN can set selection_method.
> >> >>>   - Use the "hash" method if dp_hash doesn't choose the same bucket for
> >> >>> the same 5-tuple.
> >> >>>   - Maybe provide an option for the CMS to choose an algorithm, i.e.
> >> >>> to use dp_hash or hash.
> >> >>>
> >> >> I'd rather not expose this to the CMS, as it depends on the datapath
> >> >> implementation as per [0], but maybe it makes sense to eventually
> >> >> abstract it to the CMS in a more LB-ish way (common algorithm names
> >> >> used in load balancing) in case the LB feature is at some point
> >> >> enhanced to support more algorithms.
> >> >>
> >> >> I believe that for OVN LB users, using OF 1.5 to force the use of
> >> >> 'hash' would be the best solution for now.
> >> >>
> >> >> My 2 cents as I'm not an LB expert.
> >> >>
> >> >> I also recall that we tested this in the past and it seemed to be
> >> >> working. I checked further in the doc [0] and found this paragraph:
> >> >>
> >> >> "If no selection method is specified, Open vSwitch up to release 2.9 
> >> >> applies the hash method with default fields. From 2.10 onwards Open 
> >> >> vSwitch defaults to the dp_hash method with symmetric L3/L4 hash 
> >> >> algorithm, unless the weighted group buckets cannot be mapped to a 
> >> >> maximum of 64 dp_hash values with sufficient accuracy. In those rare 
> >> >> cases Open vSwitch 2.10 and later fall back to the hash method with the 
> >> >> default set of hash fields."
> >> >>
> >> >> The explanation seems to be that when we tested the feature we relied 
> >> >> on OVS 2.9 and hence the confusion.
> >> >>
> >> >> Thanks a lot again!
> >> >> Daniel
> >> >>
> >> >> [0] http://www.openvswitch.org/support/dist-docs/ovs-ofctl.8.html
> >> >>
> >> >>>
> >> >>> I'll look into it on how to support this.
> >> >>>
> >> >>> [1] - 
> >> >>> https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2120
> >> >>>        
> >> >>> https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2082
> >> >>>
> >> >>> [2] - 
> >> >>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif.c#L5108
> >> >>> [3] - 
> >> >>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4553
> >> >>>
> >> >>>
> >> >>> Thanks
> >> >>> Numan
> >> >>>
> >> >>>
> >> >>> > Thanks,
> >> >>> > Han
> >>
> >
> >
> >
> > --
> > Best regards,
> > Maciej Józefczyk



-- 
Best regards, Tonghao
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
