Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-12 Thread Vishal Deep Ajmera via dev
> > I am inclined to use option 2, i.e. to have a per-PMD cmap of bonds. However,
> > I see that even in this option we will need one cmap (or hmap) of bonds at
> > the global datapath level. This will take care of the reconfigure_pmd_threads
> > scenario: new PMDs need to be configured with the bond cmap from the global
> > map.
>
> OK. Makes sense for newly created threads.
>
Hi Ilya,

I have posted a new patch addressing your comments. Let me know if it looks OK
now.

Warm Regards,
Vishal Ajmera


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-06 Thread Ilya Maximets
On 2/6/20 7:39 AM, Vishal Deep Ajmera wrote:
>>>
>>> So, the root cause of all the issues in this patch, in my understanding,
>>> is the fact that you need to collect statistics for all the bond hashes
>>> in order to be able to rebalance traffic.  This forces you to have access
>>> to PMD local caches.
>>>
>>> The basic idea for overcoming this issue is to not have a PMD-local bond
>>> cache, but to have an RCU-protected data structure instead.
>>>
>>> The memory layout scheme in the current patch consists of 3 layers:
>>>   1. Global hash map for all bonds. One for the whole datapath.
>>>      Protected by dp->bond_mutex.
>>>   2. Hash map of all the bonds. One for each PMD thread.
>>>      Protected by pmd->bond_mutex.
>>>   3. Hash map of all the bonds. Local copy that could be used
>>>      lockless, but only by the owning PMD thread.
>>>
>>> Suggested layout #1:
>>>   Single global concurrent hash map (cmap) for all bonds. One for the whole
>>>   datapath.  Bond addition/deletion protected by the dp->bond_mutex.
>>>   Reads are lockless since protected by RCU.  Statistics updates must be
>>>   fully atomic (i.e. atomic_add_relaxed()).
>>>
>>> Suggested layout #2:
>>>   One cmap for each PMD thread (no global one).  Bond addition/deletion
>>>   protected by the pmd->bond_mutex.  Reads are lockless since protected
>>>   by RCU.  Statistics updates should be atomic in terms of reads and writes
>>>   (the non_atomic_ullong_add() function could be used).
>>>   (This is similar to how we store per-PMD flow tables.)
>>>
>>> #1 will consume a lot less memory, but could scale worse in case of too many
>>> threads trying to send traffic to the same bond port.  #2 might be a bit
>>> faster and more scalable in terms of performance, but less efficient in
>>> memory consumption and might be slower in terms of response to slave updates
>>> since we have to update hash maps on all the threads.
>>> Neither solution requires any reconfiguration of running PMD threads.
>>>
>>> Note that to update a cmap entry, you will likely need to prepare the new
>>> cmap node and use cmap_replace().  Statistics copying in this case might be
>>> a bit tricky because you may lose some of the additions made while PMD
>>> threads are still using the old entry.  To avoid that, statistics copying
>>> should be RCU-postponed.  However, I'm not sure if we need highly accurate
>>> stats there.
>>>
> 
> Hi Ilya,
> 
> I am inclined to use option 2, i.e. to have a per-PMD cmap of bonds. However,
> I see that even in this option we will need one cmap (or hmap) of bonds at
> the global datapath level. This will take care of the reconfigure_pmd_threads
> scenario: new PMDs need to be configured with the bond cmap from the global
> map.

OK. Makes sense for newly created threads.
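
For illustration, a rough sketch of that reconfiguration path is below, assuming both the global and per-PMD bond maps become cmaps as suggested.  The structure and helper names (struct tx_bond with a cmap_node 'node' and a 'bond_id', dp->tx_bonds, pmd->tx_bonds, hash_bond_id()) are placeholders for this sketch, not the code from the patch:

/* Sketch only: seed a newly created PMD thread's bond cmap from the
 * global datapath bond map, so that reconfigure_pmd_threads does not
 * leave the new thread without bond entries. */
static void
pmd_seed_bonds_from_dp(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
    OVS_REQUIRES(dp->bond_mutex)
{
    struct tx_bond *dp_bond;

    CMAP_FOR_EACH (dp_bond, node, &dp->tx_bonds) {
        /* Give the PMD its own copy so that the fast path never needs to
         * touch the global map. */
        struct tx_bond *pmd_bond = xmemdup(dp_bond, sizeof *dp_bond);

        cmap_insert(&pmd->tx_bonds, &pmd_bond->node,
                    hash_bond_id(pmd_bond->bond_id));
    }
}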




Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-02-06 Thread Vishal Deep Ajmera via dev
> >
> > So, the root cause of all the issues in this patch, in my understanding,
> > is the fact that you need to collect statistics for all the bond hashes
> > in order to be able to rebalance traffic.  This forces you to have access
> > to PMD local caches.
> >
> > The basic idea for overcoming this issue is to not have a PMD-local bond
> > cache, but to have an RCU-protected data structure instead.
> >
> > The memory layout scheme in the current patch consists of 3 layers:
> >   1. Global hash map for all bonds. One for the whole datapath.
> >      Protected by dp->bond_mutex.
> >   2. Hash map of all the bonds. One for each PMD thread.
> >      Protected by pmd->bond_mutex.
> >   3. Hash map of all the bonds. Local copy that could be used
> >      lockless, but only by the owning PMD thread.
> >
> > Suggested layout #1:
> >   Single global concurrent hash map (cmap) for all bonds. One for the whole
> >   datapath.  Bond addition/deletion protected by the dp->bond_mutex.
> >   Reads are lockless since protected by RCU.  Statistics updates must be
> >   fully atomic (i.e. atomic_add_relaxed()).
> >
> > Suggested layout #2:
> >   One cmap for each PMD thread (no global one).  Bond addition/deletion
> >   protected by the pmd->bond_mutex.  Reads are lockless since protected
> >   by RCU.  Statistics updates should be atomic in terms of reads and writes
> >   (the non_atomic_ullong_add() function could be used).
> >   (This is similar to how we store per-PMD flow tables.)
> >
> > #1 will consume a lot less memory, but could scale worse in case of too many
> > threads trying to send traffic to the same bond port.  #2 might be a bit
> > faster and more scalable in terms of performance, but less efficient in
> > memory consumption and might be slower in terms of response to slave updates
> > since we have to update hash maps on all the threads.
> > Neither solution requires any reconfiguration of running PMD threads.
> >
> > Note that to update a cmap entry, you will likely need to prepare the new
> > cmap node and use cmap_replace().  Statistics copying in this case might be
> > a bit tricky because you may lose some of the additions made while PMD
> > threads are still using the old entry.  To avoid that, statistics copying
> > should be RCU-postponed.  However, I'm not sure if we need highly accurate
> > stats there.
> >

Hi Ilya,

I am inclined to use option 2, i.e. to have a per-PMD cmap of bonds. However,
I see that even in this option we will need one cmap (or hmap) of bonds at
the global datapath level. This will take care of the reconfigure_pmd_threads
scenario: new PMDs need to be configured with the bond cmap from the global map.

Warm Regards,
Vishal Ajmera
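
For reference, a minimal sketch of what suggested layout #1 could look like is below: a single global cmap shared by all PMD threads, with per-bucket statistics updated fully atomically.  All structure and field names here are hypothetical and only illustrate the idea:

#define BOND_BUCKETS 256

/* Sketch only (layout #1): one globally shared entry per bond. */
struct dp_bond_entry {
    struct cmap_node node;        /* In dp->bonds: writes under dp->bond_mutex,
                                   * lockless reads under RCU. */
    uint32_t bond_id;
    uint16_t slave_map[BOND_BUCKETS];       /* Hash bucket -> member port. */
    atomic_ullong n_bytes[BOND_BUCKETS];    /* Per-bucket tx statistics. */
};

/* Fast path, callable from any PMD thread. */
static inline uint16_t
bond_select_member(const struct cmap *bonds, uint32_t bond_id,
                   uint32_t pkt_hash, uint32_t pkt_size)
{
    struct dp_bond_entry *b;

    CMAP_FOR_EACH_WITH_HASH (b, node, hash_int(bond_id, 0), bonds) {
        if (b->bond_id == bond_id) {
            uint32_t bucket = pkt_hash & (BOND_BUCKETS - 1);
            unsigned long long orig;

            /* Several PMDs may hit the same counter concurrently, so the
             * update must be a full atomic read-modify-write. */
            atomic_add_relaxed(&b->n_bytes[bucket], pkt_size, &orig);
            return b->slave_map[bucket];
        }
    }
    return UINT16_MAX;    /* No such bond. */
}

Bond addition and deletion would still be serialized by dp->bond_mutex, as described in the quoted mail.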



Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-30 Thread Vishal Deep Ajmera via dev
>
> So, the root cause of all the issues in this patch, in my understanding,
> is the fact that you need to collect statistics for all the bond hashes
> in order to be able to rebalance traffic.  This forces you to have access
> to PMD local caches.
>
> The basic idea for overcoming this issue is to not have a PMD-local bond
> cache, but to have an RCU-protected data structure instead.
>
> The memory layout scheme in the current patch consists of 3 layers:
>   1. Global hash map for all bonds. One for the whole datapath.
>      Protected by dp->bond_mutex.
>   2. Hash map of all the bonds. One for each PMD thread.
>      Protected by pmd->bond_mutex.
>   3. Hash map of all the bonds. Local copy that could be used
>      lockless, but only by the owning PMD thread.
>
> Suggested layout #1:
>   Single global concurrent hash map (cmap) for all bonds. One for the whole
>   datapath.  Bond addition/deletion protected by the dp->bond_mutex.
>   Reads are lockless since protected by RCU.  Statistics updates must be
>   fully atomic (i.e. atomic_add_relaxed()).
>
> Suggested layout #2:
>   One cmap for each PMD thread (no global one).  Bond addition/deletion
>   protected by the pmd->bond_mutex.  Reads are lockless since protected
>   by RCU.  Statistics updates should be atomic in terms of reads and writes
>   (the non_atomic_ullong_add() function could be used).
>   (This is similar to how we store per-PMD flow tables.)
>
> #1 will consume a lot less memory, but could scale worse in case of too many
> threads trying to send traffic to the same bond port.  #2 might be a bit
> faster and more scalable in terms of performance, but less efficient in
> memory consumption and might be slower in terms of response to slave updates
> since we have to update hash maps on all the threads.
> Neither solution requires any reconfiguration of running PMD threads.
>
> Note that to update a cmap entry, you will likely need to prepare the new
> cmap node and use cmap_replace().  Statistics copying in this case might be
> a bit tricky because you may lose some of the additions made while PMD
> threads are still using the old entry.  To avoid that, statistics copying
> should be RCU-postponed.  However, I'm not sure if we need highly accurate
> stats there.
>
> Any thoughts and suggestions (including alternative solutions) are welcome for discussion.

Thanks Ilya. I will have a look at it and implement it in the next patch-set.

Warm Regards,
Vishal Ajmera
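
For illustration, a minimal sketch of suggested layout #2 is below: one cmap per PMD thread, statistics touched only by the owning PMD (so non_atomic_ullong_add() is enough), and entries swapped with cmap_replace() plus an RCU-postponed free.  All names are hypothetical; this is only a sketch of the idea, not the eventual patch code:

#define BOND_BUCKETS 256

/* Sketch only (layout #2): per-PMD copy of a bond entry. */
struct pmd_bond_entry {
    struct cmap_node node;                 /* In pmd->tx_bonds. */
    uint32_t bond_id;
    uint16_t slave_map[BOND_BUCKETS];
    atomic_ullong n_bytes[BOND_BUCKETS];   /* Written only by the owning PMD. */
};

/* Fast path: only the owning PMD runs this, so a plain load-add-store
 * (atomic reads and writes, non-atomic add) is sufficient. */
static inline uint16_t
pmd_bond_select_member(struct pmd_bond_entry *b, uint32_t pkt_hash,
                       uint32_t pkt_size)
{
    uint32_t bucket = pkt_hash & (BOND_BUCKETS - 1);

    non_atomic_ullong_add(&b->n_bytes[bucket], pkt_size);
    return b->slave_map[bucket];
}

/* Control path: swap in an updated entry without stopping the PMD.  The old
 * entry may still be referenced until the next grace period, so it is freed
 * via ovsrcu_postpone() rather than immediately. */
static void
pmd_bond_update(struct cmap *tx_bonds, struct pmd_bond_entry *new_entry)
    /* Caller holds pmd->bond_mutex. */
{
    uint32_t hash = hash_int(new_entry->bond_id, 0);
    struct pmd_bond_entry *old = NULL, *b;

    CMAP_FOR_EACH_WITH_HASH (b, node, hash, tx_bonds) {
        if (b->bond_id == new_entry->bond_id) {
            old = b;
            break;
        }
    }
    if (old) {
        cmap_replace(tx_bonds, &old->node, &new_entry->node, hash);
        ovsrcu_postpone(free, old);
    } else {
        cmap_insert(tx_bonds, &new_entry->node, hash);
    }
}

Copying the statistics from the old entry into the new one would still be needed for rebalancing; as noted in the quoted mail, doing that copy RCU-postponed avoids losing updates made while the old entry is still in use.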



Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-28 Thread Ilya Maximets
On 20.01.2020 17:09, Ilya Maximets wrote:
> On 08.01.2020 06:18, Vishal Deep Ajmera wrote:
>> Problem:
>> 
>> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
>> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
>> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
>> forwarded to the bond member port based on 8-bits of the datapath hash
>> value computed through dp_hash. This causes performance degradation in the
>> following ways:
>>
>> 1. The recirculation of the packet implies another lookup of the packet’s
>> flow key in the exact match cache (EMC) and potentially Megaflow classifier
>> (DPCLS). This is the biggest cost factor.
>>
>> 2. The recirculated packets have a new “RSS” hash and compete with the
>> original packets for the scarce number of EMC slots. This implies more
>> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
>>
>> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
>> additional load on the revalidation threads.
>>
>> Owing to this performance degradation, deployments stick to “balance-slb”
>> bond mode even though it does not do active-active load balancing for
>> VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
>> source MAC address.
>>
>> Proposed optimization:
>> --
>> This proposal introduces a new load-balancing output action instead of
>> recirculation.
>>
>> Maintain one table per-bond (could just be an array of uint16's) and
>> program it the same way internal flows are created today for each possible
>> hash value (256 entries) from the ofproto layer. Use this table to load-balance
>> flows as part of output action processing.
>>
>> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
>> -> bond_may_recirc() and compose_output_action__() generate
>> “dp_hash(hash_l4(0))” and “recirc()” actions. In this case the
>> RecircID identifies the bond. For the recirculated packets the ofproto layer
>> installs megaflow entries that match on RecircID and masked dp_hash and sends
>> them to the corresponding output port.
>>
>> Instead, we will now generate action as
>> "lb_output(bond,)"
>>
>> This combines hash computation (only if needed, else re-use RSS hash) and
>> inline load-balancing over the bond. This action is used *only* for balance-tcp
>> bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
>>
>> Example:
>> 
>> Current scheme:
>> ---
>> With 1 IP-UDP flow:
>>
>> flow-dump from pmd on cpu core: 2
>> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>>  packets:2828969, bytes:181054016, used:0.000s, 
>> actions:hash(hash_l4(0)),recirc(0x1)
>>
>> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:2828937, bytes:181051968, used:0.000s, actions:2
>>
>> With 8 IP-UDP flows (with random UDP src port), all hitting the same DPCLS:
>>
>> flow-dump from pmd on cpu core: 2
>> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>>  packets:2674009, bytes:171136576, used:0.000s, 
>> actions:hash(hash_l4(0)),recirc(0x1)
>>
>> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:377395, bytes:24153280, used:0.000s, actions:2
>> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:333486, bytes:21343104, used:0.000s, actions:1
>> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:348461, bytes:22301504, used:0.000s, actions:1
>> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:633353, bytes:40534592, used:0.000s, actions:2
>> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:319901, bytes:20473664, used:0.001s, actions:2
>> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:334985, bytes:21439040, used:0.001s, actions:1
>> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>>  packets:326404, bytes:20889856, used:0.001s, actions:1
>>
>> New scheme:
>> ---
>> We can make do with a single flow entry (for any number of new flows):
>>
>> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>>  packets:2674009, bytes:171136576, used:0.000s, actions:lb_output(bond,1)
>>
>> A new CLI command has been added to dump the datapath bond cache, as shown below.
>>
>> “sudo ovs-appctl dpif-netdev/dp-bond-show [dp]”
>>
>> 

Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-22 Thread Ilya Maximets
On 21.01.2020 21:54, Ben Pfaff wrote:
> On Mon, Jan 20, 2020 at 05:09:05PM +0100, Ilya Maximets wrote:
>>> +static struct tx_bond *
>>> +tx_bond_lookup(const struct hmap *hmap, uint32_t bond_id)
>>> +{
>>> +struct tx_bond *tx;
>>> +
>>> +HMAP_FOR_EACH_IN_BUCKET (tx, node, hash_bond_id(bond_id), hmap) {
>>
>> Why not HMAP_FOR_EACH_WITH_HASH ?
> 
> HMAP_FOR_EACH_IN_BUCKET is a small optimization over
> HMAP_FOR_EACH_WITH_HASH in cases where the comparison of the key is
> cheap.  Comparing the hash is an optimization if the key is expensive to
> compare (like an arbitrary string) but it just costs an extra integer
> comparison if the key is a small integer as here.

OK.  Thanks for the explanation.
 
>> BTW, it's better to calculate hash before the loop to be sure that
>> we're not recalculating it on each iteration.
> 
> A lot of code assumes that the hash only gets evaluated once.
> 

Sure.  And there is no real issue here, but this just caught my eye
for some reason.


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-21 Thread Ben Pfaff
On Mon, Jan 20, 2020 at 05:09:05PM +0100, Ilya Maximets wrote:
> > +static struct tx_bond *
> > +tx_bond_lookup(const struct hmap *hmap, uint32_t bond_id)
> > +{
> > +struct tx_bond *tx;
> > +
> > +HMAP_FOR_EACH_IN_BUCKET (tx, node, hash_bond_id(bond_id), hmap) {
> 
> Why not HMAP_FOR_EACH_WITH_HASH ?

HMAP_FOR_EACH_IN_BUCKET is a small optimization over
HMAP_FOR_EACH_WITH_HASH in cases where the comparison of the key is
cheap.  Comparing the hash is an optimization if the key is expensive to
compare (like an arbitrary string) but it just costs an extra integer
comparison if the key is a small integer as here.

> BTW, it's better to calculate hash before the loop to be sure that
> we're not recalculating it on each iteration.

A lot of code assumes that the hash only gets evaluated once.
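
To make the trade-off concrete, a small sketch of the two variants for this integer-keyed lookup is below.  tx_bond_lookup() follows the shape of the code quoted above; the second function is illustrative only:

struct tx_bond {
    struct hmap_node node;
    uint32_t bond_id;
};

static uint32_t
hash_bond_id(uint32_t bond_id)
{
    return hash_int(bond_id, 0);
}

/* Variant used in the patch: walk the whole bucket and compare the key
 * directly.  Cheap here, because comparing a small integer costs no more
 * than comparing the stored hash would. */
static struct tx_bond *
tx_bond_lookup(const struct hmap *hmap, uint32_t bond_id)
{
    struct tx_bond *tx;

    HMAP_FOR_EACH_IN_BUCKET (tx, node, hash_bond_id(bond_id), hmap) {
        if (tx->bond_id == bond_id) {
            return tx;
        }
    }
    return NULL;
}

/* Alternative: HMAP_FOR_EACH_WITH_HASH additionally skips nodes whose stored
 * hash differs.  That pays off when key comparison is expensive (e.g. an
 * arbitrary string), but only adds an extra integer comparison here. */
static struct tx_bond *
tx_bond_lookup_with_hash(const struct hmap *hmap, uint32_t bond_id)
{
    struct tx_bond *tx;

    HMAP_FOR_EACH_WITH_HASH (tx, node, hash_bond_id(bond_id), hmap) {
        if (tx->bond_id == bond_id) {
            return tx;
        }
    }
    return NULL;
}

In both macros the hash expression is evaluated once, as part of the loop initialization.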


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-20 Thread Ilya Maximets
On 08.01.2020 06:18, Vishal Deep Ajmera wrote:
> Problem:
> 
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
> 
> 1. The recirculation of the packet implies another lookup of the packet’s
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
> 
> 2. The recirculated packets have a new “RSS” hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
> 
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
> 
> Owing to this performance degradation, deployments stick to “balance-slb”
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
> source MAC address.
> 
> Proposed optimization:
> --
> This proposal introduces a new load-balancing output action instead of
> recirculation.
> 
> Maintain one table per-bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value (256 entries) from the ofproto layer. Use this table to load-balance
> flows as part of output action processing.
> 
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> “dp_hash(hash_l4(0))” and “recirc()” actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and sends
> them to the corresponding output port.
> 
> Instead, we will now generate action as
> "lb_output(bond,)"
> 
> This combines hash computation (only if needed, else re-use RSS hash) and
> inline load-balancing over the bond. This action is used *only* for balance-tcp
> bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
> 
> Example:
> 
> Current scheme:
> ---
> With 1 IP-UDP flow:
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2828969, bytes:181054016, used:0.000s, 
> actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:2828937, bytes:181051968, used:0.000s, actions:2
> 
> With 8 IP-UDP flows (with random UDP src port), all hitting the same DPCLS:
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2674009, bytes:171136576, used:0.000s, 
> actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:326404, bytes:20889856, used:0.001s, actions:1
> 
> New scheme:
> ---
> We can make do with a single flow entry (for any number of new flows):
> 
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2674009, bytes:171136576, used:0.000s, actions:lb_output(bond,1)
> 
> A new CLI command has been added to dump the datapath bond cache, as shown below.
> 
> “sudo ovs-appctl dpif-netdev/dp-bond-show [dp]”
> 
> root@ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/dp-bond-show
> Bond cache:
> bond-id 1 :
> bucket 0 - slave 2
>   

Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-20 Thread Vishal Deep Ajmera via dev
> >
> > Thanks for the patch!
> >
> > I have a few minor stylistic suggestions, see below.
> >
> > I'd like to hear Ilya's opinion on this.
>
> Thanks Ben for the review!  I'll take another look at this patch in the next
> couple of days.
>
> >

Hi Ilya, Ben,

Let me know if you have any comments. Can we get this patch reviewed/merged 
before the branching is done?

Warm Regards,
Vishal Ajmera



Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-20 Thread Ilya Maximets
On 20.01.2020 10:00, Vishal Deep Ajmera wrote:
>>>
>>> Thanks for the patch!
>>>
>>> I have a few minor stylistic suggestions, see below.
>>>
>>> I'd like to hear Ilya's opinion on this.
>>
>> Thanks Ben for the review!  I'll take another look at this patch in the next
>> couple of days.
>>
>>>
> 
> Hi Ilya, Ben,
> 
> Let me know if you have any comments. Can we get this patch reviewed/merged 
> before the branching is done?

Sorry it's taking so long.  Release time is very busy.
I'll try to look at this patch today.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-09 Thread Ilya Maximets
On 08.01.2020 21:33, Ben Pfaff wrote:
> On Wed, Jan 08, 2020 at 10:48:04AM +0530, Vishal Deep Ajmera wrote:
>> Problem:
>> 
>> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
>> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
>> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
>> forwarded to the bond member port based on 8-bits of the datapath hash
>> value computed through dp_hash. This causes performance degradation in the
>> following ways:
> 
> Thanks for the patch!
> 
> I have a few minor stylistic suggestions, see below.
> 
> I'd like to hear Ilya's opinion on this.

Thanks Ben for the review!  I'll take another look at this patch in the next
couple of days.

> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 78f9cf928de2..c3a54e96346e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -378,11 +378,12 @@ struct dp_netdev {
>  
>  struct conntrack *conntrack;
>  struct pmd_auto_lb pmd_alb;
> +
>  /* Bonds.
>   *
>   * Any lookup into 'bonds' requires taking 'bond_mutex'. */
>  struct ovs_mutex bond_mutex;
> -struct hmap bonds;
> +struct hmap bonds;  /* Contains "struct tx_bond"s. */
>  };
>  
>  static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> @@ -5581,10 +5582,7 @@ pmd_free_cached_bonds(struct dp_netdev_pmd_thread *pmd)
>  
>  /* Remove bonds from pmd which no longer exists. */
> HMAP_FOR_EACH_SAFE (bond, next, node, &pmd->bond_cache) {
> -struct tx_bond *tx = NULL;
> -
> -tx = tx_bond_lookup(&pmd->tx_bonds, bond->bond_id);
> -if (!tx) {
> +if (!tx_bond_lookup(&pmd->tx_bonds, bond->bond_id)) {
>  /* Bond no longer exist. Delete it from pmd. */
> hmap_remove(&pmd->bond_cache, &bond->node);
>  free(bond);
> @@ -5601,10 +5599,11 @@ pmd_load_cached_bonds(struct dp_netdev_pmd_thread *pmd)
>  pmd_free_cached_bonds(pmd);
> hmap_shrink(&pmd->bond_cache);
>  
> -struct tx_bond *tx_bond, *tx_bond_cached;
> +struct tx_bond *tx_bond;
> HMAP_FOR_EACH (tx_bond, node, &pmd->tx_bonds) {
>  /* Check if bond already exist on pmd. */
> -tx_bond_cached = tx_bond_lookup(&pmd->bond_cache, tx_bond->bond_id);
> +struct tx_bond *tx_bond_cached
> += tx_bond_lookup(&pmd->bond_cache, tx_bond->bond_id);
>  
>  if (!tx_bond_cached) {
>  /* Create new bond entry in cache. */
> 


Re: [ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-08 Thread Ben Pfaff
On Wed, Jan 08, 2020 at 10:48:04AM +0530, Vishal Deep Ajmera wrote:
> Problem:
> 
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:

Thanks for the patch!

I have a few minor stylistic suggestions, see below.

I'd like to hear Ilya's opinion on this.

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 78f9cf928de2..c3a54e96346e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -378,11 +378,12 @@ struct dp_netdev {
 
 struct conntrack *conntrack;
 struct pmd_auto_lb pmd_alb;
+
 /* Bonds.
  *
  * Any lookup into 'bonds' requires taking 'bond_mutex'. */
 struct ovs_mutex bond_mutex;
-struct hmap bonds;
+struct hmap bonds;  /* Contains "struct tx_bond"s. */
 };
 
 static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
@@ -5581,10 +5582,7 @@ pmd_free_cached_bonds(struct dp_netdev_pmd_thread *pmd)
 
 /* Remove bonds from pmd which no longer exists. */
HMAP_FOR_EACH_SAFE (bond, next, node, &pmd->bond_cache) {
-struct tx_bond *tx = NULL;
-
-tx = tx_bond_lookup(&pmd->tx_bonds, bond->bond_id);
-if (!tx) {
+if (!tx_bond_lookup(&pmd->tx_bonds, bond->bond_id)) {
 /* Bond no longer exist. Delete it from pmd. */
hmap_remove(&pmd->bond_cache, &bond->node);
 free(bond);
@@ -5601,10 +5599,11 @@ pmd_load_cached_bonds(struct dp_netdev_pmd_thread *pmd)
 pmd_free_cached_bonds(pmd);
hmap_shrink(&pmd->bond_cache);
 
-struct tx_bond *tx_bond, *tx_bond_cached;
+struct tx_bond *tx_bond;
HMAP_FOR_EACH (tx_bond, node, &pmd->tx_bonds) {
 /* Check if bond already exist on pmd. */
-tx_bond_cached = tx_bond_lookup(&pmd->bond_cache, tx_bond->bond_id);
+struct tx_bond *tx_bond_cached
+= tx_bond_lookup(&pmd->bond_cache, tx_bond->bond_id);
 
 if (!tx_bond_cached) {
 /* Create new bond entry in cache. */


[ovs-dev] [PATCH v10 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

2020-01-07 Thread Vishal Deep Ajmera via dev
Problem:

In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
(using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
"HASH" and "RECIRC" datapath actions. After recirculation, the packet is
forwarded to the bond member port based on 8-bits of the datapath hash
value computed through dp_hash. This causes performance degradation in the
following ways:

1. The recirculation of the packet implies another lookup of the packet’s
flow key in the exact match cache (EMC) and potentially Megaflow classifier
(DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection put
additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
source MAC address.

Proposed optimization:
--
This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each possible
hash value (256 entries) from the ofproto layer. Use this table to load-balance
flows as part of output action processing.

Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
-> bond_may_recirc() and compose_output_action__() generate
“dp_hash(hash_l4(0))” and “recirc()” actions. In this case the
RecircID identifies the bond. For the recirculated packets the ofproto layer
installs megaflow entries that match on RecircID and masked dp_hash and sends
them to the corresponding output port.

Instead, we will now generate action as
"lb_output(bond,)"

This combines hash computation (only if needed, else re-use RSS hash) and
inline load-balancing over the bond. This action is used *only* for balance-tcp
bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).

Example:

Current scheme:
---
With 1 IP-UDP flow:

flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
 packets:2828969, bytes:181054016, used:0.000s, 
actions:hash(hash_l4(0)),recirc(0x1)

recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:2828937, bytes:181051968, used:0.000s, actions:2

With 8 IP-UDP flows (with random UDP src port), all hitting the same DPCLS:

flow-dump from pmd on cpu core: 2
recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
 packets:2674009, bytes:171136576, used:0.000s, 
actions:hash(hash_l4(0)),recirc(0x1)

recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:377395, bytes:24153280, used:0.000s, actions:2
recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:333486, bytes:21343104, used:0.000s, actions:1
recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:348461, bytes:22301504, used:0.000s, actions:1
recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:633353, bytes:40534592, used:0.000s, actions:2
recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:319901, bytes:20473664, used:0.001s, actions:2
recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:334985, bytes:21439040, used:0.001s, actions:1
recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:326404, bytes:20889856, used:0.001s, actions:1

New scheme:
---
We can make do with a single flow entry (for any number of new flows):

in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
 packets:2674009, bytes:171136576, used:0.000s, actions:lb_output(bond,1)

A new CLI command has been added to dump the datapath bond cache, as shown below.

“sudo ovs-appctl dpif-netdev/dp-bond-show [dp]”

root@ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/dp-bond-show
Bond cache:
bond-id 1 :
bucket 0 - slave 2
bucket 1 - slave 1
bucket 2 - slave 2
bucket 3 - slave 1

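To illustrate the data structure behind lb_output, a minimal sketch of the per-bond bucket table and the member selection step is shown below.  The names are hypothetical and the actual patch differs in details (for example, how member ports are represented); this only shows the idea of indexing a 256-entry table with the packet hash instead of recirculating:

#define BOND_BUCKETS 256            /* One entry per possible 8-bit hash. */

/* Sketch only: per-bond load-balancing table programmed from the ofproto
 * layer, one slot per hash bucket. */
struct tx_bond_table {
    uint32_t bond_id;
    uint16_t bucket_to_member[BOND_BUCKETS];  /* Hash bucket -> member port. */
};

/* Inline selection during output action processing: re-use the packet's RSS
 * hash when it is valid, otherwise fall back to an L4 hash computed once,
 * then index the table with the low 8 bits.  No recirculation, no extra
 * EMC/DPCLS lookup. */
static inline uint16_t
lb_output_select_member(const struct tx_bond_table *bond, uint32_t rss_hash,
                        bool rss_hash_valid, uint32_t l4_hash_fallback)
{
    uint32_t hash = rss_hash_valid ? rss_hash : l4_hash_fallback;

    return bond->bucket_to_member[hash & (BOND_BUCKETS - 1)];
}
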
Performance improvement:

With a prototype of the proposed idea, the following perf improvement is seen
with Phy-VM-Phy UDP