Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-31 Thread Mohamad Haj Yahia
>> Is it multi host configuration or bare metal ?
>
>
> multihost
>
>> Do you have internal loopback traffic between different hosts ?
>
>
> in a multihost? how can I check that?
> Is there an ethtool command?
>

You can check that by sending traffic from one PF to another on the same port.

>>> broken firmware or expected behavior?
>>
>>
>> which driver did you test ? backported or net-next ?
>
>
> both backported and net-next with Tom's patches.
>
>> if it is backported driver please verify that on driver load the
>> following occurs :
>>
>> 1. VPORTS change events are globally enabled:
>> in mlx5_start_eqs@eq.c:
>> async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
>
>
> this one is done.
>
>> 2. UC address change events are enabled for vport 0 (PF):
>> In eswitch_attach or on eswitch_init (depends on the kernel version)
>> @eswitch.c
>> esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.
>
>
> this one is not. Tom's proposal to compile out eswitch.c
> removes invocation of mlx5_eswitch_attach() and
> corresponding esw_enable_vport() call as well.
> The question is why is it necessary?
> What will break if it's not done?
> so far we don't see any adverse effects in both multihost
> and baremetal setups.
>
Please pay attention that if you compile out the eswitch.c we will not
program the l2 table.
However, the FW populate the L2 table with default entries after
init_hca command (at the driver load stage).
For each PF, By default its permanent mac pointing to it in the l2 table.
This can explain why its working for you without setting the l2 entries.

Now, if you try to set another mac for that PF or try to reach mac
addresses that are not the permantent address (virtual mac addresses
over the PF) it will fail.
So for this reason its crucial that the PF program the l2 table (which
needs the UC_ADDR_CHANGE set in the PF vport context).

>> BTW folks, i am going to be on vacation for the rest of the week, so
>> please expect slow responses.
>
>
> have a great time off. I hope other mlx folks can answer.
>


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-30 Thread Alexei Starovoitov

On 1/30/17 1:18 PM, Saeed Mahameed wrote:

On Mon, Jan 30, 2017 at 6:45 PM, Alexei Starovoitov  wrote:

On 1/29/17 1:11 AM, Saeed Mahameed wrote:



ConnectX4/5 and hopefully so on .. provide three different isolated
steering layers:

3. vport layer: avaialbe for any PF/VF vport nic driver instance
(netdevice), it allows vlan/mac filtering
   ,RSS hashing and n-tuple steering (for both encapsulated and
nonencapsulated traffic) and RFS steering. ( the code above only
writes flow entries of a PF/VF to its own vport flow tables, there is
another mechanism to propagate l2 steering rules down to eswitch from
the vport layer.

2. eswitch layer: Available for PFs only with
HCA_CAP.vport_group_manager capability set.
it allows steering between PF and different VFs on the same host (vlan
mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
steering and offloads for switchdev mode - eswitch_offloads.c - )
if this table is not create the default is pass-throu traffic to PF

1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
capability set.
needed for MH configurations and only PF is allowed and should write
"request UC MAC - set_l2_table_entry" on behalf of the PF itself and
it's own VFs.

- On a bare metal machine only layer 3 is required (all traffic is
passed to the PF vport).
- On a MH configuration layer 3 and 1 are required.
- On a SRIOV configuration layer 3 and 2 are required.
- On MH with SRIOV all layers are required.

in the driver, eswitch and L2 layers are handled by PF@eswitch.c.

So for your question:

PF always init_eswitch ( no eswitch -sriov- tables are created), and
the eswitch will start listening for vport_change_events.

A PF/VF or netdev vport instance on any steering changes updates
should call  mlx5e_vport_context_update[1]

vport_context_update is A FW command that will store the current
UC/MC/VLAN list and promiscuity info of a vport.

The FW will generate an event to the PF driver eswitch manager (vport
manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
set_l2_table_entry for each UC mac on each vport change event of any
vport (including its own vport), in case of SRIOV is enabled it will
update eswitch tables as well.

To simplify my answer the function calls are:
Vport VF/PF netdevice:
mlx5e_set_rx_mode_work
  mlx5e_vport_context_update
 mlx5e_vport_context_update_addr_list  --> FW event will be
generated to the PF esiwtch manager

PF eswitch manager(eswitch.c) on a vport change FW event:
mlx5_eswitch_vport_event
esw_vport_change_handler
 esw_vport_change_handle_locked
 esw_apply_vport_addr_list
esw_add_uc_addr
   set_l2_table_entry --> this will
update the l2 table in case MH is enabled.



all makes sense. To test this logic I added printk-s
to above functions, but I only see:
# ip link set eth0 addr 24:8a:07:47:2b:6e
[  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
[  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0

MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().


Strange, just double checked and i got those events on latest net-next
bare-metal box.


Yet nic seems to work fine. Packets come and go.



Is it multi host configuration or bare metal ?


multihost


Do you have internal loopback traffic between different hosts ?


in a multihost? how can I check that?
Is there an ethtool command?


broken firmware or expected behavior?


which driver did you test ? backported or net-next ?


both backported and net-next with Tom's patches.


if it is backported driver please verify that on driver load the
following occurs :

1. VPORTS change events are globally enabled:
in mlx5_start_eqs@eq.c:
async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);


this one is done.


2. UC address change events are enabled for vport 0 (PF):
In eswitch_attach or on eswitch_init (depends on the kernel version) @eswitch.c
esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.


this one is not. Tom's proposal to compile out eswitch.c
removes invocation of mlx5_eswitch_attach() and
corresponding esw_enable_vport() call as well.
The question is why is it necessary?
What will break if it's not done?
so far we don't see any adverse effects in both multihost
and baremetal setups.


BTW folks, i am going to be on vacation for the rest of the week, so
please expect slow responses.


have a great time off. I hope other mlx folks can answer.



Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-30 Thread Saeed Mahameed
On Mon, Jan 30, 2017 at 6:45 PM, Alexei Starovoitov  wrote:
> On 1/29/17 1:11 AM, Saeed Mahameed wrote:
>>
>>
>> ConnectX4/5 and hopefully so on .. provide three different isolated
>> steering layers:
>>
>> 3. vport layer: avaialbe for any PF/VF vport nic driver instance
>> (netdevice), it allows vlan/mac filtering
>>   ,RSS hashing and n-tuple steering (for both encapsulated and
>> nonencapsulated traffic) and RFS steering. ( the code above only
>> writes flow entries of a PF/VF to its own vport flow tables, there is
>> another mechanism to propagate l2 steering rules down to eswitch from
>> the vport layer.
>>
>> 2. eswitch layer: Available for PFs only with
>> HCA_CAP.vport_group_manager capability set.
>> it allows steering between PF and different VFs on the same host (vlan
>> mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
>> steering and offloads for switchdev mode - eswitch_offloads.c - )
>> if this table is not create the default is pass-throu traffic to PF
>>
>> 1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
>> capability set.
>> needed for MH configurations and only PF is allowed and should write
>> "request UC MAC - set_l2_table_entry" on behalf of the PF itself and
>> it's own VFs.
>>
>> - On a bare metal machine only layer 3 is required (all traffic is
>> passed to the PF vport).
>> - On a MH configuration layer 3 and 1 are required.
>> - On a SRIOV configuration layer 3 and 2 are required.
>> - On MH with SRIOV all layers are required.
>>
>> in the driver, eswitch and L2 layers are handled by PF@eswitch.c.
>>
>> So for your question:
>>
>> PF always init_eswitch ( no eswitch -sriov- tables are created), and
>> the eswitch will start listening for vport_change_events.
>>
>> A PF/VF or netdev vport instance on any steering changes updates
>> should call  mlx5e_vport_context_update[1]
>>
>> vport_context_update is A FW command that will store the current
>> UC/MC/VLAN list and promiscuity info of a vport.
>>
>> The FW will generate an event to the PF driver eswitch manager (vport
>> manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
>> set_l2_table_entry for each UC mac on each vport change event of any
>> vport (including its own vport), in case of SRIOV is enabled it will
>> update eswitch tables as well.
>>
>> To simplify my answer the function calls are:
>> Vport VF/PF netdevice:
>> mlx5e_set_rx_mode_work
>>  mlx5e_vport_context_update
>> mlx5e_vport_context_update_addr_list  --> FW event will be
>> generated to the PF esiwtch manager
>>
>> PF eswitch manager(eswitch.c) on a vport change FW event:
>> mlx5_eswitch_vport_event
>>esw_vport_change_handler
>> esw_vport_change_handle_locked
>> esw_apply_vport_addr_list
>>esw_add_uc_addr
>>   set_l2_table_entry --> this will
>> update the l2 table in case MH is enabled.
>
>
> all makes sense. To test this logic I added printk-s
> to above functions, but I only see:
> # ip link set eth0 addr 24:8a:07:47:2b:6e
> [  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
> [  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0
>
> MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().

Strange, just double checked and i got those events on latest net-next
bare-metal box.

> Yet nic seems to work fine. Packets come and go.
>

Is it multi host configuration or bare metal ?
Do you have internal loopback traffic between different hosts ?

> broken firmware or expected behavior?

which driver did you test ? backported or net-next ?

if it is backported driver please verify that on driver load the
following occurs :

1. VPORTS change events are globally enabled:
in mlx5_start_eqs@eq.c:
async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);

2. UC address change events are enabled for vport 0 (PF):
In eswitch_attach or on eswitch_init (depends on the kernel version) @eswitch.c
esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.

>
> # ethtool -i eth0
> driver: mlx5_core
> version: 3.0-1 (January 2015)
> firmware-version: 14.16.2024
>

BTW folks, i am going to be on vacation for the rest of the week, so
please expect slow responses.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-30 Thread Alexei Starovoitov

On 1/29/17 1:11 AM, Saeed Mahameed wrote:


ConnectX4/5 and hopefully so on .. provide three different isolated
steering layers:

3. vport layer: avaialbe for any PF/VF vport nic driver instance
(netdevice), it allows vlan/mac filtering
  ,RSS hashing and n-tuple steering (for both encapsulated and
nonencapsulated traffic) and RFS steering. ( the code above only
writes flow entries of a PF/VF to its own vport flow tables, there is
another mechanism to propagate l2 steering rules down to eswitch from
the vport layer.

2. eswitch layer: Available for PFs only with
HCA_CAP.vport_group_manager capability set.
it allows steering between PF and different VFs on the same host (vlan
mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
steering and offloads for switchdev mode - eswitch_offloads.c - )
if this table is not create the default is pass-throu traffic to PF

1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
capability set.
needed for MH configurations and only PF is allowed and should write
"request UC MAC - set_l2_table_entry" on behalf of the PF itself and
it's own VFs.

- On a bare metal machine only layer 3 is required (all traffic is
passed to the PF vport).
- On a MH configuration layer 3 and 1 are required.
- On a SRIOV configuration layer 3 and 2 are required.
- On MH with SRIOV all layers are required.

in the driver, eswitch and L2 layers are handled by PF@eswitch.c.

So for your question:

PF always init_eswitch ( no eswitch -sriov- tables are created), and
the eswitch will start listening for vport_change_events.

A PF/VF or netdev vport instance on any steering changes updates
should call  mlx5e_vport_context_update[1]

vport_context_update is A FW command that will store the current
UC/MC/VLAN list and promiscuity info of a vport.

The FW will generate an event to the PF driver eswitch manager (vport
manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
set_l2_table_entry for each UC mac on each vport change event of any
vport (including its own vport), in case of SRIOV is enabled it will
update eswitch tables as well.

To simplify my answer the function calls are:
Vport VF/PF netdevice:
mlx5e_set_rx_mode_work
 mlx5e_vport_context_update
mlx5e_vport_context_update_addr_list  --> FW event will be
generated to the PF esiwtch manager

PF eswitch manager(eswitch.c) on a vport change FW event:
mlx5_eswitch_vport_event
   esw_vport_change_handler
esw_vport_change_handle_locked
esw_apply_vport_addr_list
   esw_add_uc_addr
  set_l2_table_entry --> this will
update the l2 table in case MH is enabled.


all makes sense. To test this logic I added printk-s
to above functions, but I only see:
# ip link set eth0 addr 24:8a:07:47:2b:6e
[  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
[  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0

MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().
Yet nic seems to work fine. Packets come and go.

broken firmware or expected behavior?

# ethtool -i eth0
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.16.2024



Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-29 Thread Saeed Mahameed
On Sat, Jan 28, 2017 at 7:52 PM, Alexei Starovoitov  wrote:
> On 1/28/17 3:20 AM, Saeed Mahameed wrote:
>>
>> On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov  wrote:
>>>
>>> On 1/27/17 1:15 PM, Saeed Mahameed wrote:


 It is only mandatory for configurations that needs eswitch, where the
 driver has no way to know about them, for a good old bare metal box,
 eswitch is not needed.

 we can do some work to strip the l2 table logic - needed for PFs to
 work on multi-host - out of eswitch but again that would further
 complicate the driver code since eswitch will still need to update l2
 tables for VFs.
>>>
>>>
>>>
>>> Saeed,
>>> for multi-host setups every host in that multi-host doesn't
>>> actually see the eswitch, no? Otherwise broken driver on one machine
>>> can affect the other hosts in the same bundle? Please double check,
>>
>>
>> each host (PF) has its own eswitch, and each eswitch lives in its own
>> "steering-space"
>>   and it can't affect others.
>>
>>> since this is absolutely critical HW requirement.
>>>
>>
>> The only shared HW resources between hosts (PFs) is the simple l2 table,
>> and the only thing a host can ask from the l2 talbe (FW) is: "forward
>> UC MAC to me", and it is the responsibility of the the driver eswitch
>> to do so.
>>
>> the l2 table is created and managed by FW, SW eswitch can only request
>> from FW, and the FW is trusted.
>
>
> ok. clear. thanks for explaining.
> Could you describe the sequence of function calls within mlx5
> that does the assignment of uc mac for PF ?
> since I'm missing where eswitch is involved.
> I can see:
> mlx5e_nic_enable | mlx5e_set_mac
>   queue_work(priv->wq, &priv->set_rx_mode_work);
> mlx5e_set_rx_mode_work
>   mlx5e_apply_netdev_addr
> mlx5e_add_l2_flow_rule
>

It is a  little bit more complicated than this :).

ConnectX4/5 and hopefully so on .. provide three different isolated
steering layers:

3. vport layer: avaialbe for any PF/VF vport nic driver instance
(netdevice), it allows vlan/mac filtering
 ,RSS hashing and n-tuple steering (for both encapsulated and
nonencapsulated traffic) and RFS steering. ( the code above only
writes flow entries of a PF/VF to its own vport flow tables, there is
another mechanism to propagate l2 steering rules down to eswitch from
the vport layer.

2. eswitch layer: Available for PFs only with
HCA_CAP.vport_group_manager capability set.
it allows steering between PF and different VFs on the same host (vlan
mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
steering and offloads for switchdev mode - eswitch_offloads.c - )
if this table is not create the default is pass-throu traffic to PF

1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
capability set.
needed for MH configurations and only PF is allowed and should write
"request UC MAC - set_l2_table_entry" on behalf of the PF itself and
it's own VFs.

- On a bare metal machine only layer 3 is required (all traffic is
passed to the PF vport).
- On a MH configuration layer 3 and 1 are required.
- On a SRIOV configuration layer 3 and 2 are required.
- On MH with SRIOV all layers are required.

in the driver, eswitch and L2 layers are handled by PF@eswitch.c.

So for your question:

PF always init_eswitch ( no eswitch -sriov- tables are created), and
the eswitch will start listening for vport_change_events.

A PF/VF or netdev vport instance on any steering changes updates
should call  mlx5e_vport_context_update[1]

vport_context_update is A FW command that will store the current
UC/MC/VLAN list and promiscuity info of a vport.

The FW will generate an event to the PF driver eswitch manager (vport
manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
set_l2_table_entry for each UC mac on each vport change event of any
vport (including its own vport), in case of SRIOV is enabled it will
update eswitch tables as well.

To simplify my answer the function calls are:
Vport VF/PF netdevice:
mlx5e_set_rx_mode_work
mlx5e_vport_context_update
   mlx5e_vport_context_update_addr_list  --> FW event will be
generated to the PF esiwtch manager

PF eswitch manager(eswitch.c) on a vport change FW event:
mlx5_eswitch_vport_event
  esw_vport_change_handler
   esw_vport_change_handle_locked
   esw_apply_vport_addr_list
  esw_add_uc_addr
 set_l2_table_entry --> this will
update the l2 table in case MH is enabled.

Sorry for the long answer :)
-Saeed

[1] 
http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c#L440
[2] 
http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c#L1675


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-28 Thread Alexei Starovoitov

On 1/28/17 3:20 AM, Saeed Mahameed wrote:

On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov  wrote:

On 1/27/17 1:15 PM, Saeed Mahameed wrote:


It is only mandatory for configurations that needs eswitch, where the
driver has no way to know about them, for a good old bare metal box,
eswitch is not needed.

we can do some work to strip the l2 table logic - needed for PFs to
work on multi-host - out of eswitch but again that would further
complicate the driver code since eswitch will still need to update l2
tables for VFs.



Saeed,
for multi-host setups every host in that multi-host doesn't
actually see the eswitch, no? Otherwise broken driver on one machine
can affect the other hosts in the same bundle? Please double check,


each host (PF) has its own eswitch, and each eswitch lives in its own
"steering-space"
  and it can't affect others.


since this is absolutely critical HW requirement.



The only shared HW resources between hosts (PFs) is the simple l2 table,
and the only thing a host can ask from the l2 talbe (FW) is: "forward
UC MAC to me", and it is the responsibility of the the driver eswitch
to do so.

the l2 table is created and managed by FW, SW eswitch can only request
from FW, and the FW is trusted.


ok. clear. thanks for explaining.
Could you describe the sequence of function calls within mlx5
that does the assignment of uc mac for PF ?
since I'm missing where eswitch is involved.
I can see:
mlx5e_nic_enable | mlx5e_set_mac
  queue_work(priv->wq, &priv->set_rx_mode_work);
mlx5e_set_rx_mode_work
  mlx5e_apply_netdev_addr
mlx5e_add_l2_flow_rule



Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-28 Thread Saeed Mahameed
On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov  wrote:
> On 1/27/17 1:15 PM, Saeed Mahameed wrote:
>>
>> It is only mandatory for configurations that needs eswitch, where the
>> driver has no way to know about them, for a good old bare metal box,
>> eswitch is not needed.
>>
>> we can do some work to strip the l2 table logic - needed for PFs to
>> work on multi-host - out of eswitch but again that would further
>> complicate the driver code since eswitch will still need to update l2
>> tables for VFs.
>
>
> Saeed,
> for multi-host setups every host in that multi-host doesn't
> actually see the eswitch, no? Otherwise broken driver on one machine
> can affect the other hosts in the same bundle? Please double check,

each host (PF) has its own eswitch, and each eswitch lives in its own
"steering-space"
 and it can't affect others.

> since this is absolutely critical HW requirement.
>

The only shared HW resources between hosts (PFs) is the simple l2 table,
and the only thing a host can ask from the l2 talbe (FW) is: "forward
UC MAC to me", and it is the responsibility of the the driver eswitch
to do so.

the l2 table is created and managed by FW, SW eswitch can only request
from FW, and the FW is trusted.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Alexei Starovoitov

On 1/27/17 1:15 PM, Saeed Mahameed wrote:

It is only mandatory for configurations that needs eswitch, where the
driver has no way to know about them, for a good old bare metal box,
eswitch is not needed.

we can do some work to strip the l2 table logic - needed for PFs to
work on multi-host - out of eswitch but again that would further
complicate the driver code since eswitch will still need to update l2
tables for VFs.


Saeed,
for multi-host setups every host in that multi-host doesn't
actually see the eswitch, no? Otherwise broken driver on one machine
can affect the other hosts in the same bundle? Please double check,
since this is absolutely critical HW requirement.



Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 8:42 PM, Tom Herbert  wrote:
> On Fri, Jan 27, 2017 at 10:28 AM, Saeed Mahameed
>  wrote:
>> On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert  wrote:
>>> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
>>>  wrote:
 On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert  wrote:
> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>  wrote:
>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  
>>> wrote:
 Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
 whether the eswitch code is built. Change Kconfig and Makefile
 accordingly.
>>>
>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>
>>
>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>> MAC address wanted by VF or PF.
>> To keep one flow in the code, the implementation is done as part of 
>> eswitch.
>>
>> so in multi-host configuration (where there are 4 PFs) each PF should
>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>
>> populating the l2 table is done using the whole eswitch event driven
>> mechanisms, it is not easy and IMH not right to separate eswitch
>> tables from l2 table (same management logic, different tables).
>>
>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>> on Multi-host configuration.
>>
> What indicate a multi-host configuration?

 nothing in the driver, it is transparent.

>>> So then we always need the eswitch code to be built even if someone
>>> never uses any of it?
>>>
>>
>> yes.
>> but for your convenience all you need is to compile eswitch.c.
>> esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.
>>
> Well eswitch.c is 2200 LOC. en_rep.c and eswitch_offloads.c are 1600
> LOC. If we _must_ have eswitch.c then there's probably not much point
> then. But I am still finding it hard to fathom that eswitch has now
> become a mandatory component of Ethernet drivers/devices.
>

It is only mandatory for configurations that needs eswitch, where the
driver has no way to know about them, for a good old bare metal box,
eswitch is not needed.

we can do some work to strip the l2 table logic - needed for PFs to
work on multi-host - out of eswitch but again that would further
complicate the driver code since eswitch will still need to update l2
tables for VFs.


> Tom
>
>
>>> Or.
>>>
>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 8:33 PM, Tom Herbert  wrote:
> On Fri, Jan 27, 2017 at 10:19 AM, Saeed Mahameed
>  wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>> whether the eswitch code is built. Change Kconfig and Makefile
>>> accordingly.
>>>
>>> Signed-off-by: Tom Herbert 
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 
>>> +--
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  4 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c| 16 ++--
>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>>>  7 files changed, 125 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> index ddb4ca4..27aae58 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>>>   This flag is depended on the kernel's DCB support.
>>>
>>>   If unsure, set to Y
>>> +
>>> +config MLX5_CORE_EN_ESWITCH
>>> +   bool "Ethernet switch"
>>> +   default y
>>> +   depends on MLX5_CORE_EN
>>> +   ---help---
>>> + Say Y here if you want to use Ethernet switch (E-switch). E-Switch
>>> + is the software entity that represents and manages ConnectX4
>>> + inter-HCA ethernet l2 switching.
>>> +
>>> + If unsure, set to Y
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> index 9f43beb..17025d8 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o 
>>> pagealloc.o \
>>> mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
>>> fs_counters.o rl.o lag.o dev.o
>>>
>>> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
>>> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
>>> en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
>>> en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
>>> -   en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
>>> +   en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>>>
>>>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
>>> +
>>> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o 
>>> en_rep.o
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index e829143..1db4d98 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -38,7 +38,9 @@
>>>  #include 
>>>  #include "en.h"
>>>  #include "en_tc.h"
>>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>>  #include "eswitch.h"
>>> +#endif
>>
>> Wouldn't it be cleaner if we left the main code (en_main.c) untouched
>> and kept this
>> #include "eswitch.h" and instead of filling the main flow code with
>> #ifdefs, in eswitch.h
>> we can create eswitch mock API functions when
>> CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
>> clean from ifdefs and will complie with mock functions.
>>
>> we did this with accelerated RFS feature. look for "#ifndef
>> CONFIG_RFS_ACCEL" in en.h
>>
> There are still occurrences of CONFIG_RFS_ACCEL in en_main.c and
> main.c. For eswitch its a header problem, not everything related to
> eswitch was extracted out of main though backend functions. There is a
> lot of code that related to eswitch that is intertwined with the core
> code.
>

Interesting, i just did a quick look and it seems to me all eswitch
logic in en_main.c can be kept untouched if we have the right mock
functions, on the other hand it seems that there are a lot of
eswitch functions to mock, i am not sure it is a good thing anymore,
let's leave it as is for now.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Tom Herbert
On Fri, Jan 27, 2017 at 10:19 AM, Saeed Mahameed
 wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>> whether the eswitch code is built. Change Kconfig and Makefile
>> accordingly.
>>
>> Signed-off-by: Tom Herbert 
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 
>> +--
>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++---
>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  4 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c| 16 ++--
>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>>  7 files changed, 125 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
>> b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> index ddb4ca4..27aae58 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>>   This flag is depended on the kernel's DCB support.
>>
>>   If unsure, set to Y
>> +
>> +config MLX5_CORE_EN_ESWITCH
>> +   bool "Ethernet switch"
>> +   default y
>> +   depends on MLX5_CORE_EN
>> +   ---help---
>> + Say Y here if you want to use Ethernet switch (E-switch). E-Switch
>> + is the software entity that represents and manages ConnectX4
>> + inter-HCA ethernet l2 switching.
>> +
>> + If unsure, set to Y
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
>> b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> index 9f43beb..17025d8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o 
>> pagealloc.o \
>> mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
>> fs_counters.o rl.o lag.o dev.o
>>
>> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
>> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
>> en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
>> en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
>> -   en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
>> +   en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>>
>>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
>> +
>> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o 
>> en_rep.o
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index e829143..1db4d98 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -38,7 +38,9 @@
>>  #include 
>>  #include "en.h"
>>  #include "en_tc.h"
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  #include "eswitch.h"
>> +#endif
>
> Wouldn't it be cleaner if we left the main code (en_main.c) untouched
> and kept this
> #include "eswitch.h" and instead of filling the main flow code with
> #ifdefs, in eswitch.h
> we can create eswitch mock API functions when
> CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
> clean from ifdefs and will complie with mock functions.
>
> we did this with accelerated RFS feature. look for "#ifndef
> CONFIG_RFS_ACCEL" in en.h
>
There are still occurrences of CONFIG_RFS_ACCEL in en_main.c and
main.c. For eswitch its a header problem, not everything related to
eswitch was extracted out of main though backend functions. There is a
lot of code that related to eswitch that is intertwined with the core
code.

>>  #include "vxlan.h"
>>
>>  struct mlx5e_rq_param {
>> @@ -585,10 +587,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>>
>> switch (priv->params.rq_wq_type) {
>> case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> if (mlx5e_is_vf_vport_rep(priv)) {
>> err = -EINVAL;
>> goto err_rq_wq_destroy;
>> }
>> +#endif
>>
>> rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
>> rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
>> @@ -617,10 +621,14 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>> goto err_rq_wq_destroy;
>> }
>>
>> -   if (mlx5e_is_vf_vport_rep(priv))
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +   if (mlx5e_is_vf_vport_rep(priv)) {
>> rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
>> -   else
>> +   } else
>> +#endif
>> +   {
>> rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
>> +   }
>>
>> rq->alloc_wqe = 

Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Tom Herbert
On Fri, Jan 27, 2017 at 10:28 AM, Saeed Mahameed
 wrote:
> On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert  wrote:
>> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
>>  wrote:
>>> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert  wrote:
 On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
  wrote:
> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  
>> wrote:
>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>> whether the eswitch code is built. Change Kconfig and Makefile
>>> accordingly.
>>
>> Tom, FWIW, please note that the basic e-switch functionality is needed
>> also when SRIOV isn't of use, this is for a multi host configuration.
>>
>
> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
> MAC address wanted by VF or PF.
> To keep one flow in the code, the implementation is done as part of 
> eswitch.
>
> so in multi-host configuration (where there are 4 PFs) each PF should
> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>
> populating the l2 table is done using the whole eswitch event driven
> mechanisms, it is not easy and IMH not right to separate eswitch
> tables from l2 table (same management logic, different tables).
>
> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
> on Multi-host configuration.
>
 What indicate a multi-host configuration?
>>>
>>> nothing in the driver, it is transparent.
>>>
>> So then we always need the eswitch code to be built even if someone
>> never uses any of it?
>>
>
> yes.
> but for your convenience all you need is to compile eswitch.c.
> esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.
>
Well eswitch.c is 2200 LOC. en_rep.c and eswitch_offloads.c are 1600
LOC. If we _must_ have eswitch.c then there's probably not much point
then. But I am still finding it hard to fathom that eswitch has now
become a mandatory component of Ethernet drivers/devices.

Tom


>> Or.
>>
>> My WW (and same for the rest of the IL team..) has ended so I will be
>> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
> whether the eswitch code is built. Change Kconfig and Makefile
> accordingly.
>
> Signed-off-by: Tom Herbert 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 
> +--
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c  |  4 +-
>  drivers/net/ethernet/mellanox/mlx5/core/main.c| 16 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>  7 files changed, 125 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
> b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index ddb4ca4..27aae58 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>   This flag is depended on the kernel's DCB support.
>
>   If unsure, set to Y
> +
> +config MLX5_CORE_EN_ESWITCH
> +   bool "Ethernet switch"
> +   default y
> +   depends on MLX5_CORE_EN
> +   ---help---
> + Say Y here if you want to use Ethernet switch (E-switch). E-Switch
> + is the software entity that represents and manages ConnectX4
> + inter-HCA ethernet l2 switching.
> +
> + If unsure, set to Y
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
> b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 9f43beb..17025d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o 
> pagealloc.o \
> mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
> fs_counters.o rl.o lag.o dev.o
>
> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
> en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
> en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
> -   en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
> +   en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>
>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
> +
> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o 
> en_rep.o
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index e829143..1db4d98 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -38,7 +38,9 @@
>  #include 
>  #include "en.h"
>  #include "en_tc.h"
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  #include "eswitch.h"
> +#endif

Wouldn't it be cleaner if we left the main code (en_main.c) untouched
and kept this
#include "eswitch.h" and instead of filling the main flow code with
#ifdefs, in eswitch.h
we can create eswitch mock API functions when
CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
clean from ifdefs and will complie with mock functions.

we did this with accelerated RFS feature. look for "#ifndef
CONFIG_RFS_ACCEL" in en.h

>  #include "vxlan.h"
>
>  struct mlx5e_rq_param {
> @@ -585,10 +587,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>
> switch (priv->params.rq_wq_type) {
> case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> if (mlx5e_is_vf_vport_rep(priv)) {
> err = -EINVAL;
> goto err_rq_wq_destroy;
> }
> +#endif
>
> rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
> rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
> @@ -617,10 +621,14 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
> goto err_rq_wq_destroy;
> }
>
> -   if (mlx5e_is_vf_vport_rep(priv))
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +   if (mlx5e_is_vf_vport_rep(priv)) {
> rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
> -   else
> +   } else
> +#endif
> +   {
> rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
> +   }
>
> rq->alloc_wqe = mlx5e_alloc_rx_wqe;
> rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
> @@ -2198,7 +2206,6 @@ static void mlx5e_netdev_set_tcs(struct net_device 
> *netdev)
>  int mlx5e_open_locked(struct net_device *netdev)
>  {
> struct mlx5e_priv *priv = netdev_priv(netdev);
> -   struct mlx5_core_dev *mdev = priv->mdev;
> int num_txqs;
> int err;
>
> @@ -2233,11 +2240,13 @@ int mlx5e_open_locked(struct net_device *netdev)
> if

Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert  wrote:
> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>  wrote:
>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
 Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
 whether the eswitch code is built. Change Kconfig and Makefile
 accordingly.
>>>
>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>
>>
>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>> MAC address wanted by VF or PF.
>> To keep one flow in the code, the implementation is done as part of eswitch.
>>
>> so in multi-host configuration (where there are 4 PFs) each PF should
>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>
>> populating the l2 table is done using the whole eswitch event driven
>> mechanisms, it is not easy and IMH not right to separate eswitch
>> tables from l2 table (same management logic, different tables).
>>
>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>> on Multi-host configuration.
>>
> What indicate a multi-host configuration?

nothing in the driver, it is transparent.

>
>>> Or.
>>>
>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert  wrote:
> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
>  wrote:
>> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert  wrote:
>>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>>>  wrote:
 On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>> whether the eswitch code is built. Change Kconfig and Makefile
>> accordingly.
>
> Tom, FWIW, please note that the basic e-switch functionality is needed
> also when SRIOV isn't of use, this is for a multi host configuration.
>

 Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
 MAC address wanted by VF or PF.
 To keep one flow in the code, the implementation is done as part of 
 eswitch.

 so in multi-host configuration (where there are 4 PFs) each PF should
 invoke set_l2_table_entry_cmd  for each one of its own UC MACs.

 populating the l2 table is done using the whole eswitch event driven
 mechanisms, it is not easy and IMH not right to separate eswitch
 tables from l2 table (same management logic, different tables).

 Anyways as Or stated this is just an FYI, eswitch needs to be enabled
 on Multi-host configuration.

>>> What indicate a multi-host configuration?
>>
>> nothing in the driver, it is transparent.
>>
> So then we always need the eswitch code to be built even if someone
> never uses any of it?
>

yes.
but for your convenience all you need is to compile eswitch.c.
esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.

>>>
> Or.
>
> My WW (and same for the rest of the IL team..) has ended so I will be
> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Tom Herbert
On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
 wrote:
> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert  wrote:
>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>>  wrote:
>>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
 On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
> whether the eswitch code is built. Change Kconfig and Makefile
> accordingly.

 Tom, FWIW, please note that the basic e-switch functionality is needed
 also when SRIOV isn't of use, this is for a multi host configuration.

>>>
>>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>>> MAC address wanted by VF or PF.
>>> To keep one flow in the code, the implementation is done as part of eswitch.
>>>
>>> so in multi-host configuration (where there are 4 PFs) each PF should
>>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>>
>>> populating the l2 table is done using the whole eswitch event driven
>>> mechanisms, it is not easy and IMH not right to separate eswitch
>>> tables from l2 table (same management logic, different tables).
>>>
>>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>>> on Multi-host configuration.
>>>
>> What indicate a multi-host configuration?
>
> nothing in the driver, it is transparent.
>
So then we always need the eswitch code to be built even if someone
never uses any of it?

>>
 Or.

 My WW (and same for the rest of the IL team..) has ended so I will be
 able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Tom Herbert
On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
 wrote:
> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>> whether the eswitch code is built. Change Kconfig and Makefile
>>> accordingly.
>>
>> Tom, FWIW, please note that the basic e-switch functionality is needed
>> also when SRIOV isn't of use, this is for a multi host configuration.
>>
>
> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
> MAC address wanted by VF or PF.
> To keep one flow in the code, the implementation is done as part of eswitch.
>
> so in multi-host configuration (where there are 4 PFs) each PF should
> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>
> populating the l2 table is done using the whole eswitch event driven
> mechanisms, it is not easy and IMH not right to separate eswitch
> tables from l2 table (same management logic, different tables).
>
> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
> on Multi-host configuration.
>
What indicate a multi-host configuration?

>> Or.
>>
>> My WW (and same for the rest of the IL team..) has ended so I will be
>> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-27 Thread Saeed Mahameed
On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz  wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>> whether the eswitch code is built. Change Kconfig and Makefile
>> accordingly.
>
> Tom, FWIW, please note that the basic e-switch functionality is needed
> also when SRIOV isn't of use, this is for a multi host configuration.
>

Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
MAC address wanted by VF or PF.
To keep one flow in the code, the implementation is done as part of eswitch.

so in multi-host configuration (where there are 4 PFs) each PF should
invoke set_l2_table_entry_cmd  for each one of its own UC MACs.

populating the l2 table is done using the whole eswitch event driven
mechanisms, it is not easy and IMH not right to separate eswitch
tables from l2 table (same management logic, different tables).

Anyways as Or stated this is just an FYI, eswitch needs to be enabled
on Multi-host configuration.

> Or.
>
> My WW (and same for the rest of the IL team..) has ended so I will be
> able to further look on this series and comment on Sunday.


Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable

2017-01-26 Thread Or Gerlitz
On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert  wrote:
> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
> whether the eswitch code is built. Change Kconfig and Makefile
> accordingly.

Tom, FWIW, please note that the basic e-switch functionality is needed
also when SRIOV isn't of use, this is for a multi host configuration.

Or.

My WW (and same for the rest of the IL team..) has ended so I will be
able to further look on this series and comment on Sunday.