Re: [PATCH v2] net/mlx4: Memcpy at slave_event should copy sizeof mlx4_eqe

2015-10-26 Thread Carol Soto



On 10/26/2015 12:02 PM, Or Gerlitz wrote:

On Mon, Oct 26, 2015 at 5:15 PM,   wrote:

From: Carol L Soto 

If the caps.eqe_size is bigger than the struct mlx4_eqe then there
is a potential for corrupting data at the master context. We can see
the message "Master failed to generate an EQE for slave: X" when the
event_eqe array wraps and we can see potential oops at the function
mlx4_GEN_EQE. Also correct a memset of cmd_eqe to use the sizeof
mlx4_eqe instead of eqe_size.

Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support')
Signed-off-by: Carol L Soto 

Thanks Carol, I'd like to review this a bit more tomorrow and will
send it with another fix/es to net

Sure thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Do not set shared_ports when nreq > MAX_MSIX

2015-10-07 Thread Carol Soto



On 10/7/2015 3:08 AM, Matan Barak wrote:



On 10/7/2015 10:25 AM, Matan Barak wrote:



On 10/7/2015 12:46 AM, Carol Soto wrote:



On 10/6/2015 4:39 PM, Or Gerlitz wrote:

On Wed, Oct 7, 2015 at 12:27 AM, <cls...@linux.vnet.ibm.com> wrote:

From: Carol L Soto <cls...@linux.vnet.ibm.com>

If we get MAX_MSIX interrupts would like to have each receive ring
with his own msix interrupt line.

so 9293267a3e2a  was only partially correct? and/or not fully optimal?
please elaborate more on that in your change log.

just not fully optimal, with commit 9293267a3e2a if I have 64 MSIXs and
2 ports I can get 8 rings for each port but then the rings will share
the interrupt lines. For 64 MSIXs we can have each ring with his own
interrupt line.


Fixes: 9293267a3e2a ('net/mlx4_core: Capping number of requested
MSIXs to MAX_MSIX')
Signed-off-by: Carol L Soto <cls...@linux.vnet.ibm.com>

Carol, you didn't use net/mlx4: prefix as ask for mlx4 driver patch
titles, so please repost, but before that I'd like to see an ack from
Matan for this patch as well.

Sorry completely missed it. When Matan acks will resend it.


The logic seems correct to me. When there are more nreqs than we could
possibly support (or want to support), there’s no reason to share the
EQs between the different ports.
Thanks for your fix.



Please also clean the shared_ports variable - we don't need this anymore.


Thanks for the feedback, will resend.
Carol

Regards,
Matan



Or.


---
  drivers/net/ethernet/mellanox/mlx4/main.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 006757f..f03f513 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2673,10 +2673,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev
*dev)

 nreq = min_t(int, dev->caps.num_eqs -
dev->caps.reserved_eqs,
  nreq);
-   if (nreq > MAX_MSIX) {
+   if (nreq > MAX_MSIX)
 nreq = MAX_MSIX;
-   shared_ports = true;
-   }

 entries = kcalloc(nreq, sizeof *entries, 
GFP_KERNEL);

 if (!entries)
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Do not set shared_ports when nreq > MAX_MSIX

2015-10-06 Thread Carol Soto



On 10/6/2015 4:39 PM, Or Gerlitz wrote:

On Wed, Oct 7, 2015 at 12:27 AM,   wrote:

From: Carol L Soto 

If we get MAX_MSIX interrupts would like to have each receive ring
with his own msix interrupt line.

so 9293267a3e2a  was only partially correct? and/or not fully optimal?
please elaborate more on that in your change log.
just not fully optimal, with commit 9293267a3e2a if I have 64 MSIXs and 
2 ports I can get 8 rings for each port but then the rings will share 
the interrupt lines. For 64 MSIXs we can have each ring with his own 
interrupt line.



Fixes: 9293267a3e2a ('net/mlx4_core: Capping number of requested MSIXs to 
MAX_MSIX')
Signed-off-by: Carol L Soto 

Carol, you didn't use net/mlx4: prefix as ask for mlx4 driver patch
titles, so please repost, but before that I'd like to see an ack from
Matan for this patch as well.

Sorry completely missed it. When Matan acks will resend it.


Or.


---
  drivers/net/ethernet/mellanox/mlx4/main.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 006757f..f03f513 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2673,10 +2673,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)

 nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
  nreq);
-   if (nreq > MAX_MSIX) {
+   if (nreq > MAX_MSIX)
 nreq = MAX_MSIX;
-   shared_ports = true;
-   }

 entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 if (!entries)
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net/mlx4_core: Test interrupts fail if not all comp vectors called request_irq

2015-10-05 Thread Carol Soto



On 10/4/2015 3:03 AM, Or Gerlitz wrote:

On 9/29/2015 9:38 PM, cls...@linux.vnet.ibm.com wrote:

From: Carol L Soto 

Test interrupts fails if not all completion vectors called
request_irq. This case can happen if only mlx4_en loads and
we have more completion vectors than rx rings.


good catch! is this a bug since the driver 0-day or was introduced by
some recent commit? in the latercase, please add a Fixes: tag before 
your S.O.B note.

Probably the issue was introduced by this one
Fixes: c66fa19c405a ('net/mlx4: Add EQ pool')


Signed-off-by: Carol L Soto 
---
  drivers/net/ethernet/mellanox/mlx4/eq.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
b/drivers/net/ethernet/mellanox/mlx4/eq.c

index 8e81e53..c344884 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1364,6 +1364,10 @@ int mlx4_test_interrupts(struct mlx4_dev *dev)
   * and performing a NOP command
   */
  for(i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) {
+/* Make sure request_irq was called */
+if (!priv->eq_table.eq[i].have_irq)
+continue;
+
  /* Temporary use polling for command completions */
  mlx4_cmd_use_polling(dev);




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether

2015-07-15 Thread Carol Soto



On 7/15/2015 5:54 PM, Nikolay Aleksandrov wrote:

On 07/16/2015 12:39 AM, Eric W. Biederman wrote:

Nikolay Aleksandrov ra...@blackwall.org writes:


From: Nikolay Aleksandrov niko...@cumulusnetworks.com

When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):

I see the failure below.  I am perplexed at the description.  Does the
bonding driver actually make sense on a non-ethernet device?


Sometimes it does, I've seen it used with infiniband devices a lot, also
there were cases of some ppp users.


Would it not be better to make more sense to limit bonding to ethernet
devices so we don't get weird behavior?  I imagine there might be other
problems with bonding non-ethernet devices that no one has spotted,
or cares about.

Eric



My personal opinion would be to disable non-ethernet devices, but support was
already added and has been there for a long time so we have to fix this for
the older releases, I don't mind removing non-ethernet device support for 
net-next
but I'm guessing there're people still using that like the case that started
this thread.

Cheers,
  Nik

Yes, there are Infiniband users that uses bonding.

Carol

[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] [ cut here ]
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: GW  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]   81732d41 81525b34
8800358dfda8
[  908.984175]  8106c521 88003595af78 88003595af40
88003e3a4280
[  908.984178]  a058d040  8106c59a
8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [81525b34] ? dump_stack+0x40/0x50
[  908.984193]  [8106c521] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [8106c59a] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [81218352] ? remove_proc_entry+0x112/0x160
[  908.984205]  [a05850e6] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [a057540e] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [8142f407] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [8142f52d] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [8142f58d] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [a0585269] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [810e28ba] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [81086f99] ? task_work_run+0x89/0xc0
[  908.984244]  [8152b732] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f9399814927ad (bonding: Don't allow bond devices to change network
namespaces.) that can't happen anyway.

Reported-by: Carol Soto cls...@linux.vnet.ibm.com
Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: a64d49c3dd50 (bonding: Manage /proc/net/bonding/ entries from
   the netdev events)
---
  drivers/net/bonding/bond_main.c | 1 +
  1 file changed, 1

Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether

2015-07-15 Thread Carol Soto



On 7/15/2015 2:52 PM, Nikolay Aleksandrov wrote:

From: Nikolay Aleksandrov niko...@cumulusnetworks.com

When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] [ cut here ]
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: GW  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]   81732d41 81525b34
8800358dfda8
[  908.984175]  8106c521 88003595af78 88003595af40
88003e3a4280
[  908.984178]  a058d040  8106c59a
8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [81525b34] ? dump_stack+0x40/0x50
[  908.984193]  [8106c521] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [8106c59a] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [81218352] ? remove_proc_entry+0x112/0x160
[  908.984205]  [a05850e6] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [a057540e] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [8142f407] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [8142f52d] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [8142f58d] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [a0585269] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [810e28ba] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [81086f99] ? task_work_run+0x89/0xc0
[  908.984244]  [8152b732] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f9399814927ad (bonding: Don't allow bond devices to change network
namespaces.) that can't happen anyway.

Reported-by: Carol Soto cls...@linux.vnet.ibm.com
Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: a64d49c3dd50 (bonding: Manage /proc/net/bonding/ entries from
   the netdev events)
---
  drivers/net/bonding/bond_main.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 317a49480475..ec1404ec4d2f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device 
*bond_dev,
bond_dev-priv_flags |= IFF_DISABLE_NETPOLL;
netdev_info(bond_dev, Destroying bond %s\n,
bond_dev-name);
+   bond_remove_proc_entry(bond);
unregister_netdevice(bond_dev);
}
return ret;

Tested-by: Carol L Soto cls...@linux.vnet.ibm.com

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html