[ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc
Hi Roland,

thanks for the quick review. I was hoping you could apply these changes for 2.6.30, because this will be the codebase for the next OFED release. The patch is well tested in an HPC environment and we haven't seen any issues.

Regarding Anton's patch, you are right. If a user allocates an unrealistically large queue pair, kmalloc() may not be able to allocate the memory. In that case we return ENOMEM to the user, so the kernel will not be affected at all. We plan to add a vmalloc() call for the case where kmalloc() fails in the next kernel release.

Kind regards

Stefan Roscher
eHCA/eHEA Linux Driver Development
IBM Systems Technology Group, Systems Software Development / FW I/O Firmware Entwicklung 2
---
IBM Deutschland
Schoenaicher Str. 220
71032 Boeblingen
Phone: +49-7031-16-2015
E-Mail: stefan.rosc...@de.ibm.com
---
IBM Deutschland Research Development GmbH / Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher
Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294

From: Roland Dreier rdre...@cisco.com
To: Stefan Roscher ossro...@linux.vnet.ibm.com
Cc: LinuxPPC-Dev linuxppc-...@ozlabs.org, LKML linux-ker...@vger.kernel.org, OF-EWG ewg@lists.openfabrics.org, Roland Dreier rola...@cisco.com, Joachim Fenkes/Germany/i...@ibmde, Christoph Raisch/Germany/i...@ibmde, Alexander Schmidt1/Germany/i...@ibmde, Stefan Roscher/Germany/i...@ibmde, Hoang-Nam Nguyen/Germany/i...@ibmde
Date: 21.04.2009 19:34
Subject: Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc

> +	queue->queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);
>
> How big might this buffer be? Any chance of allocation failure due to
> memory fragmentation?
>
> - R.

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
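For a rough sense of scale for Roland's question: the list holds one pointer per queue page, so its size is nr_of_pages * sizeof(void *). The following is only a minimal userspace sketch of that arithmetic. The helper names pages_for_queue() and page_list_bytes() are made up for illustration and are not part of the ehca driver, and the page size is whatever the caller passes in.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the ehca driver): number of pages needed
 * to back a queue of queue_bytes, rounded up to whole pages.
 */
static size_t pages_for_queue(size_t queue_bytes, size_t page_size)
{
	return (queue_bytes + page_size - 1) / page_size;
}

/*
 * Size in bytes of the page-pointer list, i.e. the value passed to
 * kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL) in the patch.
 */
static size_t page_list_bytes(size_t nr_of_pages)
{
	return nr_of_pages * sizeof(void *);
}
```

Assuming 4 KiB pages and 8-byte pointers, a queue backed by 262144 pages (1 GiB) would need a 2 MiB pointer list, which is a size where a physically contiguous allocation can plausibly fail on a fragmented system.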
[ewg] Re: [GIT PULL] RDMA/nes: Turn off dynamic interrupt moderation as default
Tung, Chien Tin wrote:
> Vlad,
>
> Please pull nes OFED 1.4.1 RC4 update from:
>
> git://git.openfabrics.org/~ctung/ofed-1.4.1.git ofed_kernel
>
> It contains commit 4abc408a8605ac8e5f1829e3e79d6c56dbac1ce7
> * Turn off dynamic interrupt moderation as default
>
> Thanks,
> Chien
> --
> Chien Tung | chien.tin.t...@intel.com

Done,

Regards,
Vladimir
Re: [ewg] EWG/OFED meeting minutes for Apr 20, 09
Steve Wise wrote:
> What is the cut-off date to have all the backports done (not new
> features, just backports of 2.6.30 functionality to the older kernels)?
> Is it RC1?

The target should be RC1, but I am sure some of the OSes will only be done in RC2.

Tziporet
[ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc
For large queue pairs, kmalloc() may fail to allocate the page list because of memory fragmentation. To ensure the memory is allocated even if kmalloc() cannot find chunks that are big enough, we fall back to vmalloc().

Signed-off-by: Stefan Roscher stefan.rosc...@de.ibm.com
---

On Tuesday 21 April 2009 07:34:30 pm Roland Dreier wrote:
> > +	queue->queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);
>
> How big might this buffer be? Any chance of allocation failure due to
> memory fragmentation?
>
> - R.

Hey Roland,

yes, you are right, and here is the patch to circumvent the described problem. It applies on top of the patchset.

regards
Stefan

 drivers/infiniband/hw/ehca/ipz_pt_fn.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.c b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
index a260559..1227c59 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.c
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.c
@@ -222,8 +222,11 @@ int ipz_queue_ctor(struct ehca_pd *pd, struct ipz_queue *queue,
 	/* allocate queue page pointers */
 	queue->queue_pages = kmalloc(nr_of_pages * sizeof(void *), GFP_KERNEL);
 	if (!queue->queue_pages) {
-		ehca_gen_err("Couldn't allocate queue page list");
-		return 0;
+		queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *));
+		if (!queue->queue_pages) {
+			ehca_gen_err("Couldn't allocate queue page list");
+			return 0;
+		}
 	}
 	memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *));
 
@@ -240,7 +243,10 @@ int ipz_queue_ctor(struct ehca_pd *pd, struct ipz_queue *queue,
 ipz_queue_ctor_exit0:
 	ehca_gen_err("Couldn't alloc pages queue=%p "
 		 "nr_of_pages=%x", queue, nr_of_pages);
-	kfree(queue->queue_pages);
+	if (is_vmalloc_addr(queue->queue_pages))
+		vfree(queue->queue_pages);
+	else
+		kfree(queue->queue_pages);
 
 	return 0;
 }
@@ -262,7 +268,10 @@ int ipz_queue_dtor(struct ehca_pd *pd, struct ipz_queue *queue)
 			free_page((unsigned long)queue->queue_pages[i]);
 	}
 
-	kfree(queue->queue_pages);
+	if (is_vmalloc_addr(queue->queue_pages))
+		vfree(queue->queue_pages);
+	else
+		kfree(queue->queue_pages);
 
 	return 1;
 }
-- 
1.5.5
[ewg] Re: [PATCH 1/3] IB/ehca: Replace vmalloc with kmalloc
On Wednesday 22 April 2009 04:10:18 pm michael wrote:
> Hi,
>
> I don't take the point, if it is not import use the vmalloc. Why you try
> with a kmalloc alloc first? and why do not use kzalloc?

Because kmalloc() is faster than vmalloc(), which gives a huge performance win when someone allocates a large number of queue pairs. We fall back to vmalloc() only if kmalloc() can't deliver the memory chunk.

We don't need kzalloc() because we fill the list right after the allocation.

regards
Stefan
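The "fast allocator first, slower allocator as fallback" pattern described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: fast_alloc() and fallback_alloc() are stand-ins for kmalloc() and vmalloc() built on malloc(), FAST_ALLOC_LIMIT is an invented threshold (real kmalloc() failures depend on fragmentation, not on a fixed size), and a flag replaces the is_vmalloc_addr() check that the patch uses to pick the matching free routine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limit for the "fast" allocator in this userspace model. */
#define FAST_ALLOC_LIMIT (128 * 1024)

struct page_list {
	void **pages;       /* zeroed array of page pointers */
	bool from_fallback; /* records which allocator produced it */
};

/* Stand-in for kmalloc(): refuses requests above FAST_ALLOC_LIMIT. */
static void *fast_alloc(size_t bytes)
{
	return bytes <= FAST_ALLOC_LIMIT ? malloc(bytes) : NULL;
}

/* Stand-in for vmalloc(): accepts large requests. */
static void *fallback_alloc(size_t bytes)
{
	return malloc(bytes);
}

/* Stand-ins for kfree() and vfree(). */
static void fast_free(void *p)     { free(p); }
static void fallback_free(void *p) { free(p); }

/* Returns 1 on success and 0 on failure, like ipz_queue_ctor(). */
static int page_list_alloc(struct page_list *pl, size_t nr_of_pages)
{
	size_t bytes = nr_of_pages * sizeof(void *);

	pl->from_fallback = false;
	pl->pages = fast_alloc(bytes);
	if (!pl->pages) {
		/* Fast path failed: retry with the slower allocator. */
		pl->pages = fallback_alloc(bytes);
		if (!pl->pages)
			return 0;
		pl->from_fallback = true;
	}
	memset(pl->pages, 0, bytes);
	return 1;
}

/* Release the list with the routine that matches its allocator. */
static void page_list_free(struct page_list *pl)
{
	if (pl->from_fallback)
		fallback_free(pl->pages);
	else
		fast_free(pl->pages);
	pl->pages = NULL;
}
```

The important part is the symmetry: whichever allocator succeeded, the release path has to pick the matching free routine. Later kernels provide kvmalloc() and kvfree(), which wrap the same idea.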
Re: [ewg] [PATCH] ipoib: disable napi while cq is being drained
Jack Morgenstein wrote:
> On Friday 17 April 2009 18:26, Yossi Etigin wrote:
>> - 	ipoib_dbg(priv, "bringing up interface\n");
>> - 
>> --	if (!test_and_set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
>> --		napi_enable(&priv->napi);
>> -+	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
>> - 
>> - 	if (ipoib_pkey_dev_delay_open(dev))
>> - 		return 0;
>> - 
>> --	if (ipoib_ib_dev_open(dev)) {
>> --		napi_disable(&priv->napi);
>> --		return -EINVAL;
>> --	}
>> -+	if (ipoib_ib_dev_open(dev))
>> -+		return -EINVAL;
>
> I think there is a bug here in the error flow.
> You do set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
> However, if there is an error return, you do not do
> clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
>
> Note that in the patch you prepared for Roland (in the general list),
> the clear_bit is done properly.
> (you probably need to arrange for an "err_out:" goto label which will do
> the clear_bit and return -EINVAL).
>
> - Jack
>
> P.S. we need the fix for ofed 1.4.1 ASAP.

You are right, there probably is a bug here; the bit should be cleared. However, it seems it was there before this patch too (the only relevant change in the patch is test_and_set_bit -> set_bit).

--Yossi
[ewg] [PATCH] ipoib: clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails
If ipoib_open() fails, it should clear the IPOIB_FLAG_ADMIN_UP bit and not leave it on.
This is already fixed in 2.6.30.

Reported-by: Jack Morgenstein ja...@dev.mellanox.co.il
Signed-off-by: Yossi Etigin yos...@voltaire.com

---

Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:45:11.0 +0300
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:51:51.0 +0300
@@ -112,11 +112,11 @@ int ipoib_open(struct net_device *dev)
 		return 0;
 
 	if (ipoib_ib_dev_open(dev))
-		return -EINVAL;
+		goto err;
 
 	if (ipoib_ib_dev_up(dev)) {
 		ipoib_ib_dev_stop(dev, 1);
-		return -EINVAL;
+		goto err;
 	}
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
@@ -139,6 +139,9 @@ int ipoib_open(struct net_device *dev)
 	netif_start_queue(dev);
 
 	return 0;
+err:
+	clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
+	return -EINVAL;
 }
 
 static int ipoib_stop(struct net_device *dev)
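The error-path shape this patch introduces (set a flag up front, funnel every failure through a single label that clears it again) can be modeled in plain userspace C. This is a sketch only: struct fake_dev, its test knobs, and the fake_* functions are invented for illustration, and the plain bit operations here are not atomic, unlike the kernel's set_bit()/clear_bit().

```c
#include <errno.h>
#include <stdbool.h>

#define FLAG_ADMIN_UP 0x1u /* models IPOIB_FLAG_ADMIN_UP */

/* Hypothetical device model; not the real struct ipoib_dev_priv. */
struct fake_dev {
	unsigned int flags;
	bool open_fails; /* knob: make the "dev_open" step fail */
	bool up_fails;   /* knob: make the "dev_up" step fail */
	bool stopped;    /* set when the stop step has run */
};

static int fake_dev_open(struct fake_dev *dev)
{
	return dev->open_fails ? -1 : 0;
}

static int fake_dev_up(struct fake_dev *dev)
{
	return dev->up_fails ? -1 : 0;
}

static void fake_dev_stop(struct fake_dev *dev)
{
	dev->stopped = true;
}

/* Mirrors the control flow of ipoib_open() after the patch. */
static int fake_open(struct fake_dev *dev)
{
	dev->flags |= FLAG_ADMIN_UP;

	if (fake_dev_open(dev))
		goto err;

	if (fake_dev_up(dev)) {
		fake_dev_stop(dev);
		goto err;
	}

	return 0;

err:
	/* Single exit point for failures: undo the flag set above. */
	dev->flags &= ~FLAG_ADMIN_UP;
	return -EINVAL;
}
```

With one shared label, a failure path added later cannot forget to clear the flag, which is the bug Jack pointed out in the earlier version.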
Re: [ewg] [PATCH] ipoib: clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails
Yossi Etigin wrote:
> If ipoib_open() fails, it should clear the IPOIB_FLAG_ADMIN_UP bit and
> not leave it on.
> This is already fixed in 2.6.30.
>
> Reported-by: Jack Morgenstein ja...@dev.mellanox.co.il
> Signed-off-by: Yossi Etigin yos...@voltaire.com
>
> ---
>
> Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> ===
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:45:11.0 +0300
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:51:51.0 +0300
> @@ -112,11 +112,11 @@ int ipoib_open(struct net_device *dev)
>  		return 0;
>  
>  	if (ipoib_ib_dev_open(dev))
> -		return -EINVAL;
> +		goto err;
>  
>  	if (ipoib_ib_dev_up(dev)) {
>  		ipoib_ib_dev_stop(dev, 1);
> -		return -EINVAL;
> +		goto err;
>  	}
>  
>  	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
> @@ -139,6 +139,9 @@ int ipoib_open(struct net_device *dev)
>  	netif_start_queue(dev);
>  
>  	return 0;
> +err:
> +	clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
> +	return -EINVAL;
>  }
>  
>  static int ipoib_stop(struct net_device *dev)

Hi Yossi,

Please send this patch in the OFED format: kernel_patches/fixes/ipoib... with backports (if required).

Thanks,
Vladimir
Re: [ewg] [PATCH] ipoib: clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails
Vladimir Sokolovsky wrote:
> Hi Yossi,
> Please send this patch in the OFED format:
> kernel_patches/fixes/ipoib... with backports (if required).
>
> Thanks,
> Vladimir

Here it is:

diff --git a/kernel_patches/fixes/ipoib_0560_clear_admin_up_flag.patch b/kernel_patches/fixes/ipoib_0560_clear_admin_up_flag.patch
new file mode 100644
index 000..78b8a67
--- /dev/null
+++ b/kernel_patches/fixes/ipoib_0560_clear_admin_up_flag.patch
@@ -0,0 +1,39 @@
+ipoib: clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails
+
+ If ipoib_open() fails, it should clear IPOIB_FLAG_ADMIN_UP bit and not
+leave it on.
+This is already fixed in 2.6.30.
+
+Reported-by: Jack Morgenstein ja...@dev.mellanox.co.il
+Signed-off-by: Yossi Etigin yos...@voltaire.com
+
+---
+
+Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
+===
+--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:45:11.0 +0300
++++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2009-04-22 19:51:51.0 +0300
+@@ -112,11 +112,11 @@ int ipoib_open(struct net_device *dev)
+ 		return 0;
+ 
+ 	if (ipoib_ib_dev_open(dev))
+-		return -EINVAL;
++		goto err;
+ 
+ 	if (ipoib_ib_dev_up(dev)) {
+ 		ipoib_ib_dev_stop(dev, 1);
+-		return -EINVAL;
++		goto err;
+ 	}
+ 
+ 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
+@@ -139,6 +139,9 @@ int ipoib_open(struct net_device *dev)
+ 	netif_start_queue(dev);
+ 
+ 	return 0;
++err:
++	clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
++	return -EINVAL;
+ }
+ 
+ static int ipoib_stop(struct net_device *dev)
Re: [ewg] [PATCH] ipoib: clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails
Yossi Etigin wrote:
> Vladimir Sokolovsky wrote:
>> Hi Yossi,
>> Please send this patch in the OFED format:
>> kernel_patches/fixes/ipoib... with backports (if required).
>>
>> Thanks,
>> Vladimir

Applied,

Regards,
Vladimir
[ewg] New Open MPI 1.3.2 SRPM uploaded
To the usual place. Should be included in the nightly 1.4.1 build.

--
Jeff Squyres
Cisco Systems
[ewg] [PATCH OFED-1.4.1] openibd: handle NFS-RDMA deps
There are new dependencies on the ib_core module from the NFS-RDMA modules. Since the NFS-RDMA modules are not unloaded in the current version, the ib_core module cannot be removed and "openibd stop" fails. By stopping all of the NFS services and unloading any dependent modules, the ib_core module can be unloaded and openibd can be stopped successfully.

Signed-off-by: Jon Mason j...@opengridcomputing.com

---

diff --git a/ofed_scripts/openibd b/ofed_scripts/openibd
index 4a79e2c..777a891 100755
--- a/ofed_scripts/openibd
+++ b/ofed_scripts/openibd
@@ -293,10 +293,10 @@ GEN1_UNLOAD_MODULES="ib_srp_target scsi_target ib_srp kdapltest_module ib_kdapl
 UNLOAD_MODULES="ib_mthca mlx4_ib ib_ipath ipath_core ib_ehca iw_nes iw_cxgb3 cxgb3"
 UNLOAD_MODULES="$UNLOAD_MODULES ib_ipoib ib_madeye ib_rds"
 UNLOAD_MODULES="$UNLOAD_MODULES rds ib_ucm kdapl ib_srp_target scsi_target ib_srpt ib_srp ib_iser ib_sdp"
-UNLOAD_MODULES="$UNLOAD_MODULES rdma_ucm rdma_cm ib_addr ib_cm ib_local_sa findex"
+UNLOAD_MODULES="$UNLOAD_MODULES rdma_ucm svcrdma xprtrdma rdma_cm ib_addr ib_cm ib_local_sa findex"
 UNLOAD_MODULES="$UNLOAD_MODULES ib_sa ib_uverbs ib_umad ib_mad ib_core"
 
-STATUS_MODULES="rdma_ucm ib_rds rds ib_srpt ib_srp qlgc_vnic ib_sdp rdma_cm ib_addr ib_local_sa findex ib_ipoib ib_ehca ib_ipath ipath_core mlx4_core mlx4_ib mlx4_en ib_mthca ib_uverbs ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core iw_cxgb3 iw_nes"
+STATUS_MODULES="rdma_ucm ib_rds rds ib_srpt ib_srp qlgc_vnic ib_sdp svcrdma xprtrdma rdma_cm ib_addr ib_local_sa findex ib_ipoib ib_ehca ib_ipath ipath_core mlx4_core mlx4_ib mlx4_en ib_mthca ib_uverbs ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core iw_cxgb3 iw_nes"
 
 ipoib_ha_pidfile=/var/run/ipoib_ha.pid
 srp_daemon_pidfile=/var/run/srp_daemon.pid
@@ -1297,11 +1297,28 @@ stop()
         fi
     fi
 
-	if [ -d /sys/class/infiniband_qlgc_vnic/ ]; then
-    if [ -x /etc/init.d/qlgc_vnic ]; then
+    if [ -d /sys/class/infiniband_qlgc_vnic/ ]; then
+        if [ -x /etc/init.d/qlgc_vnic ]; then
             /etc/init.d/qlgc_vnic stop 2>&1 1>/dev/null
-    fi
         fi
+    fi
+
+    if [ -d /sys/module/nfs ]; then
+        rmmod nfs > /dev/null 2>&1
+    fi
+
+    if [ -d /sys/module/nfsd ]; then
+        if [ -x /etc/init.d/nfsserver ]; then
+            #For SLES
+            /etc/init.d/nfsserver stop
+        else
+            #For RHEL
+            /etc/init.d/rpcidmapd stop
+            umount /proc/fs/nfsd
+        fi
+        /etc/init.d/nfs stop
+        rmmod nfsd > /dev/null 2>&1
+    fi
 
     # Unload modules
     if [ "$UNLOAD_MODULES" != "" ]; then