Re: [ofa-general] This list expires... tomorrow?

2009-09-29 Thread Roland Dreier
Hi all. I propose the following plan to shut down the general list: 1) unsubscribe all current subscribers 2) set the list to discard any incoming messages with an auto-discard message that points you to linux-r...@vger.kernel.org Sounds like a perfect plan to me... sorry for not

Re: [ofa-general] This list expires... tomorrow?

2009-09-29 Thread Roland Dreier
It's probably just me but I'm not ready yet. I haven't been able to post a patch to linux-rdma yet :-( What is going wrong when you try? - R. ___ general mailing list general@lists.openfabrics.org

Re: [ofa-general] [Bug 14235] New: SRP initiator lockup

2009-09-28 Thread Roland Dreier
If an SRP target processes SRP I/O slowly enough, the SRP initiator locks up. INFO: task fio:6389 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fio D 0 6389 6388 0x 880071dc5bd8

[ofa-general] Re: [PATCH v2] mlx4: configure cache line size

2009-09-24 Thread Roland Dreier
thanks, applied. ___ general mailing list general@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[ofa-general] Re: [PATCH] mthca: Fix access to freed memory in catas processing

2009-09-24 Thread Roland Dreier
Thanks, applied. I almost missed this one because you sent it to general@ and not linux-r...@vger.kernel.org -- and I was using the fancy patchwork.kernel.org patch tracking stuff to see what I had to apply. So sending things to linux-rdma@ helps you too! Thanks, Roland

[ofa-general] Re: [PATCH V2] IB/ipoib: Do not turn on carrier to a non active port

2009-09-24 Thread Roland Dreier
thanks, applied.

[ofa-general] [GIT PULL] please pull infiniband.git

2009-09-24 Thread Roland Dreier
: Don't turn on carrier for a non-active port Roland Dreier (2): IB/mad: Fix lock-lock-timer deadlock in RMPP code Merge branches 'ipoib', 'mad', 'mlx4', 'mthca' and 'nes' into for-linus drivers/infiniband/core/mad_rmpp.c | 17 + drivers/infiniband/hw/mthca

[ofa-general] Re: [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code

2009-09-23 Thread Roland Dreier
Reviewed-by: Sean Hefty sean.he...@intel.com Thanks, I applied this.

[ofa-general] Re: [PATCH] mlx4: confiugre cache line size

2009-09-22 Thread Roland Dreier
+#if defined(cache_line_size) Why the #if here? Do we just need to include <linux/cache.h> explicitly to make sure we get the define? +*((u8 *) mailbox->buf + INIT_HCA_CACHELINE_SZ_OFFSET) = +order_base_2(cache_line_size() / 16) << 5; Trivial but I think it's safe to assume a

[ofa-general] Re: [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code

2009-09-22 Thread Roland Dreier
The locking is needed to protect against items being removed from rmpp_list in recv_timeout_handler() and recv_cleanup_handler(). No new items should be added to the rmpp_list when ib_cancel_rmpp_recvs() is running (or there's a separate bug). OK so how about something like

[ofa-general] Re: [PATCH] mlx4: confiugre cache line size

2009-09-22 Thread Roland Dreier
I agree with you on both comments. Would you like me to resend or will you make the necessary changes? please resend, thanks.

[ofa-general] Re: [PATCH] IB/ipoib: Do not turn on carrier to a non active port

2009-09-21 Thread Roland Dreier
I may miss this but I don't see how ipoib_ib_dev_down() is called with rtnl held. It's called from ipoib_stop(), and .ndo_stop is called with rtnl held. Anyway, the new patch doesn't use delayed work. great

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-17 Thread Roland Dreier
Anton Blanchard suggested a while back that this might be integrated with perf-counters, since perf-counters already does mmap() tracking and also provides events through an mmap()'ed buffer. Has anybody looked into this? I didn't see the original suggestion. Certainly hooking in to

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-17 Thread Roland Dreier
So getting those events in the kernel is no problem -- we have the MMU notifier hooks that tell us exactly what we need to know. The issue is purely the way userspace registers interest in address ranges, and how the kernel returns the events. For perf counters it seems that one

[ofa-general] Re: [PATCH] IB/ipoib: Do not turn on carrier to a non active port

2009-09-17 Thread Roland Dreier
+if (ib_query_port(priv->ca, priv->port, &attr) || +attr.state != IB_PORT_ACTIVE) { +ipoib_dbg(priv, "wait with carrier until IB port is active\n"); +if (test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) +queue_delayed_work(ipoib_workqueue,

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-17 Thread Roland Dreier
Hmm, or are you saying you can only get 1 event per registered range and allocate the thing on registration? That'd need some registration limit to avoid DoS scenarios. Yes, that's what I do. You're right, I should add a limit... although there are lots of ways for userspace to consume

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-17 Thread Roland Dreier
Hmm, or are you saying you can only get 1 event per registered range and allocate the thing on registration? That'd need some registration limit to avoid DoS scenarios. Yes, that's what I do. You're right, I should add a limit... although there are lots of ways for

[ofa-general] Re: [PATCH] IB/ipoib: Do not turn on carrier to a non active port

2009-09-17 Thread Roland Dreier
And by the way, this current patch has a deadlock I think: @@ -724,6 +724,8 @@ int ipoib_ib_dev_down(struct net_device *dev, int flush) ipoib_dbg(priv, "downing ib_dev\n"); clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags); +cancel_delayed_work(&priv->carrier_on_task);

[ofa-general] Re: Merge process for OFED patches

2009-09-16 Thread Roland Dreier
I noticed that there are seven SRP patches (bug fixes) present in OFED 1.4.1 that are not present in mainstream Linux kernels up to and including version 2.6.30. Do you know whether it is documented anywhere which process is followed for merging such patches in the mainstream Linux

[ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-16 Thread Roland Dreier
several review iterations on lkml and was in -mm and -next for quite a few weeks. Andrew is OK with merging it (I think -- Andrew please correct me if I misunderstood you). Roland Dreier (1): ummunotify: Userspace support for MMU notifications Documentation/Makefile

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-15 Thread Roland Dreier
- I guess you have your MPI implementation w/ ummunotify, right? Yes, Jeff Squyres (cc'ed) has an Open MPI prototype (mercurial tree at http://bitbucket.org/jsquyres/ummunot/). - I guess you have tested several patterns, right? If so, can we see your test results? Open MPI has a pretty

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-15 Thread Roland Dreier
I don't remember seeing discussion of this on lkml. Yes it is in -next... eg http://lkml.org/lkml/2009/7/31/197 and followups, or search for v2 and earlier patches. Basically it allows app to 'trace itself'? ...with interesting mmap() interface, exporting int to userspace, hoping it

Re: [ofa-general] RE: Does the CMA user space support join multicast for IPv6 too?

2009-09-15 Thread Roland Dreier
Does ipoib map IPv6 multicast addresses to MGIDs directly? No, the IPv6 network stack is responsible for doing the mapping and passing the HW (ie IPoIB) address in... cf ipv6_ib_mc_map() in include/net/if_inet6.h

Re: [ofa-general] bug in ibv_rc_pingpong.c and ibv_uc_pingpong.c in ofed_1_4/libibverbs.git

2009-09-14 Thread Roland Dreier
Case m is missing a handy break statement. Setting mtu also sets rx_depth. Can work around by following -m by -r, which resets rx_depth. Yes, looks that way. Care to send a patch? - R.
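
The bug reported here is a switch fall-through. The following is a minimal sketch of that class of bug, not the actual ibv_rc_pingpong.c source: `struct opts`, `apply_opt_buggy()` and `apply_opt_fixed()` are hypothetical names introduced only for illustration. It shows how a missing `break` after the `-m` case lets the same argument overwrite the value that belongs to `-r`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical option storage; the field names are illustrative only. */
struct opts {
    int mtu;
    int rx_depth;
};

/* Buggy variant: case 'm' falls through into case 'r'. */
static void apply_opt_buggy(struct opts *o, int c, const char *arg)
{
    assert(o != NULL && arg != NULL);
    switch (c) {
    case 'm':
        o->mtu = atoi(arg);
        /* fall through - the missing break is the bug */
    case 'r':
        o->rx_depth = atoi(arg);
        break;
    }
}

/* Fixed variant: each case ends with its own break. */
static void apply_opt_fixed(struct opts *o, int c, const char *arg)
{
    assert(o != NULL && arg != NULL);
    switch (c) {
    case 'm':
        o->mtu = atoi(arg);
        break;
    case 'r':
        o->rx_depth = atoi(arg);
        break;
    }
}
```

With the buggy variant, passing `-r` after `-m` hides the problem because the later option rewrites `rx_depth`, which matches the workaround described in the report.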

[ofa-general] Re: [PATCH] Do not use enum object types for bitfields

2009-09-14 Thread Roland Dreier
OK, let's try this out and see how it goes... applied, thanks.

[ofa-general] Re: [PATCH mthca] Update function prototypes to match ibverbs

2009-09-14 Thread Roland Dreier
thanks, applied both this and mlx4 version. seems that libcxgb3, libipathverbs and libnes would want similar treatment? - R.

[ofa-general] Re: [PATCH] bug in ibv_rc_pingpong.c and ibv_uc_pingpong.c in ofed_1_4/libibverbs.git

2009-09-14 Thread Roland Dreier
your mailer seems to have mangled whitespace but I applied it by hand, thanks.

[ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-11 Thread Roland Dreier
Has this version already solved the fork() + COW issue? If so, could you please explain what happens at fork? Obviously RDMA points to either the parent or the child page, not both, but the current COW rule is that the first process to touch the page gets the copied page and the other process still owns the original page. I think it's

[ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-11 Thread Roland Dreier
My understanding of the code is that fork will end-up calling copy_page_range() on all VMA, and copy_page_range() calls mmu_notifier_invalidate_range_start() if is_cow_mapping() is true, which should be the case here. So you should get some invalidate events on fork. Yes, I agree

Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-09-11 Thread Roland Dreier
So.. What is the problem with fork? The semantics of what should happen seem natural enough to me, the PD doesn't get copied to the child, so the MR stays with the parent. COW events on the pinned region must be resolved so that the physical page stays with the process that has pinned

Re: [ofa-general] mlx4 second port lro issue

2009-09-10 Thread Roland Dreier
Has anybody seen lro not working for port2 but working for port1 before? We found this issue on RHEL5. I haven't seen this. But I haven't really tried. Are you talking about mlx4_en or ipoib with mlx4_ib? What do you mean by lro not working -- the interface doesn't work with lro

[ofa-general] [GIT PULL] please pull infiniband.git

2009-09-10 Thread Roland Dreier
): IB: Use printk_once() for driver versions Roel Kluin (2): IB/ipath: strncpy() doesn't always NUL-terminate RDMA/amso1100: Check kmalloc() result in c2_register_device() Roland Dreier (15): IPoIB: Remove unused rdma/ib_cache.h includes IPoIB: Drop priv->lock before calling

[ofa-general] [GIT PULL] please pull ummunotify

2009-09-10 Thread Roland Dreier
review iterations on lkml and was in -mm and -next for quite a few weeks. Andrew is OK with merging it (I think -- Andrew please correct me if I misunderstood you). Roland Dreier (1): ummunotify: Userspace support for MMU notifications Documentation/Makefile |3

[ofa-general] Re: [PATCH 2/2] RDMA/cxgb3: clean up properly on FW mismatch failures.

2009-09-09 Thread Roland Dreier
thanks, applied both patches.

[ofa-general] [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code (was: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock)

2009-09-09 Thread Roland Dreier
Holding agent->lock across cancel_delayed_work() (which does del_timer_sync()) in ib_cancel_rmpp_recvs() leads to lockdep reports of possible lock-timer deadlocks if a consumer ever does something that connects agent->lock to a lock taken in IRQ context (cf

[ofa-general] Re: [PATCH/RFC] IB/mad: Fix lock-lock-timer deadlock in RMPP code

2009-09-09 Thread Roland Dreier
The locking is needed to protect against items being removed from rmpp_list in recv_timeout_handler() and recv_cleanup_handler(). No new items should be added to the rmpp_list when ib_cancel_rmpp_recvs() is running (or there's a separate bug). Ah, I see. That's trickier I

[ofa-general] Re: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock

2009-09-08 Thread Roland Dreier
The above issue does not occur with the for-next branch of the infiniband git tree, but does occur with 2.6.31-rc9 + aforementioned patches. As far as I can see commit 721d67cdca5b7642b380ca0584de8dceecf6102f

[ofa-general] Re: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock

2009-09-08 Thread Roland Dreier
Update: patch 721d67cdca5b7642b380ca0584de8dceecf6102f does not apply cleanly to 2.6.31-rc9, so I have been using a slightly modified version of this patch (http://bugzilla.kernel.org/attachment.cgi?id=22624). I have retested the 2.6.31-rc9 kernel with the following patches applied

[ofa-general] [PLEASE READ] Transition from general@lists.openfabrics.org to linux-r...@vger.kernel.org

2009-09-07 Thread Roland Dreier
. Finally, I'll plan to merge the following for 2.6.32: MAINTAINERS: InfiniBand/RDMA mailing list transition to vger InfiniBand/RDMA development discussion is moving from general@lists.openfabrics.org to linux-r...@vger.kernel.org. Signed-off-by: Roland Dreier rola...@cisco.com --- MAINTAINERS

[ofa-general] Re: [NEW PATCH] IB/mad: Fix possible lock-lock-timer deadlock

2009-09-07 Thread Roland Dreier
With 2.6.31-rc9 + patch 4e49627b9bc29a14b393c480e8c979e3bc922ef7 + the patch you posted at the start of this thread the following lockdep complaint was triggered on the SRP initiator system during SRP login: == [ INFO: HARDIRQ-safe

Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.32

2009-09-03 Thread Roland Dreier
What about RDMAoE? Patches were sent few weeks ago and it seems you ignore them. Sorry, I should have mentioned that. Yes, I have been ignoring the patches -- I want to get through XRC first, and also I would like to see a real spec for IBoE (that resolves issues like multicast interaction

Re: [ofa-general] [PATCH] IPoIB: check multicast address format (V2)

2009-09-02 Thread Roland Dreier
thanks for updating, applied.

[ofa-general] Re: [PATCHv4] IB/mad: Allow tuning of QP0 and QP1 sizes

2009-09-02 Thread Roland Dreier
applied -- would be nice to have a way to do this automatically instead of yet another tunable for sysadmins to worry about, but oh well.

Re: [ofa-general] performance to call ibv_poll_cq() vs. call select() on completion channel

2009-09-02 Thread Roland Dreier
I understand that ops.poll_cq is actually ibv_cmd_poll_cq(), right ? No, not for most devices. Look at libmthca, etc to see what the poll_cq method is set to. Do you mean during ibv_poll_cq() call, there is no system call involved ? Right, for most devices poll_cq can be done

[ofa-general] InfiniBand/RDMA merge plans for 2.6.32

2009-09-02 Thread Roland Dreier
Slusarz (1): IB: Use printk_once() for driver versions Roel Kluin (2): IB/ipath: strncpy() doesn't always NUL-terminate RDMA/amso1100: Check kmalloc() result in c2_register_device() Roland Dreier (13): IPoIB: Remove unused rdma/ib_cache.h includes IB: Use DEFINE_SPINLOCK

Re: [ofa-general] [PATCH] IPoIB: check multicast address format

2009-09-01 Thread Roland Dreier
The idea seems sound but checkpatch.pl gives 6 errors for this small patch! Also: +static int check_mcast(const u8 *addr,unsigned int addrlen, + const u8 *broadcast) name of the function could make it clearer what the expected return value is ... eg mcast_addr_is_valid()

[ofa-general] Re: [PATCH] IB/ehca: Fix CQE flags reporting

2009-09-01 Thread Roland Dreier
applied, thanks

Re: [ofa-general] [PATCHv2 RESEND] IB/IPoIB: Don't let a bad muticast address in the join list stop subsequent joins

2009-09-01 Thread Roland Dreier
An illegal multicast address can be handed to IPoIB from userspace. For example, the command "ip maddr add 33:33:00:00:00:01 dev ib0" injects an illegal multicast address into IPoIB that will start a join task for this address. However, whenever an illegal multicast address is passed to

[ofa-general] Re: Opinions on moving Linux InfiniBand/RDMA mailing list to vger?

2009-09-01 Thread Roland Dreier
n linux-r...@vger.kernel.org It's there, ready and waiting, should you choose to use it :-) Thanks again... how do we get archive links added -- is it manually? Right now we have http://www.spinics.net/lists/linux-rdma/ http://www.mail-archive.com/linux-r...@vger.kernel.org/ should be up

Re: [ofa-general] [PATCH] IB: dereference of dev->ibdev.iwcm in c2_register_device()

2009-09-01 Thread Roland Dreier
I tend to prefer patches that compile :) -- diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c index 3f2ee64..ad723bd 100644 --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -874,7

[ofa-general] Re: [PATCH V3] mlx4: Do not allow ib userspace open following a fatal event

2009-08-31 Thread Roland Dreier
Applied, thanks for redoing this.

Re: [ofa-general] [PATCH] IB: dereference of dev->ibdev.iwcm in c2_register_device()

2009-08-31 Thread Roland Dreier
--- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -851,6 +851,10 @@ int c2_register_device(struct c2_dev *dev) dev->ibdev.post_recv = c2_post_receive; dev->ibdev.iwcm = kmalloc(sizeof(*dev->ibdev.iwcm), GFP_KERNEL); +

[ofa-general] Re: [PATCH] IB/ehca: Construct MAD redirect replies from request MAD

2009-08-31 Thread Roland Dreier
this seems reasonable to me, applied, thanks.

Re: [ofa-general] QDR IB cards supports card back to back connectivity

2009-08-28 Thread Roland Dreier
I would like to know whether QDR InfiniBand cards support back-to-back connectivity, i.e. IB communication between two machines without an IB switch. Yes, any IB port should be able to connect to any other IB port. You do need a subnet manager (SM) on every IB

[ofa-general] Re: [ewg] [PATCH] IB/ehca: Construct MAD redirect replies from request MAD

2009-08-28 Thread Roland Dreier
Given that you seem to like the rest of the code and Jason hasn't spoken up yet, I think we can have Roland merge this patch. Roland, what do you think? I don't see any problem with the idea and this does sound like a step forward, so I am planning on merging this (pending review).

[ofa-general] Re: Opinions on moving Linux InfiniBand/RDMA mailing list to vger?

2009-08-28 Thread Roland Dreier
It seems we only had positive responses to moving from general@ to a new linux-r...@vger.kernel.org list, so I'll work on a transition plan. For now, please continue to use gene...@lists.openfabrics.org. However, you may want to subscribe to the vger list to be ready for the transition; for

[ofa-general] Re: [PATCH V2] mlx4: Do not allow ib userspace open while device is being removed

2009-08-28 Thread Roland Dreier
checkpatch output: WARNING: suspect code indent for conditional statements (8, 12) #88: FILE: drivers/infiniband/hw/mlx4/main.c:345: + if (!dev->ib_active) + return ERR_PTR(-EAGAIN); ERROR: code indent should use tabs where possible #107: FILE:

[ofa-general] Re: [PATCH] mthca: Do not allow ib userspace open following device internal error

2009-08-28 Thread Roland Dreier
thanks, applied (and thanks for the detailed changelog, that really makes things easier)

[ofa-general] Re: [PATCH v2] mlx4_core: Distinguish multiple IB cards in /proc/interrupts

2009-08-27 Thread Roland Dreier
Thanks, at long last I applied both the mthca and mlx4 versions of these patches (with some cleanups). - R.

Re: [ofa-general] Opinions on moving Linux InfiniBand/RDMA mailing list to vger?

2009-08-21 Thread Roland Dreier
The one question I would have would be do the kernel.org people want to see all of the traffic that we currently have on the open fabrics general list for all of the user-space components? I don't believe there's any problem with that. vger already hosts quite a few lists that are

[ofa-general] Re: Opinions on moving Linux InfiniBand/RDMA mailing list to vger?

2009-08-21 Thread Roland Dreier
linux-r...@vger.kernel.org It's there, ready and waiting, should you choose to use it :-) Thanks! - R.

Re: [ofa-general] [PATCH] uDAPL v2 - dapltest patches for mdep processor yield

2009-08-20 Thread Roland Dreier
+#define DT_Mdep_yield pthread_yield Be aware that on Linux I believe this turns into sched_yield(), which basically means "put me at the end of the thread list", ie "wait for everyone else to get a turn", ie possibly huge latency...

Re: [ofw] Re: [ofa-general] [PATCH] uDAPL v2 - dapltest patches for mdep processor yield

2009-08-20 Thread Roland Dreier
Is sleep(0) a preferred way to go? I think the best solution is not coding spin-loops. Not sure what sleep(0) ends up turning into, but if you can tell the system "I'm waiting for this object, wake me up when it's available" then that should produce the best behavior. - R.
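
One way to express "wake me up when it's available" in POSIX threads is a condition variable. The sketch below is an illustration under stated assumptions, not code from uDAPL or dapltest: `struct event`, `event_wait()` and `event_signal()` are hypothetical names. The waiter sleeps in `pthread_cond_wait()` instead of spinning or yielding.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot event; not part of uDAPL or dapltest. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

#define EVENT_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Block until event_signal() has been called; no spin loop, no yield. */
static void event_wait(struct event *ev)
{
    assert(ev != NULL);
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)  /* re-check the predicate after spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Mark the event ready and wake every thread blocked in event_wait(). */
static void event_signal(struct event *ev)
{
    assert(ev != NULL);
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}
```

Because the predicate is checked under the mutex, a waiter that arrives after the signal returns immediately rather than sleeping forever.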

Re: [ofa-general][PATCH] mlx4_core: Avoid double icms free

2009-08-20 Thread Roland Dreier
thanks, applied.

[ofa-general] Better way to get sufficient EQ context memory?

2009-08-20 Thread Roland Dreier
this with possible_cpus=32 and it still works for me -- if you get a chance on your Dell systems that would be helpful too) commit 58cafda0c3010fc2cdb0fc9be3fbd6d09640dd6f Author: Roland Dreier rola...@cisco.com Date: Thu Aug 20 14:26:21 2009 -0700 mlx4_core: Allocate and map sufficient ICM memory for EQ

[ofa-general] Opinions on moving Linux InfiniBand/RDMA mailing list to vger?

2009-08-20 Thread Roland Dreier
Lately, I've had a few emails that I thought would have been of interest to both lkml and also to gene...@lists.openfabrics.org. I've held back on cross-posting them because I know that general@ is subscribers-only, and the bounce messages are quite annoying to replies coming from lkml. The

Re: [ofa-general] ibv_rc_pingpong hangs with forks

2009-08-19 Thread Roland Dreier
The attached modification to ibv_rc_pingpong makes it never complete. Forking seems to do something bad. I noticed that forking right after ibv_post_recv() cancels the posted WR (as if it was never issued): sender keeps retrying. Is it expected behavior or a bug? Yes, forking is

Re: [ofa-general] What does IBV_WC_REM_OP_ERR after a verb send indicate?

2009-08-18 Thread Roland Dreier
I am getting this error on a verb send operation and I can't figure out what could be the cause; I searched for all instances of this error in the IB code and while I found 4, none was illuminating. IBV_WC_REM_OP_ERR corresponds to Remote Operation Error, which the IB spec describes as:

Re: [ofa-general] librdmacm - okay to select on a cm channel's file descriptor?

2009-08-16 Thread Roland Dreier
In an attempt to get unexpected DISCONNECT notifications during ib communication, I'm trying to use 'select()' on the cm channel's file descriptor, testing it for readability. I've found that this works some of the time, but not all of the time. What happens when it doesn't work?

[ofa-general] [PATCH/RFC] IB/mad: Fix possible deadlock (cancel_delayed_work inside spinlock)

2009-08-14 Thread Roland Dreier
How about this approach? Basically it just open-codes delayed work by splitting the timer and the work struct, and switches to mod_timer() instead of del_timer() + add_timer(). It passes very light testing here (basically I started ipoib and nothing blew up). --- drivers/infiniband/core/mad.c

Re: [ofa-general] [PATCHv3] IB/mad: Allow tuning of QP0 and QP1 sizes

2009-08-12 Thread Roland Dreier
Changed module parameter permissions to 0644 Does it really work if someone changes the module parameter at runtime after the module is loaded? - R.

[ofa-general] Re: [PATCH V2] mlx4: Do not allow ib userspace open while device is being removed

2009-08-11 Thread Roland Dreier
this is a continuation of thread: http://lists.openfabrics.org/pipermail/general/2009-July/060668.html Thanks for the pointer... it lets me reload my context. I see you didn't answer the question about mthca -- does it suffer from this problem as well? - R.

[ofa-general] Re: [PATCH 10/14] infiniband: use printk_once

2009-08-11 Thread Roland Dreier
thanks, applied.

Re: [ofa-general] Re: 2.6.30.1: possible irq lock inversion dependency detected

2009-08-11 Thread Roland Dreier
Even if it is really unlikely that this lock cycle would cause a deadlock, it would be great if this lock cycle could be removed. I'm not the only developer of kernel modules who runs tests with lockdep enabled, and it is impractical to analyze long logfiles full of known lock cycles to

Re: [ofa-general] mlx4: device driver tries to sync DMA memory it has not allocated

2009-08-10 Thread Roland Dreier
Has anyone ever encountered a message like the one below ? This message was generated while booting a 2.6.30.4 kernel with CONFIG_DMA_API_DEBUG=y and before any out-of-tree kernel modules were loaded. [ cut here ] WARNING: at lib/dma-debug.c:635

Re: [ofa-general] Re: 2.6.30.1: possible irq lock inversion dependency detected

2009-08-10 Thread Roland Dreier
The lockdep report I obtained this morning with a 2.6.30.4 kernel and the two patches applied has been attached to the kernel bugzilla entry. This lockdep report was generated while testing the SRPT target software. I have double checked that the SRPT target implementation does not hold

[ofa-general] Re: [PATCH 10/14] infiniband: use printk_once

2009-08-09 Thread Roland Dreier
drivers/infiniband/hw/cxgb3/iwch.c |4 +--- drivers/infiniband/hw/mlx4/main.c |6 +- --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -540,15 +540,11 @@ static struct device_attribute *mlx4_class_attributes[] = { static void

[ofa-general] Re: [PATCH] mlx4_core: map sufficient ICM memory for EQs

2009-08-07 Thread Roland Dreier
Thanks, applied with a few cleanups: ilog2(roundup_pow_of_two()) -> order_base_2() xxx * (1 << yy) -> xxx << yy
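
The two cleanups mentioned here rest on simple identities: rounding up to a power of two and then taking the base-2 logarithm is the same as taking the ceiling of the logarithm, and multiplying by `1 << yy` is the same as shifting left by `yy`. The following userspace sketch checks both for small values. The `_u` helpers are illustrative re-implementations written for this example, not the kernel's own macros from linux/log2.h, and they assume inputs small enough that the shifts do not overflow.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0 */
static unsigned int ilog2_u(unsigned long n)
{
    unsigned int r = 0;

    assert(n > 0);
    while ((n >>= 1) != 0)
        r++;
    return r;
}

/* Smallest power of two that is >= n, for n > 0 (no overflow handling). */
static unsigned long roundup_pow_of_two_u(unsigned long n)
{
    unsigned long p = 1;

    assert(n > 0);
    while (p < n)
        p <<= 1;
    return p;
}

/* ceil(log2(n)) for n > 0, computed without rounding up first. */
static unsigned int order_base_2_u(unsigned long n)
{
    assert(n > 0);
    return n > 1 ? ilog2_u(n - 1) + 1 : 0;
}
```

For example, `order_base_2_u(5)` and `ilog2_u(roundup_pow_of_two_u(5))` both evaluate to 3, since 5 rounds up to 8.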

Re: [ofa-general] [PATCH v2 2/2] RDMA/cxgb3: wake up any waiters on peer close/abort.

2009-08-07 Thread Roland Dreier
thanks for respinning, got em both.

[ofa-general] Re: sg_reset can trigger a NULL pointer dereference in the SRP initiator

2009-08-07 Thread Roland Dreier
A fix like the one below ? I think this gets us part of the way, but not quite. --- linux-2.6.30.4/drivers/infiniband/ulp/srp/ib_srp-orig.c 2009-08-03 12:13:11.0 +0200 +++ linux-2.6.30.4/drivers/infiniband/ulp/srp/ib_srp.c 2009-08-07 10:23:27.0 +0200 @@

Re: [ofa-general] IB kernel modules and the kobject release() method

2009-08-06 Thread Roland Dreier
After having enabled CONFIG_DEBUG_KOBJECT=y in the kernel config I noticed that messages appeared in the kernel log about the IB modules missing a kobject release() method. This happens both with a vanilla 2.6.30.4 kernel and with a 2.6.27.29 kernel + OFED 1.4.1. Has anyone noticed

Re: [ofa-general] [PATCHv4 10/10] mlx4: Add RDMAoE support - allow interfaces to correspond to each other

2009-08-06 Thread Roland Dreier
What about multicast though? Switches are going to have trouble with group membership lists for non IP packets.. Even just sending an ICMPv6 packet (with an IPv6 ethertype) isn't guaranteed to fix it. In this patch set, all multicast packets use the broadcast mac. We will address

[ofa-general] Re: sg_reset can trigger a NULL pointer dereference in the SRP initiator

2009-08-06 Thread Roland Dreier
Specifically scmnd->host_scribble can just be Zero. I see at last, thanks! The issue is that SRP is using host_scribble to hold an index, and index 0 is valid for us. I guess the fix is a bit complex, but basically we should use host_scribble to point to the request, and if we don't find a
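
The ambiguity described here can be shown in a few lines. This is a sketch under stated assumptions, not the ib_srp code: `struct cmd`, `struct request`, `stash_index()` and `stash_request()` are hypothetical stand-ins. When an index is stored in a pointer-sized scratch field, index 0 is indistinguishable from "nothing stored" on mainstream platforms, whereas storing a pointer to the request leaves NULL free to mean "no request".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request slot; stands in for a driver's per-command state. */
struct request {
    int in_use;
};

/* Hypothetical command with a driver-private scratch pointer. */
struct cmd {
    void *scribble;
};

/* Ambiguous: index 0 is stored as an all-zero (null) pointer value. */
static void stash_index(struct cmd *c, unsigned int idx)
{
    assert(c != NULL);
    c->scribble = (void *)(uintptr_t)idx;
}

/* Unambiguous: store the request's address; NULL means "none". */
static void stash_request(struct cmd *c, struct request *req)
{
    assert(c != NULL);
    c->scribble = req;
}

/* A check like this cannot tell index 0 apart from an unset field. */
static int has_request(const struct cmd *c)
{
    assert(c != NULL);
    return c->scribble != NULL;
}
```

With the pointer form, the index can still be recovered by subtracting the base of the request array from the stored pointer.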

Re: [ofa-general] IB kernel modules and the kobject release() method

2009-08-06 Thread Roland Dreier
Next I started (/etc/init.d/openibd start; /etc/init.d/opensmd start) and then stopped (/etc/init.d/opensmd stop; /etc/init.d/openibd stop) the IB subsystem. The broken message was logged during module unload only, not during module load. Oh I see... yes I get it on unload too, for any

Re: [ofa-general] IB kernel modules and the kobject release() method

2009-08-06 Thread Roland Dreier
Are you sure that this indicates a shortcoming in the kobject debugging code ? The most recent messages related to the message "does not have a release() function, it is broken and must be fixed" I could find on the LKML date from July 16, 2009 (http://lkml.org/lkml/2009/7/16/306 and

[ofa-general] Re: [PATCH 5/5] RDMA/nes: Rework the disconn routine for terminate and flushing

2009-08-05 Thread Roland Dreier
thanks, applied all 10 pending patches.

[ofa-general] Re: [PATCH 2.6.30.4] Fix for NULL pointer dereference by SRP initiator triggered by a SCSI reset after the SRP connection has been closed

2009-08-05 Thread Roland Dreier
Now I'm confused about this patch for another reason: @@ -1429,6 +1431,8 @@ static int srp_reset_device(struct scsi_ return FAILED; if (req->tsk_status) return FAILED; +if (!req->scmnd->device) +return FAILED;

[ofa-general] Re: [PATCH 2.6.30.4] Fix for NULL pointer dereference by SRP initiator triggered by a SCSI reset after the SRP connection has been closed

2009-08-05 Thread Roland Dreier
The NULL pointer dereference happens when srp_reset_device() calls srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET) with req->scmnd->device == NULL. When the sg_reset command issues an SG_SCSI_RESET ioctl, scsi_reset_provider() is invoked and allocates an scmnd structure and sets

Re: [ofa-general] [PATCH linux-next 2/5] RDMA/cxgb3: Don't free the endpoint early.

2009-08-05 Thread Roland Dreier
- Endpoint flags now need to be set via atomic bitops because they can be set on both the iw_cxgb3 workqueue thread and user disconnect threads.
+	if (!test_bit(ABORT_REQ_IN_PROGRESS, &ep->com.flags)) {
+		set_bit(ABORT_REQ_IN_PROGRESS, &ep->com.flags);
for atomicity, should
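The reply above is cut off, so what follows is only a general illustration of the atomicity concern, not necessarily the change proposed in the thread. A separate test followed by a set leaves a window in which two threads can both see the bit clear; a single atomic read-modify-write closes it. In the kernel the usual primitive for this is test_and_set_bit(); the sketch below uses C11 `atomic_fetch_or` in userspace instead, and the names `demo_claim_racy`, `demo_claim_atomic`, and `DEMO_ABORT_REQ_IN_PROGRESS` are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define DEMO_ABORT_REQ_IN_PROGRESS 0   /* bit number, illustrative only */

/*
 * Racy pattern: each step is atomic on its own, but another thread can
 * set the bit between the load and the fetch_or, so two callers may
 * both return 1.
 */
int demo_claim_racy(atomic_ulong *flags, unsigned bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << bit;

    if (atomic_load(flags) & mask)
        return 0;
    atomic_fetch_or(flags, mask);
    return 1;
}

/*
 * Atomic pattern: one read-modify-write returns the previous value, so
 * exactly one caller observes the bit as previously clear.
 */
int demo_claim_atomic(atomic_ulong *flags, unsigned bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << bit;
    unsigned long old = atomic_fetch_or(flags, mask);

    return (old & mask) == 0;
}
```

A caller would proceed with the abort only when `demo_claim_atomic()` returns 1, which mirrors the `if (!test_and_set_bit(...))` idiom in kernel code.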

Re: [ofa-general] Re: 2.6.30.1: possible irq lock inversion dependency detected

2009-08-05 Thread Roland Dreier
Reported-by: Bart Van Assche bart.vanass...@gmail.com Signed-off-by: Roland Dreier rola...@cisco.com --- drivers/infiniband/ulp/ipoib/ipoib_main.c |7 ++- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |2 ++ 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/drivers

[ofa-general] Re: [PATCH linux-next 1/5] RDMA/cxgb3: unregister leaks memory.

2009-08-05 Thread Roland Dreier
thanks, applied

[ofa-general] Re: [PATCH linux-next 3/5] RDMA/cxgb3: wake up any waiters on peer close/abort.

2009-08-05 Thread Roland Dreier
this one won't apply without 2/5, so I'll wait for you to resend both patches...

[ofa-general] Re: [PATCH linux-next 4/5] RDMA/cxgb3: Set the appropriate IO channel in rdma_init work requests.

2009-08-05 Thread Roland Dreier
thanks, applied this and 5/5.

[ofa-general] Re: [PATCH] RDMA/nes: map MTU to IB_MTU_* and correctly report link state

2009-08-05 Thread Roland Dreier
thanks, applied

Re: [ofa-general] Re: [PATCH] ipath: strncpy does not null terminate string

2009-08-05 Thread Roland Dreier
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -60,7 +60,7 @@ static int recv_subn_get_nodedescription(struct ib_smp *smp,
 	if (smp->attr_mod)
 		smp->status |= IB_SMP_INVALID_FIELD;
-	strncpy(smp->data,
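For readers unfamiliar with the behavior named in the subject line: strncpy() writes no terminating NUL when the source is at least as long as the size limit. The sketch below is a generic userspace illustration and not the ipath patch itself; `demo_copy_terminated` and `demo_is_terminated` are hypothetical helper names. Whether an unterminated result actually matters depends on how the destination field is consumed, which the truncated message does not show.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst and always NUL-terminate; dst_size must be > 0. */
void demo_copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1);   /* leave room for the terminator */
    dst[dst_size - 1] = '\0';
}

/* Returns 1 if buf contains a NUL within its first size bytes. */
int demo_is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Calling plain `strncpy(buf, src, sizeof buf)` with a source longer than the buffer leaves `demo_is_terminated()` returning 0, while `demo_copy_terminated()` truncates the string and keeps it terminated.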

Re: [ofa-general] [PATCH] cma: fix access to freed memory

2009-08-05 Thread Roland Dreier
rdma_destroy_id and rdma_leave_multicast call ib_sa_free_multicast. This call will block until the join callback completes or is canceled. Can you describe the race with cma_ib_mc_handler in more detail? Also, cma_leave_mc_groups is only called from rdma_destroy_id. Locking

Re: [ofa-general] Setting the rate in Infiniband.

2009-08-05 Thread Roland Dreier
The reason why I need such small rates is because I interface the Infiniband HCA to an FPGA via an Infiniband physical link. Imagine the FPGA as a simple repeater that simply forwards the infiniband signals to the Target HCA. The FPGA cannot handle such a high data rate and neither do I

Re: [ofa-general] Re: [PATCH] cma: fix access to freed memory

2009-08-04 Thread Roland Dreier
Maybe it's just a loose connection but yet, it seems to me that operations on id_priv->mc_list should be protected. Should I send a different patch? "seems ... should be" is very weak justification for locking. What should they be protected from? - R.

[ofa-general] Re: [PATCH 2.6.30.4] Fix for NULL pointer dereference by SRP initiator triggered by a SCSI reset after the SRP connection has been closed

2009-08-04 Thread Roland Dreier
An update: apparently it is possible to trigger scmnd->device == NULL even without triggering a prior IB CM disconnect. The following shell commands are sufficient to trigger the WARN_ON statement in the patch below: rmmod ib_srp modprobe ib_srp ibsrpdm -c | while read target_info; do

[ofa-general] Re: [PATCH 2.6.30.4] Fix for NULL pointer dereference by SRP initiator triggered by a SCSI reset after the SRP connection has been closed

2009-08-04 Thread Roland Dreier
By the way, Vladislav Bolkhovitin was so kind to inform me that this issue is not specific to the SRP initiator. For more information, see also http://thread.gmane.org/gmane.linux.scsi/26166. I'm not sure I follow this exactly -- the idea is that sg_reset generates SCSI commands that are
